broadinstitute / tgg_methods

Repo for miscellaneous methods developed by the methods group that don't fit anywhere else
MIT License

Script to find the number of shared sites between pairs #52

Closed mike-w-wilson closed 1 year ago

mike-w-wilson commented 2 years ago

This script accepts an arg for the number of non-ref samples per site and filters a VDS to only those sites. It then calculates how many of those sites are shared between each pair of samples. For example, if the passed arg non_ref_samples is equal to 3, the script will filter the VDS to sites where hail's n_non_ref equals 3. It then grabs the non-ref samples found at each of those sites, creates a list of sample pairs, and tallies, per pair, the number of sites with n_non_ref=3 that the pair shares. The script has an additional het_only filter if you just want to consider sites where no hom_var genotype exists.
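To make the tallying logic concrete, here is a minimal pure-Python sketch of the same idea on a toy dataset. It is not the Hail code in this PR: the function name count_shared_sites, the dict-of-dicts input, and the genotype labels "het", "hom_var", and "hom_ref" are assumptions made for illustration only. The arguments non_ref_samples and het_only mirror the ones described above.

```python
from collections import Counter
from itertools import combinations


def count_shared_sites(sites, non_ref_samples, het_only=False):
    """Tally, per sample pair, the sites where exactly `non_ref_samples` samples are non-ref.

    `sites` maps a site label to a {sample: genotype} dict (hypothetical layout).
    """
    pair_counts = Counter()
    for genotypes in sites.values():
        # Samples carrying at least one alt allele at this site.
        carriers = sorted(s for s, gt in genotypes.items() if gt in ("het", "hom_var"))
        # Keep only sites with the requested number of non-ref samples.
        if len(carriers) != non_ref_samples:
            continue
        # Optionally drop sites where any carrier is hom_var.
        if het_only and any(genotypes[s] == "hom_var" for s in carriers):
            continue
        # Every pair of carriers shares this site.
        pair_counts.update(combinations(carriers, 2))
    return pair_counts


# Toy data: four samples, three sites.
sites = {
    "chr1:100": {"A": "het", "B": "het", "C": "het", "D": "hom_ref"},
    "chr1:200": {"A": "het", "B": "hom_var", "C": "het", "D": "hom_ref"},
    "chr1:300": {"A": "het", "B": "het", "C": "hom_ref", "D": "hom_ref"},
}

counts = count_shared_sites(sites, non_ref_samples=3)
counts_het_only = count_shared_sites(sites, non_ref_samples=3, het_only=True)
```

In this toy example the third site is skipped because only two samples are non-ref there, and het_only=True additionally drops the second site because sample B is hom_var. Sorting the carriers keeps each pair key in a stable order, so (A, B) and (B, A) are never counted separately.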

mike-w-wilson commented 2 years ago

I decided to move the write after going through your suggestions. I like it better where it is now, but no strong feelings on it. Back to you @ch-kr!