broadinstitute / genetic-prevalence-estimator

https://genie.broadinstitute.org/
BSD 3-Clause "New" or "Revised" License

Exclude filtered variants from gnomAD variant lists #15

Closed. nawatts closed this 3 years ago.

nawatts commented 3 years ago

Related to #11. By default, exclude variants that fail filters in both exomes and genomes.
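A minimal sketch of the default "fails filters in both exomes and genomes" check, assuming each variant carries a set of gnomAD filter names per data type (empty set = PASS, `None` = absent from that data type). The `Variant` class and `is_filtered` helper are hypothetical and not from the genie codebase. The sketch also assumes that a variant present in only one data type counts as filtered if it fails there, which this issue does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Variant:
    # Hypothetical model, for illustration only.
    variant_id: str
    # Empty set means the variant passed QC; None means it is absent from that data type.
    exome_filters: Optional[Set[str]] = None
    genome_filters: Optional[Set[str]] = None


def is_filtered(variant: Variant) -> bool:
    """True if the variant fails filters in every data type it appears in.

    Assumption: a variant seen only in exomes (or only in genomes) is
    treated as filtered if it fails filters in that one data type.
    """
    present = [f for f in (variant.exome_filters, variant.genome_filters) if f is not None]
    return len(present) > 0 and all(len(f) > 0 for f in present)
```

With this policy, a variant flagged `RF` in exomes but passing in genomes is kept, while one flagged in both is excluded by default.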

nawatts commented 3 years ago

Another option would be to include them in the variant list and only exclude them from the prevalence calculation by default. With custom variant lists, we'll already have to handle the case where a variant list may include filtered variants even if it is configured to exclude filtered variants from prevalence calculations.

sambaxter commented 3 years ago

I think if people provide variant lists that include filtered variants, we could return an error message saying that the following variants were not included in the calculations because they failed quality filters, and then maybe let them choose to override that (with the appropriate warnings). Thoughts?

nawatts commented 3 years ago

That makes sense, and I think that will work for gnomAD lists as well. When we generate the gnomAD list, we'll include filtered variants (except AC = 0) in the list, but calculate prevalence without them by default. Then we show the message you've described and give them the option to include filtered variants in the calculation. We could also provide that option up front when they are entering the variant list vs after prevalence has been calculated.
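The flow described above (keep filtered variants in the generated list, drop AC = 0, exclude filtered variants from the calculation unless the user opts in, and report what was left out) might be sketched as follows. All names here (`build_variant_list`, `select_for_prevalence`, `exclusion_message`) are hypothetical, and the filter check repeats the same assumed "fails in every data type it appears in" policy. The prevalence formula itself is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Variant:
    # Hypothetical model, for illustration only.
    variant_id: str
    ac: int  # allele count
    exome_filters: Optional[Set[str]] = None  # None = absent from exomes
    genome_filters: Optional[Set[str]] = None  # None = absent from genomes


def is_filtered(v: Variant) -> bool:
    # Assumed policy: fails filters in every data type the variant appears in.
    present = [f for f in (v.exome_filters, v.genome_filters) if f is not None]
    return len(present) > 0 and all(len(f) > 0 for f in present)


def build_variant_list(candidates: List[Variant]) -> List[Variant]:
    """Keep filtered variants in the list, but drop those with AC = 0."""
    return [v for v in candidates if v.ac > 0]


def select_for_prevalence(
    variants: List[Variant], include_filtered: bool = False
) -> Tuple[List[Variant], List[Variant]]:
    """Split a list into (variants used in the calculation, variants excluded)."""
    if include_filtered:
        return list(variants), []
    used = [v for v in variants if not is_filtered(v)]
    excluded = [v for v in variants if is_filtered(v)]
    return used, excluded


def exclusion_message(excluded: List[Variant]) -> Optional[str]:
    """Message listing the variants left out of the calculation, if any."""
    if not excluded:
        return None
    ids = ", ".join(v.variant_id for v in excluded)
    return f"The following variants were not included because they failed quality filters: {ids}"
```

Because the same list can be passed to `select_for_prevalence` with either setting, the include-filtered option can be offered up front or after a first calculation without regenerating the list.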

On the data model side, this means that one variant list can have multiple prevalence calculation results associated with it.
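One way to model "one variant list, many prevalence results" is a one-to-many relationship where each result records the filtered-variant setting it was computed with. This is a hypothetical sketch using plain dataclasses; `VariantList`, `PrevalenceResult`, and `latest_result` are illustrative names, not the project's actual models.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PrevalenceResult:
    # Hypothetical: records which setting produced this result.
    include_filtered: bool
    prevalence: float


@dataclass
class VariantList:
    label: str
    variant_ids: List[str]
    # One variant list can accumulate several results.
    results: List[PrevalenceResult] = field(default_factory=list)

    def add_result(self, result: PrevalenceResult) -> None:
        self.results.append(result)

    def latest_result(self, include_filtered: bool) -> Optional[PrevalenceResult]:
        """Most recently added result computed with the given setting, if any."""
        for result in reversed(self.results):
            if result.include_filtered == include_filtered:
                return result
        return None
```

Keying results by the settings used, rather than storing a single prevalence on the list, keeps earlier results available when a user reruns the calculation with filtered variants included.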

nawatts commented 3 years ago

Only remove variants that did not pass QC filters in one of the exome/genome samples. More info in #34.