exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

Use LOCAL db for initial frequency filtering but not for inheritance mode filter #361

Open oleraj opened 4 years ago

oleraj commented 4 years ago

I recently started testing a LOCAL db to capture and remove platform-specific false-positive variants. We have exomes for about 1000 individuals, and sometimes we find "novel" variants (i.e., absent from gnomAD, 1KG, etc.) that have a relatively high AF in the cohort (> 10%), so I thought it would be useful to use this cohort VCF as an initial filter. Testing with a set of about 70 solved cases, this removed a lot of the false positives, and about 1/5 of the genes ranked higher.

However, some real variants are present in multiple individuals in the cohort, so their allele frequency approaches ~0.2%. As a result, about 1/5 of the cases had their known gene filtered out once the AF cutoff for the inheritance mode was applied. I don't want to raise the inheritance mode cutoff to accommodate this, since I think 0.1% is a good AF to apply to a healthy population, but it would be nice to be able to exclude LOCAL from the inheritance mode filtering step. Would that be possible?

Alternatively, I can annotate the VCF with this cohort AF and filter it before sending it to Exomiser, but I thought this option might be useful to others as well.
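For anyone who wants the pre-filtering workaround in the meantime, here is a minimal sketch of the idea in Python. It assumes the VCF has already been annotated with the cohort allele frequency in an INFO field; the field name `COHORT_AF`, the function names, and the 10% cutoff (taken from the artefact threshold mentioned above) are illustrative assumptions, not anything Exomiser defines.

```python
from typing import Iterable, Iterator, Optional


def cohort_af(info: str, key: str = "COHORT_AF") -> Optional[float]:
    """Return the largest value of the INFO field `key`, or None if absent.

    `COHORT_AF` is a hypothetical annotation name used for illustration.
    """
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key and value:
            # Multi-allelic records carry comma-separated values; use the largest.
            values = [float(v) for v in value.split(",") if v != "."]
            return max(values) if values else None
    return None


def prefilter_vcf_lines(
    lines: Iterable[str], max_cohort_af: float = 0.10, key: str = "COHORT_AF"
) -> Iterator[str]:
    """Yield header lines and records whose cohort AF is at most the cutoff.

    Records without the annotation are kept, since absence from the cohort
    is not evidence of a platform artefact.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        af = cohort_af(fields[7], key)  # INFO is the 8th VCF column
        if af is None or af <= max_cohort_af:
            yield line
```

With a 10% cutoff, a recurrent artefact at 25% cohort AF is dropped, while a real variant at ~0.2% is kept and only ever sees the population databases in the later inheritance mode step.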

PabloBotas commented 4 years ago

Yes, I also think this would be useful natively from Exomiser

julesjacobsen commented 4 years ago

@oleraj @PabloBotas and @damiansm would you want to be able to define this manually, or expect the system to do this as a default? I'm guessing the latter as not everyone is likely to notice this and you'd need a large population for your local set, which is unlikely.

oleraj commented 4 years ago

Hi @julesjacobsen. If people are normally using LOCAL as a population-specific reference and want to use it for genetic model filtering then I wouldn't want to remove that capability. (And I might want to use it that way at some point as well.) So maybe making it defined manually would be the most flexible? But maybe I'm misunderstanding your comment.

julesjacobsen commented 4 years ago

@oleraj yes, that was what I was asking. The problem is how to define whether or not you want to use it. If it's a non-user-alterable default, then this is easy. However, the more flexibility you allow people, the more difficult it is to use the system and the more likely people are to misuse the feature.

For example, do we do it like this:

    inheritanceModeFilter: {ignoreLocal: true}

or just say to use only the specified sources

    inheritanceModeFilter: { 
        frequencySources: [
            THOUSAND_GENOMES,
            TOPMED,
            UK10K
        ]
    }

or specify the sources to be ignored (if any)

    inheritanceModeFilter: { 
        ignoreFrequencySources: [
            LOCAL
        ]
    }

oleraj commented 4 years ago

@julesjacobsen After some digging in the issues to see some of the thinking behind the LOCAL db, it seems like the default should be to use it for inheritance mode filtering and having a way like you're suggesting to ignore it in special cases would work. (The other option would be to do nothing and require users to update the whitelist file, but that could be tedious as they'll have to do that every time the exomiser database is updated.) Personally I like option 1 or 3.

PabloBotas commented 4 years ago

Hi, I would choose the 3rd option. It gives you the most flexibility and is compliant with a sensible default of using all available DBs for filtering. This means that there would be an inheritanceModes and an inheritanceModeFilter parameter. Not sure if you'd prefer to put them together into a single parameter with Modes and IgnoreFrequencySources fields.
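
To make the semantics of the 3rd option concrete, here is a rough sketch in Python of how an ignore list could behave. This is not Exomiser's implementation: the function names, the percent-based frequencies, the 2.0% initial cutoff, and the `<=` comparison are all assumptions for illustration. Only the 0.1% inheritance mode cutoff and the ~0.2% LOCAL frequency come from the discussion above.

```python
from typing import AbstractSet, Mapping


def max_frequency(
    frequencies: Mapping[str, float],
    ignored_sources: AbstractSet[str] = frozenset(),
) -> float:
    """Highest frequency (in percent) among sources that are not ignored."""
    considered = [
        af for source, af in frequencies.items() if source not in ignored_sources
    ]
    # A variant with no usable frequency data is treated as unobserved.
    return max(considered, default=0.0)


def passes_frequency_cutoff(
    frequencies: Mapping[str, float],
    max_af_percent: float,
    ignored_sources: AbstractSet[str] = frozenset(),
) -> bool:
    """True if the variant is at or below the cutoff for the considered sources."""
    return max_frequency(frequencies, ignored_sources) <= max_af_percent


# A real variant seen in several cohort members (values in percent).
variant = {"THOUSAND_GENOMES": 0.0, "TOPMED": 0.01, "LOCAL": 0.2}

# Initial frequency filter: all sources, including LOCAL (hypothetical 2.0% cutoff).
passes_initial = passes_frequency_cutoff(variant, 2.0)

# Inheritance mode filter at 0.1%: default behaviour vs. ignoring LOCAL.
passes_moi_default = passes_frequency_cutoff(variant, 0.1)
passes_moi_ignoring_local = passes_frequency_cutoff(variant, 0.1, {"LOCAL"})
```

Here the variant passes the initial filter, fails the inheritance mode cutoff when LOCAL is included, and passes it when LOCAL is listed as ignored. An empty ignore list reproduces the current default of using every available source.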