exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

exomiser does not output gnomAD and 1KG frequency annotations #258

Closed seru71 closed 3 weeks ago

seru71 commented 6 years ago

Hi,

After downloading Exomiser 10.0.1 and the 1802_hg19 dataset, I ran the NA19722_601952_AUTOSOMAL_RECESSIVE_POMP_13_29233225_5UTR_38 example. Everything went fine, except that several variant frequency annotations were missing from the output. Here is the header of the _AD.variants.tsv file:

CHROM POS REF ALT QUAL FILTER GENOTYPE COVERAGE FUNCTIONAL_CLASS HGVS EXOMISER_GENE CADD(>0.483) POLYPHEN(>0.956|>0.446) MUTATIONTASTER(>0.94) SIFT(<0.06) REMM DBSNP_ID MAX_FREQUENCY DBSNP_FREQUENCY EVS_EA_FREQUENCY EVS_AA_FREQUENCY EXAC_AFR_FREQ EXAC_AMR_FREQ EXAC_EAS_FREQ EXAC_FIN_FREQ EXAC_NFE_FREQ EXAC_SAS_FREQ EXAC_OTH_FREQ EXOMISER_VARIANT_SCORE EXOMISER_GENE_PHENO_SCORE EXOMISER_GENE_VARIANT_SCORE EXOMISER_GENE_COMBINED_SCORE CONTRIBUTING_VARIANT

GNOMAD, 1KG, UK10K are specified in the YAML file, but missing from the output. Should I download these frequency databases separately?

Cheers,

julesjacobsen commented 6 years ago

GNOMAD, 1KG, UK10K are specified in the YAML file, but missing from the output. Should I download these frequency databases separately?

No, you don't need to do that, they are part of the existing distribution. You can see them in the HTML output.

Given the inflexibility of TSV, we're considering a new JSON output in the upcoming release, which will contain the newer data sources.

seru71 commented 6 years ago

Thank you for the answer @julesjacobsen . Indeed, I can see them in the HTML output. So TSV output has only a subset of annotation columns present in HTML?
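For readers who want to check this on their own output, here is a minimal Python sketch that reports which frequency sources have a column in a TSV header. The helper name `sources_in_header`, the substring matching rule, and the shortened header are illustrative assumptions, not part of Exomiser; the column names themselves are taken from the header quoted above.

```python
# Frequency sources mentioned in this thread. Matching a source to a column
# by substring is an assumption made for this sketch.
SOURCES = ["GNOMAD", "1KG", "UK10K", "EXAC", "EVS", "DBSNP"]


def sources_in_header(header_line, sources=SOURCES):
    """Map each source name to the frequency columns that mention it."""
    columns = header_line.rstrip("\n").split("\t")
    freq_columns = [c for c in columns if "FREQ" in c.upper()]
    return {s: [c for c in freq_columns if s in c.upper()] for s in sources}


# A shortened header using column names from the _AD.variants.tsv file above.
header = "\t".join([
    "CHROM", "POS", "REF", "ALT", "REMM", "DBSNP_ID", "MAX_FREQUENCY",
    "DBSNP_FREQUENCY", "EVS_EA_FREQUENCY", "EVS_AA_FREQUENCY",
    "EXAC_AFR_FREQ", "EXAC_NFE_FREQ", "EXAC_OTH_FREQ",
    "EXOMISER_VARIANT_SCORE",
])

found = sources_in_header(header)
missing = [source for source, cols in found.items() if not cols]
print("sources without a column:", missing)
```

In practice the header line would be read from the first line of the TSV file rather than built by hand.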

julesjacobsen commented 6 years ago

Correct, TSV doesn't contain all the data. How are you trying to use this? Is it part of an informatics pipeline or for display to clinicians? As I said previously, we're looking at JSON because it is more amenable to having data added without breaking other people's parsers. What would be your preference?

seru71 commented 6 years ago

I have been trying it out, attracted by the possibility of annotating variants with the REMM score. I looked at the TSV first because it was easier to filter the variants there.

JSON is great for programmatic use, but not so convenient to manipulate in a Unix shell. Having both would be awesome.

julesjacobsen commented 6 years ago

Do you just want the REMM score for a variant? If so, tabix (http://www.htslib.org/doc/tabix.html) would be a better choice than running the whole of Exomiser. Exomiser wasn't really designed just to annotate variants, and using it that way takes a lot of time and RAM.

visze commented 6 years ago

Maybe for annotating variants without prioritization jannovar might be a better choice.

Jannovar can annotate from several other sources, such as dbNSFP. ReMM is not implemented directly yet, but I can easily add this if needed. Because ReMM only needs the position in the genome, querying it directly with tabix is always the fastest option (no alt allele comparison is needed, unlike with CADD, for example).
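A position-only lookup of this kind could be sketched in Python by shelling out to the tabix command line tool. This is only a sketch under stated assumptions: the score file is assumed to be bgzipped, tabix-indexed, and laid out as chromosome, position, score in tab-separated columns, and the function names and the file name `remm_scores.tsv.gz` are hypothetical.

```python
import subprocess


def region_for(chrom, pos):
    """Build a single-base tabix region string, e.g. '13:29233225-29233225'."""
    return f"{chrom}:{pos}-{pos}"


def parse_scores(tabix_output):
    """Parse tabix output lines, ASSUMED to be chrom<TAB>pos<TAB>score."""
    scores = {}
    for line in tabix_output.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank and header lines
        chrom, pos, score = line.split("\t")[:3]
        scores[(chrom, int(pos))] = float(score)
    return scores


def remm_score(path, chrom, pos):
    """Query one position with the tabix CLI; returns None if it is absent."""
    result = subprocess.run(
        ["tabix", path, region_for(chrom, pos)],
        capture_output=True, text=True, check=True,
    )
    return parse_scores(result.stdout).get((str(chrom), int(pos)))


# Example call (hypothetical file name, requires tabix on the PATH):
# remm_score("remm_scores.tsv.gz", "13", 29233225)
```

Because the lookup key is only chromosome and position, no ref or alt allele matching is involved, which is the point made above about ReMM versus CADD.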

DGMichael commented 6 years ago

JSON would be a great output format for us.

seru71 commented 6 years ago

@julesjacobsen annotating with REMM wasn't the sole purpose. I also wanted to try it out on a few undiagnosed WGS samples where we're looking for new clues.

I agree that using it to annotate variants with a single score is overkill. For annotation I have mostly been using Annovar, so I could easily convert the REMM db into an Annovar annotation file. Using Exomiser, I killed two birds with one stone :)

julesjacobsen commented 6 years ago

@seru71 Cool, that's exactly the right use-case! @DGMichael good to hear.

julesjacobsen commented 3 weeks ago

This is now possible with the new TSV_VARIANT output file:

#RANK ID GENE_SYMBOL ENTREZ_GENE_ID MOI P-VALUE EXOMISER_GENE_COMBINED_SCORE EXOMISER_GENE_PHENO_SCORE EXOMISER_GENE_VARIANT_SCORE EXOMISER_VARIANT_SCORE CONTRIBUTING_VARIANT WHITELIST_VARIANT VCF_ID RS_ID CONTIG START END REF ALT CHANGE_LENGTH QUAL FILTER GENOTYPE FUNCTIONAL_CLASS HGVS EXOMISER_ACMG_CLASSIFICATION EXOMISER_ACMG_EVIDENCE EXOMISER_ACMG_DISEASE_ID EXOMISER_ACMG_DISEASE_NAME CLINVAR_VARIATION_ID CLINVAR_PRIMARY_INTERPRETATION CLINVAR_STAR_RATING GENE_CONSTRAINT_LOEUF GENE_CONSTRAINT_LOEUF_LOWER GENE_CONSTRAINT_LOEUF_UPPER MAX_FREQ_SOURCE MAX_FREQ ALL_FREQ MAX_PATH_SOURCE MAX_PATH ALL_PATH
1 13-29233225-TC-T_AR POMP 51371 AR 0.0000 0.9981 0.9960 1.0000 1.0000 1 1 null rs112368783 13 29233225 29233226 TC T -1 100.0000 PASS 1|1 upstream_gene_variant POMP:ENST00000380842.4:: UNCERTAIN_SIGNIFICANCE PP4,PP5 OMIM:601952 Keratosis linearis with ichthyosis congenita and sclerosing keratoderma 116 PATHOGENIC 1 0.6348 0.36 1.192 GNOMAD_G_NFE 0.012968486 GNOMAD_G_NFE=0.012968486 REMM 0.993 REMM=0.993
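As a rough illustration of how the ALL_FREQ and ALL_PATH columns could be consumed programmatically, the Python sketch below reads a tab-separated row and splits the SOURCE=value entries into dictionaries. The helper `parse_pairs` is hypothetical, the comma separator between multiple entries is an assumption (the row above shows only one entry per field), and the sample keeps only a subset of the columns shown above.

```python
import csv
import io


def parse_pairs(field):
    """Turn 'SOURCE=value' entries into a dict.

    A comma between entries is ASSUMED; the example row in this thread
    shows only a single entry per field.
    """
    pairs = {}
    for entry in field.split(","):
        if "=" in entry:
            key, value = entry.split("=", 1)
            pairs[key.strip()] = float(value)
    return pairs


# A subset of the columns and values from the row above, tab-separated.
sample = (
    "#RANK\tID\tGENE_SYMBOL\tMAX_FREQ_SOURCE\tMAX_FREQ\tALL_FREQ\t"
    "MAX_PATH_SOURCE\tMAX_PATH\tALL_PATH\n"
    "1\t13-29233225-TC-T_AR\tPOMP\tGNOMAD_G_NFE\t0.012968486\t"
    "GNOMAD_G_NFE=0.012968486\tREMM\t0.993\tREMM=0.993\n"
)

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
for row in rows:
    freqs = parse_pairs(row["ALL_FREQ"])
    paths = parse_pairs(row["ALL_PATH"])
    print(row["ID"], row["GENE_SYMBOL"], freqs, paths)
```

For a real file, `io.StringIO(sample)` would be replaced by an open file handle on the variants TSV.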