seru71 commented 6 years ago

Hi,

After downloading Exomiser 10.0.1 and the 1802_hg19 dataset, I ran the NA19722_601952_AUTOSOMAL_RECESSIVE_POMP_13_29233225_5UTR_38 example. Everything went fine, except that several variant frequency annotations were missing from the output. Here is the header of the _AD.variants.tsv file:

CHROM POS REF ALT QUAL FILTER GENOTYPE COVERAGE FUNCTIONAL_CLASS HGVS EXOMISER_GENE CADD(>0.483) POLYPHEN(>0.956|>0.446) MUTATIONTASTER(>0.94) SIFT(<0.06) REMM DBSNP_ID MAX_FREQUENCY DBSNP_FREQUENCY EVS_EA_FREQUENCY EVS_AA_FREQUENCY EXAC_AFR_FREQ EXAC_AMR_FREQ EXAC_EAS_FREQ EXAC_FIN_FREQ EXAC_NFE_FREQ EXAC_SAS_FREQ EXAC_OTH_FREQ EXOMISER_VARIANT_SCORE EXOMISER_GENE_PHENO_SCORE EXOMISER_GENE_VARIANT_SCORE EXOMISER_GENE_COMBINED_SCORE CONTRIBUTING_VARIANT

GNOMAD, 1KG, UK10K are specified in the YAML file, but missing from the output. Should I download these frequency databases separately?
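A quick way to confirm which frequency sources a TSV header actually exposes is to group its frequency columns by source prefix. The Python sketch below does this for the header quoted above; the prefix spellings in WANTED_SOURCES (GNOMAD, 1KG, UK10K and so on) are assumptions about how such columns would be named, not names taken from Exomiser's documentation.

```python
# The _AD.variants.tsv header quoted above, as a single whitespace-separated string.
HEADER = (
    "CHROM POS REF ALT QUAL FILTER GENOTYPE COVERAGE FUNCTIONAL_CLASS HGVS "
    "EXOMISER_GENE CADD(>0.483) POLYPHEN(>0.956|>0.446) MUTATIONTASTER(>0.94) "
    "SIFT(<0.06) REMM DBSNP_ID MAX_FREQUENCY DBSNP_FREQUENCY EVS_EA_FREQUENCY "
    "EVS_AA_FREQUENCY EXAC_AFR_FREQ EXAC_AMR_FREQ EXAC_EAS_FREQ EXAC_FIN_FREQ "
    "EXAC_NFE_FREQ EXAC_SAS_FREQ EXAC_OTH_FREQ EXOMISER_VARIANT_SCORE "
    "EXOMISER_GENE_PHENO_SCORE EXOMISER_GENE_VARIANT_SCORE "
    "EXOMISER_GENE_COMBINED_SCORE CONTRIBUTING_VARIANT"
)

# Hypothetical source prefixes; the real column names for gnomAD/1KG/UK10K may differ.
WANTED_SOURCES = ["DBSNP", "EVS", "EXAC", "GNOMAD", "1KG", "UK10K"]


def frequency_sources_in_header(columns):
    """Map each wanted source prefix to the per-source frequency columns that start with it."""
    # MAX_FREQUENCY is a summary column, not a per-source one, so skip it.
    freq_cols = [c for c in columns if "FREQ" in c and c != "MAX_FREQUENCY"]
    return {src: [c for c in freq_cols if c.startswith(src)] for src in WANTED_SOURCES}


found = frequency_sources_in_header(HEADER.split())
missing = [src for src, cols in found.items() if not cols]
print("missing:", missing)
```

On a real file, pass the first line of the TSV split on tabs instead of HEADER.split().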

Cheers,

julesjacobsen commented 6 years ago

GNOMAD, 1KG, UK10K are specified in the YAML file, but missing from the output. Should I download these frequency databases separately?

No, you don't need to do that; they are part of the existing distribution. You can see them in the HTML output.

Given the inflexibility of TSV, we're considering a new JSON output for the upcoming release, which will contain the newer data sources.

seru71 commented 6 years ago

Thank you for the answer, @julesjacobsen. Indeed, I can see them in the HTML output. So the TSV output has only a subset of the annotation columns present in the HTML?

julesjacobsen commented 6 years ago

Correct, TSV doesn't contain all the data. How are you trying to use this? Is it part of an informatics pipeline or for display to clinicians? As I said previously, we're looking at JSON, as it is more amenable to having data added without breaking other people's parsers. What would be your preference?

seru71 commented 6 years ago

I have been trying it out, attracted by the possibility of annotating variants with the REMM score. I looked at the TSV first because it was easier to filter the variants there.

JSON is great for programmatic use, but not so convenient to manipulate in the Unix shell. Having both would be awesome.
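Filtering on a single score column is straightforward when the output is a TSV with a header row. A minimal Python sketch, assuming a tab-separated file with a REMM column as in the header quoted earlier, and assuming that a missing score is written as "." or left empty (that convention is a guess); the rows in SAMPLE are illustrative, not real Exomiser output:

```python
import csv
import io

# Illustrative excerpt: column names follow the _AD.variants.tsv header,
# but these rows are made up for the example.
SAMPLE = (
    "CHROM\tPOS\tREF\tALT\tREMM\n"
    "13\t29233225\tTC\tT\t0.993\n"
    "1\t1000\tA\tG\t0.12\n"
    "2\t2000\tC\tT\t.\n"
)


def parse_score(value):
    """Return the cell as a float, or None if it is empty or non-numeric (e.g. '.')."""
    try:
        return float(value)
    except ValueError:
        return None


def filter_by_remm(handle, threshold):
    """Yield (CHROM, POS, REF, ALT, REMM) for rows whose REMM score is >= threshold."""
    for row in csv.DictReader(handle, delimiter="\t"):
        score = parse_score(row["REMM"])
        if score is not None and score >= threshold:
            yield row["CHROM"], row["POS"], row["REF"], row["ALT"], score


hits = list(filter_by_remm(io.StringIO(SAMPLE), threshold=0.9))
print(hits)
```

The same filter is a one-line awk command in the shell, which is the convenience of TSV mentioned above; a JSON output would need a JSON-aware tool instead.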

julesjacobsen commented 6 years ago

Do you just want the REMM score for a variant? If so, tabix (http://www.htslib.org/doc/tabix.html) would be a better choice than running the whole of Exomiser. Exomiser wasn't really designed just to annotate variants, and using it that way will take a lot of time and RAM.
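For a single position, a tabix lookup against a bgzip-compressed, tabix-indexed score file is enough. The sketch below shells out to the `tabix` command with a `chrom:start-end` region; the file name and its three-column layout (chromosome, position, score) are assumptions about how the ReMM scores are distributed, and the chromosome naming (`13` vs `chr13`) has to match the file.

```python
import subprocess


def remm_tabix_command(scores_path, chrom, pos):
    """Build a tabix query for a single 1-based position (region chrom:pos-pos)."""
    return ["tabix", scores_path, f"{chrom}:{pos}-{pos}"]


def parse_remm_lines(text):
    """Parse tabix output, assumed to be 'chrom<TAB>pos<TAB>score' per line."""
    scores = {}
    for line in text.splitlines():
        # Skip blank lines and any header lines (only present if tabix -h is used).
        if not line or line.startswith("#"):
            continue
        chrom, pos, score = line.split("\t")[:3]
        scores[(chrom, int(pos))] = float(score)
    return scores


def remm_score(scores_path, chrom, pos):
    """Run tabix and return the score at chrom:pos, or None if there is no line for it."""
    result = subprocess.run(
        remm_tabix_command(scores_path, chrom, pos),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_remm_lines(result.stdout).get((chrom, pos))
```

For example, `remm_score("ReMM.tsv.gz", "13", 29233225)` would return the score at that position, provided such a file (the name here is hypothetical) exists next to its `.tbi` index.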

visze commented 6 years ago

For annotating variants without prioritization, Jannovar might be a better choice.

Jannovar can annotate from several other sources, such as dbNSFP. ReMM is not implemented directly yet, but if needed I can easily add it. Because ReMM only needs the genomic position, using tabix directly is always fastest (no alternate-allele comparison is needed, unlike with CADD, for example).
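The difference between a position-level score such as ReMM and an allele-level score such as CADD comes down to the lookup key. This Python sketch uses in-memory dictionaries in place of tabix-indexed files, and the scores are made-up numbers:

```python
from typing import Dict, Optional, Tuple

# Position-level score (ReMM-like): one value per genomic position.
PositionKey = Tuple[str, int]
# Allele-level score (CADD-like): one value per (position, ref, alt).
AlleleKey = Tuple[str, int, str, str]


def lookup_position_score(table: Dict[PositionKey, float], chrom: str, pos: int) -> Optional[float]:
    """ReMM-style lookup: the alleles are irrelevant, so the position alone is the key."""
    return table.get((chrom, pos))


def lookup_allele_score(
    table: Dict[AlleleKey, float], chrom: str, pos: int, ref: str, alt: str
) -> Optional[float]:
    """CADD-style lookup: the score differs per alternate allele, so REF and ALT must match too."""
    return table.get((chrom, pos, ref, alt))


# Illustrative numbers only, not real ReMM or CADD scores.
remm_like = {("1", 100): 0.8}
cadd_like = {("1", 100, "A", "G"): 12.3, ("1", 100, "A", "T"): 25.1}

print(lookup_position_score(remm_like, "1", 100))          # 0.8
print(lookup_allele_score(cadd_like, "1", 100, "A", "T"))  # 25.1
print(lookup_allele_score(cadd_like, "1", 100, "A", "C"))  # None
```

With tabix, an allele-level file typically returns several lines per position (one per alternate allele), which then have to be filtered on REF and ALT; a position-level file needs no such step.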

(In reply to Jules Jacobsen's comment of Fri, 20 Apr 2018, 17:01.)

DGMichael commented 6 years ago

JSON would be a great output format for us.

(In reply to Max's comment of Apr 20, 2018, 11:13 AM.)

seru71 commented 6 years ago

@julesjacobsen Annotating with REMM wasn't the sole purpose. I also wanted to try Exomiser on a few undiagnosed WGS samples where we're looking for new clues.

I agree that using it to annotate variants with a single score is overkill. For annotation I have mostly been using Annovar, so I could easily convert the REMM database into an Annovar annotation file. With Exomiser I killed two birds with one stone :)

julesjacobsen commented 6 years ago

@seru71 Cool, that's exactly the right use-case! @DGMichael good to hear.

julesjacobsen commented 1 month ago

This is now possible with the new TSV_VARIANT output file:

#RANK	ID	GENE_SYMBOL	ENTREZ_GENE_ID	MOI	P-VALUE	EXOMISER_GENE_COMBINED_SCORE	EXOMISER_GENE_PHENO_SCORE	EXOMISER_GENE_VARIANT_SCORE	EXOMISER_VARIANT_SCORE	CONTRIBUTING_VARIANT	WHITELIST_VARIANT	VCF_ID	RS_ID	CONTIG	START	END	REF	ALT	CHANGE_LENGTH	QUAL	FILTER	GENOTYPE	FUNCTIONAL_CLASS	HGVS	EXOMISER_ACMG_CLASSIFICATION	EXOMISER_ACMG_EVIDENCE	EXOMISER_ACMG_DISEASE_ID	EXOMISER_ACMG_DISEASE_NAME	CLINVAR_VARIATION_ID	CLINVAR_PRIMARY_INTERPRETATION	CLINVAR_STAR_RATING	GENE_CONSTRAINT_LOEUF	GENE_CONSTRAINT_LOEUF_LOWER	GENE_CONSTRAINT_LOEUF_UPPER	MAX_FREQ_SOURCE	MAX_FREQ	ALL_FREQ	MAX_PATH_SOURCE	MAX_PATH	ALL_PATH
1	13-29233225-TC-T_AR	POMP	51371	AR	0.0000	0.9981	0.9960	1.0000	1.0000	1	1	null	rs112368783	13	29233225	29233226	TC	T	-1	100.0000	PASS	1|1	upstream_gene_variant	POMP:ENST00000380842.4::	UNCERTAIN_SIGNIFICANCE	PP4,PP5	OMIM:601952	Keratosis linearis with ichthyosis congenita and sclerosing keratoderma	116	PATHOGENIC	1	0.6348	0.36	1.192	GNOMAD_G_NFE	0.012968486	GNOMAD_G_NFE=0.012968486	REMM	0.993	REMM=0.993
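The ALL_FREQ and ALL_PATH columns pack `SOURCE=value` entries into one cell. A minimal Python sketch for reading them with the standard csv module; it uses a trimmed copy of the POMP row above (only some of its columns), and the comma separator between multiple entries in one cell is an assumption, since the example row shows only one entry per cell:

```python
import csv
import io

# Trimmed excerpt of the TSV_VARIANT example: a subset of its columns, values
# copied from the POMP row. The real file has many more columns, but DictReader
# looks them up by name, so the extra columns do not matter.
SAMPLE = "\n".join([
    "\t".join(["#RANK", "GENE_SYMBOL", "CONTIG", "START", "REF", "ALT",
               "MAX_FREQ_SOURCE", "MAX_FREQ", "ALL_FREQ",
               "MAX_PATH_SOURCE", "MAX_PATH", "ALL_PATH"]),
    "\t".join(["1", "POMP", "13", "29233225", "TC", "T",
               "GNOMAD_G_NFE", "0.012968486", "GNOMAD_G_NFE=0.012968486",
               "REMM", "0.993", "REMM=0.993"]),
]) + "\n"


def parse_source_values(cell):
    """Parse 'SOURCE=value' entries into a dict (comma separator assumed)."""
    values = {}
    for item in cell.split(","):
        source, _, value = item.partition("=")
        if value:  # skips empty cells and entries without '='
            values[source] = float(value)
    return values


def read_variants(handle):
    """Yield one dict per variant with its frequency and pathogenicity annotations."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield {
            "gene": row["GENE_SYMBOL"],
            "variant": f'{row["CONTIG"]}-{row["START"]}-{row["REF"]}-{row["ALT"]}',
            "frequencies": parse_source_values(row["ALL_FREQ"]),
            "pathogenicity": parse_source_values(row["ALL_PATH"]),
        }


for variant in read_variants(io.StringIO(SAMPLE)):
    print(variant)
```

For the POMP row this yields the gnomAD NFE frequency under `frequencies` and the REMM score under `pathogenicity`.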


exomiser does not output gnomAD and 1KG frequency annotations #258
