Closed seru71 closed 1 month ago
GNOMAD, 1KG, UK10K are specified in the YAML file, but missing from the output. Should I download these frequency databases separately?
No, you don't need to do that, they are part of the existing distribution. You can see them in the HTML output.
Given the inflexibility of TSV we're considering a new JSON output in the upcoming release which will contain the newer data sources.
Thank you for the answer, @julesjacobsen. Indeed, I can see them in the HTML output. So the TSV output has only a subset of the annotation columns present in the HTML?
Correct, TSV doesn't contain all the data. How are you trying to use this? Is it part of an informatics pipeline or for display to clinicians? As I said previously we're looking at JSON as this is more amenable to having data added without breaking other people's parsers. What would be your preference?
I have been trying it out, attracted by the possibility of annotating variants with the REMM score. I looked at the TSV first because it was easier to filter the variants there.
JSON is great for programmatic use, but not so convenient to manipulate in a Unix shell. Having both would be awesome.
Do you just want the REMM score for a variant? If so, tabix (http://www.htslib.org/doc/tabix.html) would be a better choice than running the whole of Exomiser. Annotating variants alone isn't really what Exomiser was designed to do, and it will take a lot of time and RAM.
Maybe for annotating variants without prioritization, Jannovar might be a better choice.
Jannovar can annotate from several other sources, such as dbNSFP. ReMM is not implemented directly yet but, if needed, I can easily add it. Because ReMM only needs the position in the genome, querying it directly with tabix is always the fastest option (no alt allele comparison is needed, unlike with CADD, for example).
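For anyone who wants to script the position-only lookup described above, here is a minimal Python sketch. It only builds the `tabix` region query and parses one returned row. The file name `ReMM.tsv.gz` and the three-column layout (contig, position, score) are assumptions for illustration, so check them against the score file you actually have.

```python
import shlex


def tabix_region_command(score_file, contig, position):
    """Build the tabix command that fetches the row covering a single position."""
    region = f"{contig}:{position}-{position}"
    return ["tabix", score_file, region]


def parse_score_line(line):
    """Parse one returned row.

    Assumes three tab-separated columns: contig, position, score.
    """
    contig, position, score = line.rstrip("\n").split("\t")[:3]
    return contig, int(position), float(score)


# "ReMM.tsv.gz" is a hypothetical file name; the position is the one from this thread.
cmd = tabix_region_command("ReMM.tsv.gz", "13", 29233225)
print(shlex.join(cmd))
```

With tabix installed and an index next to the score file, the command list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` and each line of its stdout to `parse_score_line`. No alt allele is involved at any point, which is why the lookup is so cheap.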
JSON would be a great output format for us.
@julesjacobsen annotating with REMM wasn't the sole purpose. I also wanted to try it out on a few undiagnosed WGS samples where we're looking for some new clues.
I agree that using it to annotate variants with a single score is overkill. For annotation I have mostly been using Annovar, so I could easily convert the REMM db into an Annovar annotation file. Using Exomiser I killed two birds with one stone :)
@seru71 Cool, that's exactly the right use-case! @DGMichael good to hear.
This is now possible with the new TSV_VARIANT output file:
#RANK | ID | GENE_SYMBOL | ENTREZ_GENE_ID | MOI | P-VALUE | EXOMISER_GENE_COMBINED_SCORE | EXOMISER_GENE_PHENO_SCORE | EXOMISER_GENE_VARIANT_SCORE | EXOMISER_VARIANT_SCORE | CONTRIBUTING_VARIANT | WHITELIST_VARIANT | VCF_ID | RS_ID | CONTIG | START | END | REF | ALT | CHANGE_LENGTH | QUAL | FILTER | GENOTYPE | FUNCTIONAL_CLASS | HGVS | EXOMISER_ACMG_CLASSIFICATION | EXOMISER_ACMG_EVIDENCE | EXOMISER_ACMG_DISEASE_ID | EXOMISER_ACMG_DISEASE_NAME | CLINVAR_VARIATION_ID | CLINVAR_PRIMARY_INTERPRETATION | CLINVAR_STAR_RATING | GENE_CONSTRAINT_LOEUF | GENE_CONSTRAINT_LOEUF_LOWER | GENE_CONSTRAINT_LOEUF_UPPER | MAX_FREQ_SOURCE | MAX_FREQ | ALL_FREQ | MAX_PATH_SOURCE | MAX_PATH | ALL_PATH |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 13-29233225-TC-T_AR | POMP | 51371 | AR | 0.0000 | 0.9981 | 0.9960 | 1.0000 | 1.0000 | 1 | 1 | null | rs112368783 | 13 | 29233225 | 29233226 | TC | T | -1 | 100.0000 | PASS | 1\|1 | upstream_gene_variant | POMP:ENST00000380842.4:: | UNCERTAIN_SIGNIFICANCE | PP4,PP5 | OMIM:601952 | Keratosis linearis with ichthyosis congenita and sclerosing keratoderma | 116 | PATHOGENIC | 1 | 0.6348 | 0.36 | 1.192 | GNOMAD_G_NFE | 0.012968486 | GNOMAD_G_NFE=0.012968486 | REMM | 0.993 | REMM=0.993 |
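Since filtering was the original reason for reaching for the TSV, here is a rough Python sketch of filtering on the columns shown above. It assumes the file on disk is tab-separated with a header line starting `#RANK`, and that multiple entries in `ALL_PATH` are comma-separated `KEY=value` pairs. Both are assumptions, as are the threshold values. The sample is a trimmed version of the row above with only four of the columns.

```python
import csv
import io


def read_variants(handle):
    """Return a DictReader over the variants TSV, dropping the leading '#' from the header."""
    reader = csv.DictReader(handle, delimiter="\t")
    reader.fieldnames = [name.lstrip("#") for name in reader.fieldnames]
    return reader


def parse_key_values(field):
    """Turn 'REMM=0.993' style text into a dict (assumes comma-separated pairs)."""
    pairs = (item.split("=", 1) for item in field.split(",") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def filter_variants(rows, min_remm=0.9, max_freq=0.1):
    """Yield rows with a REMM score >= min_remm and MAX_FREQ <= max_freq.

    The thresholds are illustrative and are compared in the column's own units.
    Rows without a REMM entry are skipped, because NaN fails the comparison.
    """
    for row in rows:
        scores = parse_key_values(row["ALL_PATH"])
        remm = float(scores.get("REMM", "nan"))
        freq = float(row["MAX_FREQ"] or 0.0)
        if remm >= min_remm and freq <= max_freq:
            yield row


# Trimmed sample: four of the columns from the row shown in this thread.
SAMPLE = (
    "#RANK\tGENE_SYMBOL\tMAX_FREQ\tALL_PATH\n"
    "1\tPOMP\t0.012968486\tREMM=0.993\n"
)

rows = list(read_variants(io.StringIO(SAMPLE)))
for row in filter_variants(rows):
    print(row["RANK"], row["GENE_SYMBOL"], row["MAX_FREQ"], row["ALL_PATH"])
```

For a real file, replace `io.StringIO(SAMPLE)` with `open(path, newline="")`.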
Hi,
After downloading exomiser 10.0.1 and the 1802_hg19 dataset, I ran the NA19722_601952_AUTOSOMAL_RECESSIVE_POMP_13_29233225_5UTR_38 example. Everything went fine, except that several variant frequency annotations were missing from the output. Here is the header of the _AD.variants.tsv file:
CHROM POS REF ALT QUAL FILTER GENOTYPE COVERAGE FUNCTIONAL_CLASS HGVS EXOMISER_GENE CADD(>0.483) POLYPHEN(>0.956|>0.446) MUTATIONTASTER(>0.94) SIFT(<0.06)
EXAC_OTH_FREQ EXOMISER_VARIANT_SCORE EXOMISER_GENE_PHENO_SCORE EXOMISER_GENE_VARIANT_SCORE EXOMISER_GENE_COMBINED_SCORE CONTRIBUTING_VARIANT
GNOMAD, 1KG, UK10K are specified in the YAML file, but missing from the output. Should I download these frequency databases separately?
Cheers,