exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

TAB delimited output format (tsv) for variants #28

Closed visze closed 9 years ago

visze commented 9 years ago

You wrote in the Exomiser draft protocol that there is a TAB-delimited file format. Right now one exists only for genes. For people running pipelines, it would be great to have a TSV file with the variants and all their annotations (the VCF file is still missing annotations).

I can start implementing this feature if it is OK with you.

damiansm commented 9 years ago

Sure but we prob want to spec what goes in it though. If we want all the annotations then people probably want to see all the phenotype matches from human, mouse and fish but this could be hard to fit into a simple TSV with one row per variant. It is definitely needed though. Been thinking an Excel output which is similar to the HTML output would probably be popular.
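
One possible way to keep one row per variant while still carrying several phenotype matches is to pack them into a single delimited cell. The sketch below is only an illustration of that idea: the record layout, the `matches` key, and the placeholder term labels are all hypothetical and do not come from Exomiser.

```python
# Hypothetical per-variant record. The "matches" layout and the term labels
# are placeholders for illustration, not Exomiser's real data model.
variant = {
    "CHROM": "chr10",
    "POS": "123256215",
    "GENE": "FGFR2",
    "matches": {
        "human": ["human_term_1", "human_term_2"],
        "mouse": ["mouse_term_1"],
        "fish": [],
    },
}


def flatten_matches(matches):
    """Encode {organism: [terms]} as 'human=a,b;mouse=c'; '.' if nothing matched."""
    parts = []
    for organism in sorted(matches):
        terms = matches[organism]
        if terms:  # organisms without matches are simply left out
            parts.append(organism + "=" + ",".join(terms))
    return ";".join(parts) if parts else "."


def to_tsv_row(record):
    """Return one tab-delimited line (no trailing newline) for the record."""
    return "\t".join([
        record["CHROM"],
        record["POS"],
        record["GENE"],
        flatten_matches(record["matches"]),
    ])


print(to_tsv_row(variant))
```

The trade-off is that a consumer has to split the cell again on `;`, `=` and `,`, so the delimiters would need to be documented in whatever spec is agreed.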

visze commented 9 years ago

But you cannot easily use Excel for parsing.

We have an actual case where we want to use Exomiser within a study. It should run automatically in a pipeline. After running Exomiser, the results should be parsed back into the main database of the study. Right now, we have to parse the HTML output, which is horrible. The VCF INFO line is incomplete (transcripts, functional classes, and so on are missing). The easiest way would be a TSV format.
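
As a rough illustration of the pipeline step described above, the following sketch loads a tab-delimited variant file into a database. It assumes a TSV with a header line and a handful of columns named as in the sample posted later in this thread; the `variants` table, its schema, and the use of an in-memory SQLite database are stand-ins for whatever the study database really looks like.

```python
import csv
import io
import sqlite3

# Assumed input: tab-delimited, one header line, a small subset of columns.
SAMPLE_TSV = (
    "#CHROM\tPOS\tREF\tALT\tFUNCTIONAL_CLASS\tEXOMISER_GENE\n"
    "chr10\t123256215\tT\tG\tMISSENSE\tFGFR2\n"
    "chr9\t94486381\tG\tA\tMISSENSE\tROR2\n"
)


def load_variants(connection, tsv_handle):
    """Insert every data row of the TSV into a 'variants' table; return the row count."""
    # Hypothetical schema, chosen only to match the six sample columns.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS variants ("
        "chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, "
        "functional_class TEXT, gene TEXT)"
    )
    reader = csv.reader(tsv_handle, delimiter="\t")
    next(reader)  # skip the header line
    rows = [(r[0], int(r[1]), r[2], r[3], r[4], r[5]) for r in reader if r]
    connection.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)", rows)
    connection.commit()
    return len(rows)


connection = sqlite3.connect(":memory:")
inserted = load_variants(connection, io.StringIO(SAMPLE_TSV))
genes = [gene for (gene,) in connection.execute("SELECT gene FROM variants ORDER BY pos")]
print(inserted, genes)
```

Compared with scraping the HTML report, the whole import reduces to one reader loop and one bulk insert.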

I think the fundamental question is: what is the motive for running Exomiser locally? In my opinion there are two main reasons:

  1. Run it locally because the user does not want to upload VCFs (privacy issue).
  2. Run Exomiser as part of a pipeline, so the variant results should be easily readable by a script.

damiansm commented 9 years ago

I mean Excel as well as TSV of course. Excel for the clinicians - they have asked for it.

Totally agree the TSV needs to include extra data. It was originally just developed for me and my benchmarking! Can we get this done for the Nature Protocols paper - ask Peter if timeframe is possible as I am away all next week! I have been thinking for a long time this is needed.

visze commented 9 years ago

I also have a local version to annotate variants :-)

I have some time and can transfer it to the current development version. The results will look like this (with some bugs):

#CHROM POS REF ALT QUAL FILTER GENOTYPE COVERAGE FUNCTIONAL_CLASS HGVS TRANSCRIPT EXON BASE_CHANGE AA_CHANGE EXOMISER_GENE CADD(>0.483) POLYPHEN(>0.956;>0.446) MUTATIONTASTER(>0.94) SIFT(<0.06) DBSNP_ID MAX_FREQUENCY DBSNP_FREQUENCY EVS_EA_FREQUENCY EVS_AA_FREQUENCY EXOMISER_VARIANT_SCORE EXOMISER_GENE_PHENO_SCORE EXOMISER_GENE_VARIANT_SCORE EXOMISER_GENE_COMBINED_SCORE
chr10 123243197 G A 65.25 Target 0/1 9 INTRONIC FGFR2:uc001lfg.4:intron10:c.1125+15C>T uc001lfg.4 intron10 c.1125+15C>T 0 FGFR2 0 0 0 0 0 0 0 0 0 0 1 1 0.9978677
chr10 123247670 C - 692.49 Target 0/1 31 INTRONIC FGFR2:uc001lfg.4:intron6:c.688-43G>- uc001lfg.4 intron6 c.688-43G>- 0 FGFR2 0 0 0 0 0 0 0 0 0 0 1 1 0.9978677
chr10 123256215 T G 100 PASS 0/1 0 MISSENSE FGFR2:uc001lfg.4:exon6:c.518A>C:p.E173A uc001lfg.4 exon6 c.518A>C p.E173A FGFR2 0.84209 0.998 1 0 rs121918506 0 0 0 0 1 1 1 0.9978677
chr9 94456643 G C 166.26 Frequency 0/1 10 UTR3 ROR2:uc004ari.1:c.*1C>G uc004ari.1 c.*1C>G 0 0 ROR2 0 0 0 0 rs923771 83.79 83.79 21.8797 18.3849 0 0.679293 0.81040007 0.7566463
chr9 94456705 T A 72.48 Target 0/1 7 INTRONIC ROR2:uc004ari.1:intron11:c.2066-12A>T uc004ari.1 intron11 c.2066-12A>T 0 ROR2 0 0 0 0 0 0 0 0 0 0 0.679293 0.81040007 0.7566463
chr9 94485928 C T 226.58 Frequency 0/1 16 UTR3 ROR2:uc004arj.2:c.*16G>A uc004arj.2 c.*16G>A 0 0 ROR2 0 0 0 0 rs2230578 79.29 79.29 30.6305 22.946 0 0.679293 0.81040007 0.7566463
chr9 94486321 C T 276.3 Frequency 0/1 36 MISSENSE ROR2:uc004arj.2:exon9:c.2455G>A:p.V819I uc004arj.2 exon9 c.2455G>A p.V819I ROR2 0 0 0 0 rs10761129 78.42 78.42 32.9805 23.9673 0 0.679293 0.81040007 0.7566463
chr9 94486381 G A 429.46 PASS 0/1 40 MISSENSE ROR2:uc004arj.2:exon9:c.2395C>T:p.P799S uc004arj.2 exon9 c.2395C>T p.P799S ROR2 0.4968 0.996 1 0.33 rs141235720 0.3372 0 0.3372 0.0908 0.81040007 0.679293 0.81040007 0.7566463
chr9 94487066 C T 298.58 Target 0/1 15 SYNONYMOUS ROR2:uc004ari.1:exon9:c.1290G>A:p.= uc004ari.1 exon9 c.1290G>A p.= ROR2 0 0 0 0 0 0 0 0 0 0 0.679293 0.81040007 0.7566463
chr9 94495608 T C 348.79 Frequency 0/1 22 MISSENSE ROR2:uc004ari.1:exon6:c.313A>G:p.T105A uc004ari.1 exon6 c.313A>G p.T105A ROR2 0 0 0 0 rs10820900 62.58 62.58 35.4186 25.6695 0 0.679293 0.81040007 0.7566463
chr9 94519645 G A 1573.06 Pathogenicity 0/1 112 UTR5 ROR2:uc004ari.1:c.-49C>T uc004ari.1 c.-49C>T 0 0 ROR2 0 0 0 0 rs145568368 0.4651 0.1377 0.4651 0.1362 0 0.679293 0.81040007 0.7566463
chr17 17697094 CAG - 398.05 PASS 0/1 33 NON_FS_DELETION RAI1:uc002grm.3:exon3:c.870_872del:p.Q291del uc002grm.3 exon3 c.870_872del p.Q291del RAI1 0 0 0 0 0 0 0 0 0 0.85 0.5879024 0.85 0.52203065
chr17 17697404 C T 730.68 PASS 0/1 52 MISSENSE RAI1:uc002grm.3:exon3:c.1142C>T:p.A381V uc002grm.3 exon3 c.1142C>T p.A381V RAI1 0.54361 0.696 1 0.04 rs113208290 0.5051 0.5051 0.4651 0.1589 0.7757377 0.5879024 0.85 0.52203065
chr17 10532884 G A 297.29 Target 0/1 21 INTRONIC MYH3:uc002gmq.2:intron40:c.5796+30C>T uc002gmq.2 intron40 c.5796+30C>T 0 MYH3 0 0 0 0 0 0 0 0 0 0 0.56803846 0.73 0.33966452
chr17 10533595 - T 957.27 Target 0/1 53 INTRONIC MYH3:uc002gmq.2:intron37:c.5457+10insA uc002gmq.2 intron37 c.5457+10insA 0 MYH3 0 0 0 0 0 0 0 0 0 0 0.56803846 0.73 0.33966452
chr17 10535429 TTTTG - 277.92 Target 1/1 6 INTRONIC MYH3:uc002gmq.2:intron34:c.4957-96CAAAA>- uc002gmq.2 intron34 c.4957-96CAAAA>- 0 MYH3 0 0 0 0 0 0 0 0 0 0 0.56803846 0.73 0.33966452
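
To show how a script might consume output of this shape, here is a minimal sketch that reads the rows by column name and keeps only `PASS` variants above a score cut-off. It assumes the real file is tab-delimited with a header starting with `#CHROM`; the excerpt uses a subset of the columns and values from the sample above, and the function names and the `min_score` parameter are made up for the example.

```python
import csv
import io

# Trimmed excerpt of the sample output, with tabs assumed as the delimiter.
SAMPLE = (
    "#CHROM\tPOS\tREF\tALT\tFILTER\tEXOMISER_GENE\tEXOMISER_GENE_COMBINED_SCORE\n"
    "chr10\t123243197\tG\tA\tTarget\tFGFR2\t0.9978677\n"
    "chr10\t123256215\tT\tG\tPASS\tFGFR2\t0.9978677\n"
    "chr9\t94486381\tG\tA\tPASS\tROR2\t0.7566463\n"
    "chr17\t17697404\tC\tT\tPASS\tRAI1\t0.52203065\n"
)


def read_variants(handle):
    """Yield one dict per data row, keyed by header name without the leading '#'."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    for fields in reader:
        if fields:  # ignore blank lines
            yield dict(zip(header, fields))


def passing_variants(handle, min_score=0.0):
    """Return (chrom, pos, gene, score) for PASS rows with score >= min_score."""
    selected = []
    for row in read_variants(handle):
        score = float(row["EXOMISER_GENE_COMBINED_SCORE"])
        if row["FILTER"] == "PASS" and score >= min_score:
            selected.append((row["CHROM"], int(row["POS"]), row["EXOMISER_GENE"], score))
    return selected


hits = passing_variants(io.StringIO(SAMPLE), min_score=0.7)
for hit in hits:
    print(hit)
```

Reading by column name rather than by position means the script keeps working if columns are later added or reordered.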

damiansm commented 9 years ago

Looks good to me. Make it so

julesjacobsen commented 9 years ago

commit: 70297041795c89f3582b8f72092c744971700957
commit: 854a2fb95dc6ed0fcb9b8450248336137437bd48
commit: f9ceac9345725aacc1c5c2d30acf758064f72746