Open IanCodes opened 1 week ago
Hi @IanCodes, the ZYG column is populated by the option --individual or --individual_zyg. I don't see either of those options in your command.
The option --everything switches on the following options: https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_everything. This does not include the individual options.
Let me know if you have more questions.
Best wishes, Diana
@dglemos Thank you very much for your reply, Diana. It appears to be just what I need.
I'm glad it worked!
Best wishes, Diana
Apologies for the follow-up, but when I use '--individual_zyg all' I receive fewer lines of VEP output than without it, using the same VCF file. Are there types of variant that are not processed when this flag is used?
Can you show me an example please?
Thank you for your fast response. I have with and without ZYG files, but they are big. What would be the best method of sharing them?
You can send your files to helpdesk@ebi.ac.uk or if they are too big to send by email, you can send a sample of the files.
I have extracted the chr10 results from the VEP output with and without ZYG. Thank you. VEP_with_and_without_ZYG.zip
Thank you! Can you also send the input files?
Sorry for the delay. Here is the chr10 part of the VCF file (and headers). There were a number of plugins with huge files, so I'm not sure how well you'll be able to repeat my analysis. Let me know if you need anything else. Thank you. chr10.zip
The variants missing from the output with_ZYG_chr10.vep have genotype HOMREF (homozygous reference).
An example:
chr10 47461 . G A 1.08712e-11 . AB=0;ABP=0;AC=0;AF=0;AN=2;AO=2;CIGAR=1X;DP=30;DPB=30;DPRA=0;EPP=3.0103;EPPR=10.7656;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=58.6786;NS=1;NUMALT=1;ODDS=26.7136;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=74;QR=1036;RO=28;RPL=2;RPP=7.35324;RPPR=5.80219;RPR=0;RUN=1;SAF=1;SAP=3.0103;SAR=1;SRF=12;SRP=4.25114;SRR=16;TYPE=snp;technology.ILLUMINA=1 GT:DP:AD:RO:QR:AO:QA:GL 0/0:30:28,2:28:1036:2:74:0,-2.00502,-86.2858
Using the option --individual_zyg these should still be in the output.
Can you try running vep again without extra options, something like this:
vep --offline --cache --dir_cache REDACTED_PATH/.conda/envs/VEP111/ --species homo_sapiens --tab --assembly GRCh38 -i <input_file> -o <output_file> --individual_zyg all
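To check which input records are homozygous reference, like the chr10:47461 example above with GT 0/0, the genotype can be classified directly from the VCF. The following is a minimal Python sketch, not VEP's implementation: the function names are hypothetical, the HOM/HET/HOMREF labels simply mirror the values seen in the ZYG column, and a tab-delimited VCF record with the standard FORMAT and sample columns is assumed.

```python
def zygosity_from_gt(gt: str) -> str:
    """Classify a VCF GT string such as '0/0', '0|1' or '1/1'."""
    # Treat phased ('|') and unphased ('/') genotypes the same way.
    alleles = gt.replace("|", "/").split("/")
    if any(allele == "." for allele in alleles):
        return "MISSING"  # label chosen for this sketch only
    if all(allele == "0" for allele in alleles):
        return "HOMREF"
    if len(set(alleles)) == 1:
        return "HOM"
    return "HET"


def sample_zygosity(vcf_line: str, sample_index: int = 0) -> str:
    """Return the zygosity of one sample in a tab-delimited VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")                # e.g. GT:DP:AD:...
    sample_values = fields[9 + sample_index].split(":")
    genotype = sample_values[format_keys.index("GT")]
    return zygosity_from_gt(genotype)
```

Applying sample_zygosity to every non-header line of the input gives the set of HOMREF records to look for in the VEP output.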
@dglemos I ran the command using the chr10.vep. HOMREF variants are present in the output.
e.g.
chr10_47461_G/G chr10:47461 G ENSG00000237297 ENST00000416477 Transcript downstream_gene_variant - - - - - - 401LF_S113:HOMREF MODIFIER 577 1 -
Does this mean there is a conflict with one of the plugins?
Thanks for checking! I don't see how any of these plugins would interfere with the number of lines in the output.
Can you please try the following commands:
vep --offline --cache --dir_cache REDACTED_PATH/.conda/envs/VEP111/ --species homo_sapiens --tab --assembly GRCh38 -i <input_file> -o <output_file> --individual_zyg all --everything
vep \
--offline \
--cache \
--dir_cache REDACTED_PATH/.conda/envs/VEP111/ \
--species homo_sapiens \
--tab \
--assembly GRCh38 \
-i <input_file> \
-o <output_file> \
--individual_zyg all \
--plugin AlphaMissense,file=REDACTED_PATH/.conda/envs/VEP111/AlphaMissense_data/AlphaMissense_hg38.tsv.gz \
--plugin CADD,snv=REDACTED_PATH/.conda/envs/VEP111/CADD_data/whole_genome_SNVs.tsv.gz,indels=REDACTED_PATH/.conda/envs/VEP111/CADD_data/gnomad.genomes.r4.0.indel.tsv.gz,force_annotate=1 \
--plugin gnomADc,REDACTED_PATH/.conda/envs/VEP111/gnomad_data/gnomad.ch.genomesv3.tabbed.tsv.gz \
--plugin REVEL,file=REDACTED_PATH/.conda/envs/VEP111/REVEL_data/new_tabbed_revel_grch38.tsv.gz \
--plugin SpliceAI,snv=REDACTED_PATH/.conda/envs/VEP111/spliceai_data/spliceai_scores.raw.snv.hg38.vcf.gz,indel=REDACTED_PATH/.conda/envs/VEP111/spliceai_data/spliceai_scores.raw.indel.hg38.vcf.gz
Hello.
With only --individual_zyg I get 49509 lines in the VEP file.
With --individual_zyg all --everything I get 51653 lines.
With --individual_zyg all + plugins I get 49527 lines.
Can you please send the output for chr10_47461_G/G in all of those output files?
Thank you for your continued effort!
--individual_zyg
chr10_47461_G/G chr10:47461 G ENSG00000237297 ENST00000416477 Transcript downstream_gene_variant - - - - - - 401LF_S113:HOMREF MODIFIER 577 1 -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000561967 Transcript 3_prime_UTR_variant 1009 - - - - - 401LF_S113:HOMREF MODIFIER - -1 -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000562809 Transcript downstream_gene_variant - - - - - - 401LF_S113:HOMREF MODIFIER 33 -1 -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000563456 Transcript downstream_gene_variant - - - - - - 401LF_S113:HOMREF MODIFIER 244 -1 -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000564130 Transcript synonymous_variant 1869 829 277 L Cta/Cta - 401LF_S113:HOMREF LOW - -1 -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000567466 Transcript downstream_gene_variant - - - - - - 401LF_S113:HOMREF MODIFIER 117 -1 -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000568584 Transcript synonymous_variant 989 931 311 L Cta/Cta - 401LF_S113:HOMREF LOW - -1 -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000568866 Transcript synonymous_variant 859 820 274 L Cta/Cta - 401LF_S113:HOMREF LOW - -1 -
--individual_zyg all --everything
chr10_47461_G/G chr10:47461 G ENSG00000237297 ENST00000416477 Transcript downstream_gene_variant - - - - - - 401LF_S113:HOMREF MODIFIER 577 1 - SNV - - - unprocessed_pseudogene YES - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000561967 Transcript 3_prime_UTR_variant 1009 - - - - - 401LF_S113:HOMREF MODIFIER - -1 - SNV TUBB8 HGNC HGNC:20773 protein_coding - - - 5 - - ENSP00000454878 - A0A075B724.50 UPI0001B790EC - 1 - - 4/4 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000562809 Transcript downstream_gene_variant - - - - - - 401LF_S113:HOMREF MODIFIER 33 -1 - SNV TUBB8 HGNC HGNC:20773 protein_coding - - - 5 - - ENSP00000456899 - A0A075B735.43 UPI0001B790ED - 1 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000563456 Transcript downstream_gene_variant - - - - - - 401LF_S113:HOMREF MODIFIER 244 -1 - SNV TUBB8 HGNC HGNC:20773 retained_intron - - - 5 - - - - - - - 1 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000564130 Transcript synonymous_variant 1869 829 277 L Cta/Cta - 401LF_S113:HOMREF LOW - -1 - SNV TUBB8 HGNC HGNC:20773 protein_coding - - - 5 - - ENSP00000457610 - Q5SQY0.149 UPI0000197C79 - 1 - - 4/4 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000567466 Transcript downstream_gene_variant - - - - - - 401LF_S113:HOMREF MODIFIER 117 -1 - SNV TUBB8 HGNC HGNC:20773 nonsense_mediated_decay - - - 5 - - ENSP00000454914 - A0A075B725.31 UPI0001B790EE - 1 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000568584 Transcript synonymous_variant 989 931 311 L Cta/Cta - 401LF_S113:HOMREF LOW - -1 - SNV TUBB8 HGNC HGNC:20773 protein_coding YES NM_177987.3 - 1 P1 CCDS7051.1 ENSP00000456206 Q3ZCM7.150 - UPI000007238E - 1 - - 4/4 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000568866 Transcript synonymous_variant 859 820 274 L Cta/Cta - 401LF_S113:HOMREF LOW - -1 - SNV TUBB8 HGNC HGNC:20773 protein_coding - - - 5 - - ENSP00000457062 - A0A075B736.56 UPI000047C3D1 - 1 - - 3/3 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
--individual_zyg all + plugins
chr10_47461_G/G chr10:47461 G ENSG00000237297 ENST00000416477 Transcript downstream_gene_variant - - - - - - 401LF_S113:HOMREF MODIFIER 577 1 - - - - - 0.0001 0.9998 0.9977 0.9998 0.9393 0.7212 0.4243 0.0359 0.9998 29.4540 28.0000 2111922.0000 - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000561967 Transcript 3_prime_UTR_variant 1009 - - - - - 401LF_S113:HOMREF MODIFIER - -1 - - - - - 0.0001 0.9998 0.9977 0.9998 0.9393 0.7212 0.4243 0.0359 0.9998 29.4540 28.0000 2111922.0000 - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000562809 Transcript downstream_gene_variant - - - - - - 401LF_S113:HOMREF MODIFIER 33 -1 - - - - - 0.0001 0.9998 0.9977 0.9998 0.9393 0.7212 0.4243 0.0359 0.9998 29.4540 28.0000 2111922.0000 - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000563456 Transcript downstream_gene_variant - - - - - - 401LF_S113:HOMREF MODIFIER 244 -1 - - - - - 0.0001 0.9998 0.9977 0.9998 0.9393 0.7212 0.4243 0.0359 0.9998 29.4540 28.0000 2111922.0000 - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000564130 Transcript synonymous_variant 1869 829 277 L Cta/Cta - 401LF_S113:HOMREF LOW - -1 - - - - - 0.0001 0.9998 0.9977 0.9998 0.9393 0.7212 0.4243 0.0359 0.9998 29.4540 28.0000 2111922.0000 - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000567466 Transcript downstream_gene_variant - - - - - - 401LF_S113:HOMREF MODIFIER 117 -1 - - - - - 0.0001 0.9998 0.9977 0.9998 0.9393 0.7212 0.4243 0.0359 0.9998 29.4540 28.0000 2111922.0000 - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000568584 Transcript synonymous_variant 989 931 311 L Cta/Cta - 401LF_S113:HOMREF LOW - -1 - - - - - 0.0001 0.9998 0.9977 0.9998 0.9393 0.7212 0.4243 0.0359 0.9998 29.4540 28.0000 2111922.0000 - -
chr10_47461_G/G chr10:47461 G ENSG00000261456 ENST00000568866 Transcript synonymous_variant 859 820 274 L Cta/Cta - 401LF_S113:HOMREF LOW - -1 - - - - - 0.0001 0.9998 0.9977 0.9998 0.9393 0.7212 0.4243 0.0359 0.9998 29.4540 28.0000 2111922.0000 - -
The variant is in all outputs with the correct value 401LF_S113:HOMREF, which indicates that the option --individual_zyg is behaving as expected.
As for the different number of lines, the option --everything switches on --regulatory, which reports whether the variant overlaps regulatory regions.
Output example without --everything:
chr10_132898972_T/T chr10:132898972 T ENSG00000176769 ENST00000368642 Transcript intron_variant - - - - - - 401LF_S113:HOMREF MODIFIER - -1 -
chr10_132898972_T/T chr10:132898972 T ENSG00000230098 ENST00000436942 Transcript downstream_gene_variant - - - - - 401LF_S113:HOMREF MODIFIER 4932 1 -
chr10_132898972_T/T chr10:132898972 T ENSG00000176769 ENST00000483040 Transcript intron_variant,non_coding_transcript_variant - - 401LF_S113:HOMREF MODIFIER - -1 -
Output example with --everything:
chr10_132898972_T/T chr10:132898972 T ENSG00000176769 ENST00000368642 Transcript intron_variant - - - - - - 401LF_S113:HOMREF MODIFIER - -1 - SNV TCERG1L HGNC 23533 protein_coding YES - - - - - CCDS7662.2 ENSP00000357631 TCRGL_HUMAN - UPI00004589C8 - - - - - 10/11 - - - - - -
chr10_132898972_T/T chr10:132898972 T ENSG00000230098 ENST00000436942 Transcript downstream_gene_variant - - - - - 401LF_S113:HOMREF MODIFIER 4932 1 - SNV TCERG1L-AS1 HGNC 49532 antisense YES - - - - -
chr10_132898972_T/T chr10:132898972 T ENSG00000176769 ENST00000483040 Transcript intron_variant,non_coding_transcript_variant - - 401LF_S113:HOMREF MODIFIER - -1 - SNV TCERG1L HGNC 23533 retained_intron - - - - - - 10/11 - - - - - - - - - - - - - - - - - - - -
chr10_132898972_T/T chr10:132898972 T - ENSR00000993699 RegulatoryFeature regulatory_region_variant - - - - 401LF_S113:HOMREF MODIFIER - - - SNV - - - enhancer - - - - - -
As you can see, the output with --everything has one more line because the variant overlaps a regulatory region.
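One way to see how much of a line-count difference comes from regulatory features is to tally the output rows by feature type. This is a rough sketch under stated assumptions: the function name is hypothetical, the file is the tab-delimited output produced with --tab, and Feature_type is the sixth column, as in the rows shown above (Transcript, RegulatoryFeature).

```python
from collections import Counter


def count_by_feature_type(lines):
    """Tally VEP tab-output rows by the value in the sixth column."""
    counts = Counter()
    for line in lines:
        # Skip header lines (they start with '#') and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        counts[columns[5]] += 1  # assumed position of Feature_type
    return counts


# Usage with a hypothetical file name:
# with open("output_everything.vep") as handle:
#     print(count_by_feature_type(handle))
```

If the Transcript counts match between two runs and only the RegulatoryFeature count differs, the extra lines are explained by --regulatory.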
Thank you for your help. Unfortunately it doesn't solve my problem. The original run used --everything and plugins. The expectation was that adding '--individual_zyg all' would just add another field to the output. It does, but lines of output are missing. So, for some variants --individual_zyg must be causing some difference.
I cannot reproduce the issue. Can you send an example of a variant with missing data or missing from the output?
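To find concrete examples of missing lines, the two output files can be compared row by row. The sketch below is only an illustration with hypothetical function and file names. It keys each row on the Location and Feature columns (second and fifth in the tab output shown earlier) rather than on the first column, because that identifier carries the genotype alleles when --individual_zyg is used (for example chr10_47461_G/G). Rows that differ only in the Allele column would not be told apart by this key.

```python
def row_keys(lines):
    """Collect (Location, Feature) pairs from VEP tab-output lines."""
    keys = set()
    for line in lines:
        # Skip header lines (they start with '#') and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        keys.add((columns[1], columns[4]))
    return keys


def missing_rows(reference_lines, test_lines):
    """Return the pairs present in the reference output but not in the test output."""
    return sorted(row_keys(reference_lines) - row_keys(test_lines))


# Usage with hypothetical file names:
# with open("without_zyg.vep") as ref, open("with_zyg.vep") as test:
#     for location, feature in missing_rows(ref, test):
#         print(location, feature)
```

Any pair this prints is a candidate example of a row that is dropped when the option is added.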
With only --individual_zyg I get 49509 lines in the VEP file.
With --individual_zyg all --everything I get 51653 lines.
With --individual_zyg all + plugins I get 49527 lines.
Using this example, what are the counts when you run --individual_zyg all + --everything + plugins?
These are the tallies of the various runs. The numbers are a little different from the previous ones, which included the header lines. I'll need to get back to you on the first part.
49467 --individual_zyg all
49467 --individual_zyg all --plugins
51551 --individual_zyg all --everything
51551 --individual_zyg all --everything --plugins
51587 --everything
51587 --everything --plugins
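In these tallies the runs with --everything differ by 51587 - 51551 = 36 lines once --individual_zyg all is added, while the plugins make no difference to the counts. For anyone repeating the comparison, header lines can be excluded consistently with a small helper like the one below. This is a sketch only: the function name is hypothetical, and it assumes that every header line in the tab output starts with '#'.

```python
def count_data_lines(lines):
    """Count VEP output rows, ignoring '#' header lines and blank lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


# Usage with a hypothetical file name:
# with open("output.vep") as handle:
#     print(count_data_lines(handle))
```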
Hello,
I have been using VEP V111 to annotate Freebayes VCF files. We have noticed that the ZYG field is missing from the output. Is this expected?
The command line for VEP was:
qsub -pe smp.pe 4 -V -cwd -N vep_fb_005SN_S25 -b y 'vep --offline --cache --dir_cache REDACTED_PATH/.conda/envs/VEP111/ --species homo_sapiens --dir_plugins REDACTED_PATH/.vep/Plugins/ --everything --tab --assembly GRCh38 -i 005SN_S25_hg38_freebayes136_MAPQ20_QUAL20_COV10_controls_subtracted.vcf.gz -o 005SN_S25_hg38_freebayes136_MAPQ20_QUAL20_COV10_controls_subtracted_v111.vep --force_overwrite --fork 4 --plugin AlphaMissense,file=REDACTED_PATH/.conda/envs/VEP111/AlphaMissense_data/AlphaMissense_hg38.tsv.gz --plugin CADD,snv=REDACTED_PATH/.conda/envs/VEP111/CADD_data/whole_genome_SNVs.tsv.gz,indels=REDACTED_PATH/.conda/envs/VEP111/CADD_data/gnomad.genomes.r4.0.indel.tsv.gz,force_annotate=1 --plugin gnomADc,REDACTED_PATH/.conda/envs/VEP111/gnomad_data/gnomad.ch.genomesv3.tabbed.tsv.gz --plugin REVEL,file=REDACTED_PATH/.conda/envs/VEP111/REVEL_data/new_tabbed_revel_grch38.tsv.gz --plugin SpliceAI,snv=REDACTED_PATH/.conda/envs/VEP111/spliceai_data/spliceai_scores.raw.snv.hg38.vcf.gz,indel=REDACTED_PATH/.conda/envs/VEP111/spliceai_data/spliceai_scores.raw.indel.hg38.vcf.gz'
An example of the VCF input follows.
Thank you, Ian