Closed dennishendriksen closed 6 days ago
Hi @dennishendriksen,
The results you are obtaining for that breakpoint variant seem incorrect.
In VEP 111, we reported all potential consequences under the alternative allele of the breakpoint (in your case, [1:109650635[GG). However, this is confusing when a breakpoint is composed of two or more chromosomal breakends.
As such, in VEP 112, we now report the consequences of a breakpoint variant separately for each breakend:
[1:109650635[GG: consequences for the breakend located at chr1:109650635
.G: consequences for the original breakend at position chr22:29767384 (represented as detailed in the VCF 4.4 standard, chapter 5.4.9: Single breakends)
To answer your questions:
Q1: Is this intended? I would expect this field to always contain an ALT allele index.
Unfortunately, it seems that VEP 112 is returning nothing for the allele number for breakpoint variants. I am going to check how to fix it.
Q2: (...) Could you explain what the dot in the new output means?
The representation depicts a single breakend and its orientation:
2 321681 bndW G G.
  breakend occurring at position 321682 with at least position 321681 (and maybe 321680, 321679, etc.) attached
13 123457 bndX A .A
  breakend occurring at position 123456 with at least position 123457 (and maybe 123458, 123459, etc.) attached
More information at VCF 4.4 standard, chapter 5.4.9: Single breakends.
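The notation above can be made concrete with a small classifier. This is only a minimal sketch, not VEP's own parsing code: the function name describe_breakend and the returned keys (kind, retained_side, bases, mate_chrom, mate_pos) are hypothetical names chosen for illustration. It assumes the ALT forms described in the VCF specification: a single breakend is written with a leading or trailing dot, and a mated breakend carries the mate position between a pair of matching brackets.

```python
import re

# Mated breakend ALT forms (t[p[, t]p], ]p]t, [p[t): bases on exactly one
# side of a bracketed "chrom:pos" mate locus.
_MATED = re.compile(
    r"^(?P<lead>[ACGTNacgtn]*)(?P<open>[\[\]])"
    r"(?P<chrom>[^:\[\]]+):(?P<pos>\d+)"
    r"(?P<close>[\[\]])(?P<trail>[ACGTNacgtn]*)$"
)


def describe_breakend(alt):
    """Classify a breakend ALT string; return None if it is not a breakend.

    Illustrative helper only (hypothetical name and return shape).
    """
    has_bracket = "[" in alt or "]" in alt
    if not has_bracket and len(alt) > 1:
        if alt.endswith("."):
            # "G." : POS and lower coordinates stay attached.
            return {"kind": "single", "retained_side": "left", "bases": alt[:-1]}
        if alt.startswith("."):
            # ".A" : POS and higher coordinates stay attached.
            return {"kind": "single", "retained_side": "right", "bases": alt[1:]}
    match = _MATED.match(alt)
    if match and match["open"] == match["close"]:
        lead, trail = match["lead"], match["trail"]
        if bool(lead) != bool(trail):  # bases on exactly one side
            return {
                "kind": "mated",
                "mate_chrom": match["chrom"],
                "mate_pos": int(match["pos"]),
                "bases": lead or trail,
            }
    return None


print(describe_breakend("G."))
print(describe_breakend(".A"))
print(describe_breakend("[1:109650635[GG"))
```

For the variant in this issue, the mated form yields mate chromosome 1 and mate position 109650635, while .G is classified as a single breakend with the higher coordinates attached.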
Q3: A last observation is that the number of consequences went down from 10 to 7. Could you explain this difference?
I'll also check if the changes are expected or not.
Thanks for reporting this issue! I'll report back as soon as possible.
Best regards, Nuno
Hey @dennishendriksen,
Just to update you: I opened PR Ensembl/ensembl-variation#1095 to fix allele numbers for breakends. This will be available in the next version of VEP.
Thanks again for reporting this issue!
Cheers, Nuno
Hey @dennishendriksen,
The bug fix for the allele number in breakpoint variants has now been merged and will be included in the next version of VEP (VEP 113).
I will close this issue but feel free to open a new one if you find further issues or have any suggestions.
Cheers, Nuno
Hi @nuno-agostinho,
Thank you for this fix!
Q3: A last observation is that the number of consequences went down from 10 to 7. Could you explain this difference?
I'll also check if the changes are expected or not.
Did you get around to checking this?
Greetings, @dennishendriksen
Hi @dennishendriksen,
Sorry for closing the issue prematurely.
I was not able to replicate your results. Could you please send me the VEP command that you ran to get those results?
Thanks, Nuno
Hi @nuno-agostinho,
From the previously attached vcf:
vep --allele_number --allow_non_variant --assembly GRCh38 --buffer_size 1000 --cache --compress_output bgzip --custom [PATH]/hg38.phyloP100way.bw,phyloP,bigwig,exact,0 --database 0 --dir_cache [PATH]/cache --dir_plugins [PATH]/plugins --dont_skip --exclude_predicted --fasta [PATH]/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz --flag_pick_allele --fork 4 --format vcf --hgvs --input_file GRCh37_normalized.vcf.gz --no_stats --numbers --offline --output_file GRCh37_annotated.vcf.gz --plugin Grantham --plugin SpliceAI,snv=[PATH]/spliceai_scores.masked.snv.hg38.vcf.gz,indel=[PATH]/spliceai_scores.masked.indel.hg38.vcf.gz --plugin Capice,GRCh37_capice_output.tsv.gz --plugin UTRannotator,[PATH]/uORF_5UTR_PUBLIC.txt --plugin Inheritance,[PATH]/inheritance_20240115.tsv --plugin VKGL,[PATH]/vkgl_consensus_20240401.tsv,1 --plugin gnomAD,[PATH]/gnomad.total.v4.1.sites.stripped.tsv.gz --plugin ClinVar,[PATH]/clinvar_20240603_stripped.tsv.gz --plugin AnnotSV,GRCh37_normalized.vcf.gz.tsv,AnnotSV_ranking_score;AnnotSV_ranking_criteria;ACMG_class --plugin AlphScore,[PATH]/AlphScore_final_20230825_stripped_GRCh38.tsv.gz --plugin ncER,[PATH]/GRCh38_ncER_perc.bed.gz --plugin FATHMM_MKL_NC,[PATH]/GRCh38_FATHMM-MKL_NC.tsv.gz --plugin ReMM,[PATH]/GRCh38_ReMM.tsv.gz --polyphen s --pubmed --refseq --safe --shift_3prime --sift s --symbol --total_length --use_given_ref --vcf
Greetings, @dennishendriksen
Hey @dennishendriksen,
I am confused by your command, as you are mixing GRCh37 and GRCh38 data.
For GRCh38, the alternative breakend [1:109650635[G should only return an intergenic variant[^1], whereas there are transcript consequences if you use --assembly GRCh37.
Could you check if the results make sense for you when using GRCh37 throughout the VEP command?
Thanks, Nuno
[^1]: However, the output only shows results for the reference breakend (.G). This is a bug: it should also show intergenic variants if there are no other consequences. I will try to fix this.
Hi @nuno-agostinho,
Apologies for the confusing filename; it is an artifact of the liftover from GRCh37 to GRCh38. Both file content and command should be GRCh38. I'm not an expert on breakend notations; could it be that you missed the final G in G>[1:109650635[GG?
Greetings, @dennishendriksen
Hi @dennishendriksen,
could it be that you missed the final G in G>[1:109650635[GG?
Currently, the alternative sequence of a breakend is ignored by VEP. We intend to improve this in the future.
Upon further inspection, the difference may be related to updates to the Ensembl database. For instance, one of the consequences for the breakend [1:109650635[GG in GRCh38 is associated with regulatory feature ENSR00001170488, which is not available in the current version of Ensembl.
If you want the same results as in VEP 111, you can download the previous VEP cache from http://ftp.ensembl.org/pub/release-111/variation/vep and then run VEP with option --db_version 111. However, I would suggest simply using the most up-to-date version of the VEP cache when possible.
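To see which consequences account for a drop such as 10 to 7 between two runs, the CSQ entries of the same variant can be compared by feature. The following is a rough sketch under stated assumptions, not part of VEP: the helper names, the three-field CSQ layout, the transcript identifier ENST_HYPOTHETICAL_1 and the consequence terms paired with each feature are all made up for illustration. Only the regulatory feature ID ENSR00001170488 and the allele strings come from this thread. The Allele field is deliberately left out of the comparison key because its value changed between VEP 111 and VEP 112.

```python
def csq_feature_keys(csq_value, field_names):
    """Return the set of (Feature, Consequence) pairs in one CSQ value."""
    keys = set()
    for entry in csq_value.split(","):  # one entry per consequence
        record = dict(zip(field_names, entry.split("|")))
        keys.add((record.get("Feature", ""), record.get("Consequence", "")))
    return keys


def diff_consequences(old_csq, new_csq, field_names):
    """Return (lost, gained) pairs between an old and a new CSQ value."""
    old_keys = csq_feature_keys(old_csq, field_names)
    new_keys = csq_feature_keys(new_csq, field_names)
    return sorted(old_keys - new_keys), sorted(new_keys - old_keys)


# Hypothetical, shortened field order; read the real one from the CSQ
# "Format:" description in the VCF header.
FIELDS = ["Allele", "Consequence", "Feature"]

# Made-up CSQ values for illustration only.
old = (
    "[1:109650635[GG|regulatory_region_variant|ENSR00001170488,"
    "[1:109650635[GG|intron_variant|ENST_HYPOTHETICAL_1"
)
new = ".G|intron_variant|ENST_HYPOTHETICAL_1"

lost, gained = diff_consequences(old, new, FIELDS)
print("lost:", lost)
print("gained:", gained)
```

A feature that appears only in the lost list and is absent from the newer Ensembl release points to a database change rather than a change in VEP's logic.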
Hope this makes it clearer, but tell me if you want to discuss this further. Thanks!
Cheers, Nuno
Hi @nuno-agostinho,
Good to know that it is a change in database content (I had not thought of running VEP v112 with the 111 database). Case closed, thank you for your effort and time, greatly appreciated.
Cheers, @dennishendriksen
Hi @dennishendriksen,
We are always here to help! Glad you reported the issue so that we could improve VEP.
Have a great day! 😄
Cheers, Nuno
Hello VEP team,
After updating VEP from v111.0 to v112.0, one of our downstream tools crashes due to an empty string value in the ALLELE_NUM field. See for example chr22:29767384 G>[1:109650635[GG in GRCh37_annotated.vcf.gz.
[Screenshot: VEP v112 output]
[Screenshot: VEP v111 output]
Q1: Is this intended? I would expect this field to always contain an ALT allele index.
Q2: In the images above you might also notice changes to Allele field values:
[1:109650635[GG
.G
Could you explain what the dot in the new output means?
Q3: A last observation is that the number of consequences went down from 10 to 7. Could you explain this difference?
Possibly these changes are related to the 'Enhanced Structural Variant Support' feature in v112?
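For the empty ALLELE_NUM value described in Q1, a downstream tool can guard against the crash by treating an empty string as a missing value instead of converting it directly to an integer. This is a minimal sketch and not the code of any actual tool: parse_csq and the three-field layout are hypothetical, and the CSQ values are made up. It relies only on the CSQ annotation being a comma-separated list of pipe-delimited entries.

```python
def parse_csq(csq_value, field_names):
    """Split a CSQ value into records, tolerating an empty ALLELE_NUM."""
    records = []
    for entry in csq_value.split(","):  # one entry per consequence
        record = dict(zip(field_names, entry.split("|")))
        raw = record.get("ALLELE_NUM", "")
        # An empty string becomes None rather than raising ValueError.
        record["ALLELE_NUM"] = int(raw) if raw else None
        records.append(record)
    return records


# Hypothetical, shortened field order; read the real one from the CSQ
# "Format:" description in the VCF header.
FIELDS = ["Allele", "Consequence", "ALLELE_NUM"]

# Made-up example: the first entry has no allele number.
csq = ".G|intergenic_variant|,[1:109650635[GG|intron_variant|1"

for record in parse_csq(csq, FIELDS):
    print(record["Allele"], record["ALLELE_NUM"])
```

Callers then have to decide explicitly how to handle None, for example by skipping the entry or by reporting it.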