Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Allow Input of BND Structural Variants #441

Closed DarioS closed 1 year ago

DarioS commented 5 years ago

Some structural variant callers report variants in breakend format. This is currently not supported, but I'd like to use variants from software such as GRIDSS in the Variant Effect Predictor.

helensch commented 5 years ago

Hi

Thank you for your query about breakend format support.

VEP supports the following structural variant types:

INS - insertion
DEL - deletion
DUP - duplication
TDUP - tandem duplication

https://www.ensembl.org/info/docs/tools/vep/vep_formats.html#sv

Please could you let us know what you would like to see in VEP for structural variants and breakend format (SVTYPE=BND).

Regards Helen

DarioS commented 5 years ago

For single breakends, it would be great to discover what unusual sequence they are joined to (e.g. LINE1 retrotransposon, Epstein-Barr Virus). For a breakend with a mate, perhaps just information about whether the breakend overlaps with any gene and what part of it (e.g. exon) is interesting.

sarahhunt commented 5 years ago

Hi @DarioS,

Thanks for the suggestions - we are aiming to improve SV support. Are you using command line VEP? Currently, if you submit a VCF BND formatted like this:

1 251748 gnomAD_v2_BND_1_4 N <BND> 432 UNRESOLVED END=258100;SVTYPE=BND

You get output like this:

CSQ=BND|non_coding_transcript_exon_variant&intron_variant|MODIFIER||ENSG00000228463|Transcript|ENST00000335577

So gene/transcript overlap information is available, but the consequence description is not precise - 'intron_variant' sounds too innocuous. We need to also assign something else - probably 'feature_truncation' - but this will need time to plan and implement.
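To make the overlap information in that CSQ string easier to inspect, here is a minimal Python sketch that splits one entry into named fields. The field order used here is an assumption (the first seven columns of VEP's default CSQ layout); the authoritative order for a given file is in its `##INFO=<ID=CSQ` header line. The helper name `parse_csq_entry` is hypothetical.

```python
# Assumed field order: the first seven columns of VEP's default CSQ layout.
# Check the "##INFO=<ID=CSQ" header line of your own annotated VCF.
CSQ_FIELDS = ["Allele", "Consequence", "IMPACT", "SYMBOL",
              "Gene", "Feature_type", "Feature"]


def parse_csq_entry(entry, fields=CSQ_FIELDS):
    """Split one pipe-delimited CSQ entry into a dict keyed by field name."""
    record = dict(zip(fields, entry.split("|")))
    # Several consequence terms for one feature are joined with '&'.
    record["Consequence"] = record.get("Consequence", "").split("&")
    return record


example = ("BND|non_coding_transcript_exon_variant&intron_variant|MODIFIER|"
           "|ENSG00000228463|Transcript|ENST00000335577")
parsed = parse_csq_entry(example)
print(parsed["Gene"], parsed["Feature"], parsed["Consequence"])
```

Splitting the consequence terms makes it straightforward to flag records where only 'intron_variant'-style terms are reported for a breakend.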

What assembly are you using? It's easy to use annotations in BED or GFF in VEP analysis, as described here: https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html#custom_prep.

We don't currently export repeats, but there is a GRCh38 file, which can be indexed and read as a BED file, here: ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/masking_coordinates.gz

I tried:

gunzip masking_coordinates.gz
grep -v "#" masking_coordinates | sort -k1,1 -k2,2n -k3,3n -t$'\t' | bgzip -c > masking_coordinates.bed.gz
tabix -p bed masking_coordinates.bed.gz

vep -i input.vcf -offline --cache --dir_cache [location] -fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz --custom masking_coordinates.bed.gz,repeat,bed,overlap

which appends a section like: repeat=MIR3,(TGCTCC)n,(TGG)n,L2a,L3,Plat_L3,MLT1K,MIR,L2a,MIR,L2a,MI

for each line. Obviously, the bed file could be filtered to remove the simple repeats.
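One possible way to do that filtering is sketched below in Python. It assumes that the repeat name sits in the fourth BED column and that simple repeats are named like "(TGG)n", as in the output above; both are assumptions about the file and should be checked against the actual data. The function name `drop_simple_repeats` is hypothetical.

```python
import re

# Assumption: simple repeats are named like "(TGG)n" in BED column 4.
SIMPLE_REPEAT = re.compile(r"^\([ACGTN]+\)n$", re.IGNORECASE)


def drop_simple_repeats(bed_lines):
    """Yield BED lines whose name column does not look like a simple repeat."""
    for line in bed_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        name = columns[3] if len(columns) > 3 else ""
        if not SIMPLE_REPEAT.match(name):
            yield line


demo = [
    "chr1\t100\t200\tMIR3\n",
    "chr1\t250\t280\t(TGCTCC)n\n",
    "chr1\t300\t400\tL2a\n",
]
kept = list(drop_simple_repeats(demo))
print("".join(kept), end="")
```

The filter preserves line order, so a sorted input stays sorted; the result would still need to be compressed with bgzip and indexed with tabix before being passed to --custom.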

We will look into a simpler, more automated way to provide information on overlaps with LINEs, retroposons, etc. and let you know.

DarioS commented 5 years ago

I was toying with the web application for the hg38 reference genome. It seems that the software already does most of what I was thinking about.

osowiecki commented 5 years ago

MantaBND variants still get ignored in VEP 96.3

WARNING: start > end+1 : (START=62949878, END=60401253) on line 177
WARNING: start > end+1 : (START=56035852, END=872) on line 347
WARNING: start > end+1 : (START=57492555, END=10959) on line 349
WARNING: start > end+1 : (START=19746443, END=13390553) on line 496
WARNING: start > end+1 : (START=78943, END=68701) on line 692
WARNING: start > end+1 : (START=22975013, END=2264) on line 799
WARNING: start > end+1 : (START=177679, END=10685) on line 1013
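To find which input records triggered these warnings, the messages can be parsed into numbers. The Python sketch below assumes the warning text has exactly the shape shown above and that "on line N" refers to a line of the input file; the helper name `parse_warnings` is hypothetical.

```python
import re

# Matches warnings shaped like the ones quoted above.
WARNING = re.compile(
    r"start > end\+1 : \(START=(\d+), END=(\d+)\) on line (\d+)"
)


def parse_warnings(text):
    """Return (start, end, line_number) tuples for each matching warning."""
    return [(int(start), int(end), int(line_no))
            for start, end, line_no in WARNING.findall(text)]


log = (
    "WARNING: start > end+1 : (START=62949878, END=60401253) on line 177\n"
    "WARNING: start > end+1 : (START=56035852, END=872) on line 347\n"
)
flagged = parse_warnings(log)
for start, end, line_no in flagged:
    print(f"line {line_no}: START={start} END={end}")
```

The resulting line numbers can then be used to pull the offending records out of the VCF and check whether they are all BND entries.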

ima23 commented 5 years ago

Hi @osowiecki,

Can you please send us your input (a few MantaBND variants) and describe the output you expected from VEP compared with what was reported? Thank you.

We are aiming to improve our SV support with larger changes expected next year and are interested to know the use cases.

Kind regards, Irina

osowiecki commented 5 years ago

I will do that once I finish my current project. You can check any file produced by Manta and filtered for the "PASS" flag. You can also see how snpEff handles these cases. I'll be back.

configManta.py --bam=bam/$sample.bam --referenceFasta=data/$genome/genome.fasta --runDir vcf/$sample/manta

vcf/$sample/manta/runWorkflow.py -m local -j $threads

gunzip -c vcf/$sample/manta/results/variants/diploidSV.vcf.gz > vcf/$sample/${sample}_manta.vcf

bgzip vcf/$sample/${sample}_manta.vcf
tabix -p vcf vcf/$sample/${sample}_manta.vcf.gz

gunzip -c vcf/$sample/${sample}_manta.vcf.gz | sed 's/ID=AD,Number=./ID=AD,Number=R/' | vt decompose -s - | vt normalize -r data/$genome/genome.fasta - | java -Xmx15g -jar data/snpEff/snpEff.jar eff -classic -formatEff -v $snpeff_genome | bgzip -c > vcf_ann/$sample/${sample}_manta.vcf.gz
tabix -p vcf vcf_ann/$sample/${sample}_manta.vcf.gz
mv snpEff_genes.txt snpEff/${sample}_manta.txt
mv snpEff_summary.html snpEff/${sample}_manta.html

ima23 commented 5 years ago

Thank you @osowiecki for the workflow!

ea-ea commented 4 years ago

Hi @ima23, I am working with VEP (web interface) and trying to annotate a Manta (v1.6) output VCF containing BND-tagged variants, but VEP throws an error saying it does not recognize the input format. The input VCF looks like this:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE

chr1 3412098 MantaBND:2:0:1:0:0:0:1 A A]chr13:80221222] . PASS SVTYPE=BND;MATEID=MantaBND:2:0:1:0:0:0:0;IMPRECISE;CIPOS=-66,67;BND_PAIR_COUNT=2;PAIR_COUNT=2 . .

Can you help identify the problem in the example above? From other use cases I found, it seems this type of VCF can be annotated by VEP. Can VEP accept translocations with the INFO tag 'SVTYPE=BND' and, if so, is there any specific formatting we should use? Thanks.
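For reference, the paired-breakend ALT notation in the VCF specification has four forms: t[p[, t]p], ]p]t and [p[t, where t is the local sequence and p is the mate position. The Python sketch below decodes such a string; it is only an illustration of the notation, not how VEP parses it. The function name `parse_bnd_alt` is hypothetical, and single breakends and symbolic alleles are deliberately not handled (they return None).

```python
import re

# Paired-breakend ALT: local bases on exactly one side of a bracketed mate position.
BND_ALT = re.compile(
    r"^(?P<lead>[ACGTNacgtn.]*)"
    r"(?P<open>[\[\]])(?P<chrom>.+):(?P<pos>\d+)(?P<close>[\[\]])"
    r"(?P<trail>[ACGTNacgtn.]*)$"
)


def parse_bnd_alt(alt):
    """Decode a paired-breakend ALT string, or return None if it is not one."""
    match = BND_ALT.match(alt)
    if match is None or match.group("open") != match.group("close"):
        return None
    lead, trail = match.group("lead"), match.group("trail")
    if bool(lead) == bool(trail):
        return None  # local bases must sit on exactly one side
    return {
        "mate_chrom": match.group("chrom"),
        "mate_pos": int(match.group("pos")),
        "bracket": match.group("open"),
        "local_bases": lead or trail,
        # True for t[p[ and t]p] (mate joined after the local bases).
        "mate_joined_after_local": bool(lead),
    }


print(parse_bnd_alt("A]chr13:80221222]"))
print(parse_bnd_alt("[chr15:23440420[AG"))
```

The bracket direction and the side on which the local bases appear together determine the orientation of the join, which is the information lost when the ALT value is replaced with a placeholder.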

ima23 commented 4 years ago

Hi @ea-ea,

[UPDATED] Apologies for the delay.
Replacing the ALT column value with . produced annotations on both the VEP web interface and the command line. https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=ylG7EFQY6Mm6untt-6722214

I will put in a ticket so that the ALT column format A]chr13:80221222] is also supported in future.

Kind regards, Irina
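The workaround described in this reply (replacing the ALT value with . for BND records) could be scripted roughly as follows. This is a minimal Python sketch that assumes tab-separated VCF lines with SVTYPE=BND in the INFO column; the function name `mask_bnd_alt` is hypothetical. Note that it discards the mate position encoded in ALT, so the MATEID tag is the only remaining link between the two breakends.

```python
def mask_bnd_alt(vcf_lines):
    """Yield VCF lines with ALT set to '.' for records tagged SVTYPE=BND."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # header lines pass through unchanged
            continue
        columns = line.rstrip("\n").split("\t")
        # Column 8 (index 7) is INFO; column 5 (index 4) is ALT.
        if len(columns) >= 8 and "SVTYPE=BND" in columns[7].split(";"):
            columns[4] = "."
            yield "\t".join(columns) + "\n"
        else:
            yield line


record = "\t".join([
    "chr1", "3412098", "MantaBND:2:0:1:0:0:0:1", "A", "A]chr13:80221222]",
    ".", "PASS", "SVTYPE=BND;MATEID=MantaBND:2:0:1:0:0:0:0;IMPRECISE",
]) + "\n"
masked = list(mask_bnd_alt(["##fileformat=VCFv4.1\n", record]))
print(masked[1], end="")
```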

ea-ea commented 3 years ago

Hi @ima23,

Thank you very much for your reply. We are going to try replacing the ALT value with '.'.

Meanwhile, I noticed that the error we encountered occurs when we use the candidateSV file output by Manta, which contains unscored SV and indel candidates and has no FORMAT information, so VEP does not recognize the file. Adding a FORMAT column with ./. does not help either. Is it not possible for VEP to annotate these candidate variants to see whether they are found in databases?

A VCF file (diploidSV) with GTs and scored BND (with ALT formatted like A]chr13:80221222]), DUP, INS and DEL variants can be annotated by VEP without any problem.

Thanks

Akazhiel commented 3 years ago

Hello!

Has there been any progress on this matter? I'm trying to annotate BND structural variants that have an ALT column like this: [chr15:23440420[AG

Cheers.

helensch commented 3 years ago

Hi @Akazhiel

Thank you for your query. There has not been an update for this and the ticket is on our backlog. I will check with the team on plans for BND SV updates and get back to you.

Regards Helen

amizeranschi commented 2 years ago

Hello, it would be great to see some progress with BND genotypes in Ensembl VEP. This still doesn't appear to be supported, as of v.105.

diegomscoelho commented 2 years ago

Hi @amizeranschi,

Unfortunately, we do not have an update on this implementation, but the ticket is still in our backlog. I will discuss it with our team again to prioritise this feature and will post any update here.

Regards, @diegomscoelho

amizeranschi commented 2 years ago

Hi @diegomscoelho

Thank you for your reply. It's great to know you are still considering adding this feature. It would be great to see support for BND SVs in Ensembl VEP, because several SV calling tools can output variants in this format nowadays.

nuno-agostinho commented 1 year ago

Hi @DarioS, @osowiecki, @amizeranschi, @Akazhiel and @ea-ea,

VEP will support breakend variants in VCF files from release 110 (coming soon).

Feel free to open new tickets if you have any issues using the new feature.

Best regards, Nuno