Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

Improper calculation of SV size for SVs spanning different chromosomes #1637

Open gudeqing opened 3 months ago

gudeqing commented 3 months ago

Describe the issue

VEP warns "too long to annotate" for SVs that span different chromosomes because the SV size is calculated improperly (I believe a size calculation is not meaningful for inter-chromosomal SVs). I would also like a more reasonable way to annotate breakpoints, which would help with fusion gene identification. Thanks!

The following example SV record from ETCHING triggers the warning "WARNING: variant ETCH_SV_2903 on line 3 is too long to annotate: (128516188)": [image]
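To make the inter-chromosomal case concrete, here is a minimal sketch that flags VCF records whose mate breakpoint lies on a different chromosome. It assumes the caller writes the ALT column in standard VCF breakend notation (for example `N[chr5:2000[`); the original record is only available as an image, so the example lines below are made up, and the helper names `mate_locus` and `is_interchromosomal` are hypothetical. It does not reproduce how VEP computes the size.

```python
import re

# Matches the mate locus inside a VCF breakend ALT such as
# "N[chr5:2000[", "N]chr5:2000]", "]chr5:2000]N" or "[chr5:2000[N".
BND_ALT = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def mate_locus(alt):
    """Return (chrom, pos) of the mate breakend, or None if ALT is not a breakend."""
    match = BND_ALT.search(alt)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_interchromosomal(vcf_line):
    """True if the record is a breakend whose mate is on another chromosome."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, alt = fields[0], fields[4]
    mate = mate_locus(alt)
    return mate is not None and mate[0] != chrom


# Made-up records for illustration only (not taken from the ETCHING output).
inter = "chr1\t1000\tbnd_a\tN\tN[chr5:2000[\t.\tPASS\tSVTYPE=BND"
intra = "chr1\t1000\tbnd_b\tN\tN[chr1:9000[\t.\tPASS\tSVTYPE=BND"

print(is_interchromosomal(inter))
print(is_interchromosomal(intra))
```

For a record flagged this way, the two coordinates belong to different chromosomes, so any length derived from them has no physical meaning, which is the point raised above.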

Additional information

Please fill in the following sections to help us find the source of your issue as quickly as possible.

System

Full VEP command line

docker run --rm --privileged -m 53687091200b --cpus 12 --user `id -u`:`id -g` -i --entrypoint /bin/bash --name VEP-Etching -v *:* ensemblorg/ensembl-vep:release_110.1 cmd.sh

$ cat cmd.sh 
set -o pipefail
vep  -i /data/*/B260_Raji_fusion/Result_hg19/SortVcf-RAJI1/RAJI1.etching.sorted.vcf --format vcf --fasta /home/hxbio04/hg19/genome.fa -o RAJI1.vep.vcf.gz --vcf --compress_output bgzip --force_overwrite  --fork 4 --species homo_sapiens --assembly GRCh37 --dir_cache /home/hxbio04/dbs/vep --stats_file RAJI1.vep.summary.html --max_sv_size 50000000 --cache  --offline  --refseq  --variant_class  --sift b --polyphen b --nearest transcript --gene_phenotype  --regulatory  --phased  --numbers  --hgvs  --transcript_version  --symbol  --dont_skip --tsl  --canonical  --mane  --biotype  --max_af  --af_1kg  --af_gnomad  --flag_pick  --custom "file=/home/hxbio04/dbs/vep/CIViC/20230901.nightly-civic_accepted.sorted.vcf.gz,short_name=Civic,format=vcf,type=exact,fields=CSQ" && tabix *vcf.gz

Full error message

WARNING: variant ETCH_SV_299 on line 1 is too long to annotate: (60670482)
WARNING: variant ETCH_SV_2827 on line 2 is too long to annotate: (68692399)
WARNING: variant ETCH_SV_2903 on line 3 is too long to annotate: (128516188)
WARNING: variant ETCH_SV_3003 on line 4 is too long to annotate: (186783170)
WARNING: variant ETCH_SV_3035 on line 6 is too long to annotate: (86375240)
.......
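As a quick check, the warnings above can be parsed and compared against the `--max_sv_size 50000000` value from the command line. This is only a sketch: the regular expression is written against the warning text quoted in this issue, and the names `WARNING`, `parse_warnings` and `MAX_SV_SIZE` are hypothetical.

```python
import re

# Pattern written against the warning text quoted in this issue.
WARNING = re.compile(
    r"WARNING: variant (\S+) on line (\d+) is too long to annotate: \((\d+)\)"
)

# Value passed via --max_sv_size in the command above.
MAX_SV_SIZE = 50_000_000


def parse_warnings(text):
    """Return (variant_id, line_number, reported_size) for each warning found."""
    return [
        (variant_id, int(line), int(size))
        for variant_id, line, size in WARNING.findall(text)
    ]


log = """\
WARNING: variant ETCH_SV_299 on line 1 is too long to annotate: (60670482)
WARNING: variant ETCH_SV_2827 on line 2 is too long to annotate: (68692399)
WARNING: variant ETCH_SV_2903 on line 3 is too long to annotate: (128516188)
WARNING: variant ETCH_SV_3003 on line 4 is too long to annotate: (186783170)
WARNING: variant ETCH_SV_3035 on line 6 is too long to annotate: (86375240)
"""

parsed = parse_warnings(log)
for variant_id, line, size in parsed:
    # Show whether each reported size exceeds the configured limit.
    print(variant_id, line, size, size > MAX_SV_SIZE)
```

Every size reported in these five warnings is above 50,000,000, which is consistent with the records being skipped because the reported size exceeds the configured limit.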

Data files (if applicable)

They include:

likhitha-surapaneni commented 3 months ago

Hi @gudeqing, sorry to hear that you are facing an issue.

Unfortunately, I am not able to reproduce the issue on my end. Could you please provide the input file and the custom file you used? A sample of each file would also be fine.

Thank you.

gudeqing commented 2 months ago

@likhitha-surapaneni Sorry for the late reply. Here is the VCF file: RAJI5.etching.sorted.vcf.gz

likhitha-surapaneni commented 2 months ago

Hi @gudeqing,

Thank you for providing the input file. There have been some updates in release/111. Can you please try running with release/111 and let us know if you are still facing issues?