Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
Apache License 2.0

Custom bigwig annotation not working for insertion variants #1740

Closed asalimih closed 3 months ago

asalimih commented 3 months ago

Describe the issue

VEP doesn't output bigWig custom annotation values for insertion variants.

Additional information

I'm trying to annotate variants with conservation scores from a bigWig file using the --custom option, but VEP doesn't output anything for insertion variants. I tried the different type values (overlap, within, surrounding, exact), with no change.
However, if I manually change an insertion to an SNV at the exact same position, it is annotated successfully. This suggests the problem is not the position but the variant class.
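The same-position comparison described above can be set up with a two-record VCF. The sketch below is only an illustration: the file name minimal_test.vcf and the helper make_test_vcf are hypothetical, and the two records are the chr1:1757145 insertion and SNV that appear later in this thread.

```shell
# Write a two-record VCF: an insertion and an SNV at the same position.
# make_test_vcf and minimal_test.vcf are hypothetical names for illustration.
make_test_vcf() {
    out="$1"
    {
        printf '##fileformat=VCFv4.2\n'
        printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
        # Insertion: REF T, ALT T followed by ten G bases
        printf 'chr1\t1757145\t.\tT\tTGGGGGGGGGG\t.\t.\t.\n'
        # SNV at the same position
        printf 'chr1\t1757145\t.\tT\tG\t.\t.\t.\n'
    } > "$out"
}

make_test_vcf minimal_test.vcf
```

Passing this file to --input_file with the same --custom option should show whether only the insertion row lacks a value.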


Full VEP command line

./vep \
    --verbose --cache --offline --merged --species homo_sapiens --assembly GRCh38 \
    --use_given_ref \
    --tab \
    --force_overwrite \
    --dir /opt/vep/.vep \
    --dir_plugins /opt/vep/.vep/Plugins \
    --input_file /opt/vep/files/${inputVcf_file} \
    --output_file /opt/vep/files/${output_file} \
    --pick \
    --fasta /opt/vep/.vep/custom/references/Homo_sapiens_assembly38.fasta \
    --variant_class \
    --allele_number \
    --show_ref_allele \
    --total_length \
    --exclude_predicted \
    --fork ${annotationThreads} \
    --custom file=/opt/vep/.vep/custom/phyloP/hg38.phyloP100way.bw,short_name=phyloP100way,format=bigwig,type=overlap,coords=0

I'm using the Docker image ensemblorg/ensembl-vep:release_110.1.

dglemos commented 3 months ago

Hi @asalimih, Can you please send an example of the input variants and the custom file?

asalimih commented 3 months ago

> Hi @asalimih, Can you please send an example of the input variants and the custom file?

Hi @dglemos, sure: example.vcf.gz, hg38.phyloP100way.bw. There are two insertion variants in this VCF which don't get a value.

asalimih commented 3 months ago

@dglemos, could you reproduce the issue?

dglemos commented 3 months ago

Unfortunately I cannot reproduce the issue.

Here is an example of my output using your input file example.vcf:

## VEP command-line: vep --allele_number --assembly GRCh38 --cache_version 112 --custom file=hg38.phyloP100way.bw,short_name=phyloP100way,format=bigwig,type=overlap,coords=0 --database 0 --dir_cache [PATH]/tabixconverted --exclude_predicted --fasta [PATH]/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz --force_overwrite --input_file example.vcf --offline --output_file output.txt --pick --show_ref_allele --tab --total_length --variant_class
#Uploaded_variation     Location        Allele  Gene    Feature Feature_type    Consequence     cDNA_position   CDS_position    Protein_position        Amino_acids     Codons  Existing_variation      ALLELE_NUM      REF_ALLELE      IMPACT  DISTANCE        STRAND  FLAGS   VARIANT_CLASS   SOURCE  phyloP100way
chr6_169663767_T_C      chr6:169663767  C       ENSG00000184465 ENST00000448612 Transcript      intron_variant  -       -       -       -       -       -       1       T       MODIFIER        -       -1      -       SNV     -       -0.84299999475479126
chr5_150203773_T_C      chr5:150203773  C       ENSG00000011083 ENST00000230671 Transcript      synonymous_variant      1460/3625       1194/1911       398/636 D       gaT/gaC -       1       T       LOW     -       1       -       SNV     -       0.707000017166137695
chr1_187744_A_G chr1:187744     G       ENSG00000279457 ENST00000623083 Transcript      intron_variant,non_coding_transcript_variant    -       -       -       -       -       -       1       A       MODIFIER        -       -1      -       SNV     -       -1.55999994277954102
chr1_1757145_T_TGGGGGGGGGG      chr1:1757145-1757146    GGGGGGGGGG      ENSG00000008130 ENST00000341426 Transcript      intron_variant  -       -       -       -       -       -       1       -       MODIFIER        -       -1      -       insertion       -       0.287000000476837158
chr1_1757145_T_G        chr1:1757145    G       ENSG00000008130 ENST00000341426 Transcript      intron_variant  -       -       -       -       -       -       1       T       MODIFIER        -       -1      -       SNV     -       -0.800000011920928955
chr2_219601622_G_GT     chr2:219601622-219601623        T       ENSG00000144589 ENST00000456909 Transcript      intron_variant  -       -       -       -       -       -       1       -       MODIFIER        -       1       -       insertion       -       0.97299998998641967
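
To compare two runs of tab output like the one above, it can help to count, per variant class, how many rows have no value in the custom column. This is a rough sketch, assuming the output was produced with --tab and --variant_class and that empty values are written as "-", as in the rows above. The function name count_missing_custom is hypothetical.

```shell
# Summarise, per VARIANT_CLASS, how many rows lack a value in a custom column.
# Usage: count_missing_custom OUTPUT.txt COLUMN_NAME
# Prints: class <TAB> total rows <TAB> rows without a value
count_missing_custom() {
    awk -F '\t' -v col="$2" '
        /^##/ { next }                      # skip VEP metadata lines
        /^#/  {                             # column header line
            for (i = 1; i <= NF; i++) {
                if ($i == col)             c = i
                if ($i == "VARIANT_CLASS") v = i
            }
            next
        }
        c && v {                            # data rows, once both columns are known
            total[$v]++
            if ($c == "-" || $c == "") missing[$v]++
        }
        END {
            for (k in total) printf "%s\t%d\t%d\n", k, total[k], missing[k] + 0
        }
    ' "$1"
}

# Example call (output.txt is the file name used in this thread):
# count_missing_custom output.txt phyloP100way
```

If the insertion row reports as many missing values as total rows while the SNV row reports none, the run shows the behaviour described in this issue.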

Can you please run VEP with the following options using the input example.vcf:

./vep \
    --verbose --cache --offline --merged --species homo_sapiens --assembly GRCh38 \
    --use_given_ref \
    --tab \
    --force_overwrite \
    --dir /opt/vep/.vep \
    --dir_plugins /opt/vep/.vep/Plugins \
    --input_file example.vcf \
    --output_file output.txt \
    --fasta /opt/vep/.vep/custom/references/Homo_sapiens_assembly38.fasta \
    --custom file=/opt/vep/.vep/custom/phyloP/hg38.phyloP100way.bw,short_name=phyloP100way,format=bigwig,type=overlap,coords=0

asalimih commented 3 months ago

I tried it, but I'm still not getting values for insertion variants. To give more information: I'm using the Docker image ensemblorg/ensembl-vep:release_110.1, and when I run the command I get the following messages:

Smartmatch is experimental at /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm line 472.
2024-08-19 10:04:50 - Ignored unsupported option 'pluginsdir=/plugins' from environment variable VEP_PLUGINSDIR
2024-08-19 10:04:50 - Ignored unsupported option 'no_htslib=1' from environment variable VEP_NO_HTSLIB
2024-08-19 10:04:50 - Ignored unsupported option 'no_plugins=1' from environment variable VEP_NO_PLUGINS
2024-08-19 10:04:50 - Set 'dir_plugins=/plugins' from environment variable VEP_DIR_PLUGINS
2024-08-19 10:04:50 - Ignored unsupported option 'no_update=1' from environment variable VEP_NO_UPDATE
2024-08-19 10:04:50 - Read configuration from environment variables
2024-08-19 10:04:50 - No input file format specified - detected vcf format

dglemos commented 3 months ago

> 2024-08-19 10:04:50 - Read configuration from environment variables

Can you share the environment variables?
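
Since the log lines above all refer to variables with a VEP_ prefix, one way to narrow the printenv output to the relevant entries is to filter on that prefix. This is a small sketch, not part of VEP itself; the helper name list_vep_env is hypothetical.

```shell
# Print only the VEP_* environment variables, sorted by name.
# Prints nothing (and still succeeds) if none are set.
list_vep_env() {
    env | grep '^VEP_' | sort || true
}

list_vep_env

# Inside the container used in this thread, the equivalent one-liner would be:
# docker run --rm ensemblorg/ensembl-vep:release_110.1 env | grep '^VEP_'
```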

asalimih commented 3 months ago

> Can you share the environment variables?

Sure, here is the output of printenv inside the docker container:

dglemos commented 3 months ago

I'm sorry, I didn't notice you were using version 110. With that version I can reproduce the issue. The result I sent you previously was produced with the latest version, 112. Could you please update your VEP to the latest version and test the command again?

asalimih commented 3 months ago

> Could you please update your vep to use the latest version and test the command again?

Updating to version 112 solved the issue. Thanks
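
Because the behaviour was reproduced on release 110 and not on release 112, a pipeline script could guard against an older image before running VEP. The sketch below assumes image tags of the form release_N.M, as in the tag used in this thread. The helper vep_image_release is hypothetical, and the thread does not say whether release 111 is affected, so the threshold of 112 is simply the version that was confirmed to work.

```shell
# Extract the major release number from an image reference such as
# ensemblorg/ensembl-vep:release_110.1 (assumed tag format: release_N.M).
vep_image_release() {
    tag="${1##*:}"              # release_110.1
    rel="${tag#release_}"       # 110.1
    printf '%s\n' "${rel%%.*}"  # 110
}

image="ensemblorg/ensembl-vep:release_110.1"
if [ "$(vep_image_release "$image")" -lt 112 ]; then
    echo "VEP image $image predates release 112; insertions may lack bigWig values" >&2
fi
```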