Ensembl / VEP_plugins

Plugins for the Ensembl Variant Effect Predictor (VEP)
Apache License 2.0
138 stars 115 forks source link

MaveDB went wrong #659

Closed bwubb closed 10 months ago

bwubb commented 10 months ago

Greetings,

I am having trouble getting anything our of the MaveDB plugin. I definitely have the companion file, and vep is running as normal as far as I can tell, except I get the Warning:

WARNING: Plugin 'MaveDB' went wrong: Can't locate object method "get_reference_TranscriptVariationAllele" via package "Bio::EnsEMBL::Variation::RegulatoryFeatureVariation" at /home/bwubb/.vep/Plugins/MaveDB.pm line 187, <$fh> line 22304.

No Mave annotations will appear in the annotation.

dglemos commented 10 months ago

Hi @bwubb, Can you please send the VEP command you are running and an example of the input file?

bwubb commented 10 months ago

Certainly. Thank you for the help.

Vcf input file looks standard.

[bwubb@node197 CAG_TWIST-WES-v2]$ bcftools view data/work/SE0000815348/deep_variant.output.vcf.gz | grep -v "#" | head -10
1       69270   .       A       G       27.4    PASS    .       GT:GQ:DP:AD:VAF:PL      1/1:20:7:0,7:1:27,21,0
1       69511   .       A       G       28.1    PASS    .       GT:GQ:DP:AD:VAF:PL      1/1:13:50:0,50:1:27,12,0
1       69897   .       T       C       14.6    PASS    .       GT:GQ:DP:AD:VAF:PL      1/1:14:4:0,4:1:14,22,0
1       939460  .       G       A       7.9     PASS    .       GT:GQ:DP:AD:VAF:PL      0/1:6:110:48,61:0.554545:6,0,10
1       942451  .       T       C       33.2    PASS    .       GT:GQ:DP:AD:VAF:PL      1/1:18:104:0,104:1:33,17,0
1       944296  .       G       A       37.9    PASS    .       GT:GQ:DP:AD:VAF:PL      1/1:23:11:0,11:1:37,22,0
1       944307  .       T       C       36.2    PASS    .       GT:GQ:DP:AD:VAF:PL      1/1:21:14:0,14:1:36,21,0
1       946247  .       G       A       12.3    PASS    .       GT:GQ:DP:AD:VAF:PL      0/1:11:106:45,61:0.575472:11,0,14
1       952421  .       A       G       31.1    PASS    .       GT:GQ:DP:AD:VAF:PL      1/1:17:86:0,86:1:30,17,0
1       953259  .       T       C       33.5    PASS    .       GT:GQ:DP:AD:VAF:PL      1/1:20:113:0,112:0.99115:33,20,0

The VEP command is something like:

vep -i data/work/SE0000815348/deep_variant.output.vcf.gz
        -o data/work/SE0000815348/deep_variant.output.vep.vcf
        --force_overwrite 
        --offline
         --cache
         --format vcf
         --vcf
         --everything
         --canonical
         --assembly GRCh38
         --species homo_sapiens
         --fasta /home/bwubb/resources/Genomes/Human/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa
         --vcf_info_field ANN
         --plugin NMD
         --plugin REVEL,/home/bwubb/.vep/revel/revel_grch38.tsv.gz
         --plugin SpliceAI,snv=/home/bwubb/.vep/spliceai/spliceai_scores.raw.snv.hg38.vcf.gz,indel=/home/bwubb/.vep/spliceai/spliceai_scores.raw.indel.hg38.vcf.gz
         --plugin gnomADc,/home/bwubb/.vep/gnomAD/gnomad.v3.1.1.hg38.genomes.gz
         --plugin UTRannotator,/home/bwubb/.vep/Plugins/UTRannotator/uORF_5UTR_GRCh38_PUBLIC.txt
         --custom /home/bwubb/.vep/clinvar/vcf_GRCh38/clinvar.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN
         --plugin AlphaMissense,file=$HOME/.vep/alphamissense/AlphaMissense_GRCh38.tsv.gz
         --plugin MaveDB,file=$HOME/.vep/mavedb/MaveDB_variants.tsv.gz

I have not modified the MaveDB resource file in anyway; only downloaded from the listed source.

dglemos commented 10 months ago

Thanks for sending more details. Unfortunately, I can't reproduce the issue with the variants you provided. Can you please send the full input file or the variants that cause the plugin to throw the warning?

dglemos commented 10 months ago

For plugin AlphaMissense, if I run a job with your variants my output includes AlphaMissense data for some of them.

1_939460_G/A    1:939460        A       ENSG00000187634 ENST00000342066 Transcript      missense_variant,splice_region_variant  796     706     236     D/N     Gat/Aat -       IMPACT=MODERATE;STRAND=1;am_class=likely_benign;am_pathogenicity=0.2471
1_942451_T/C    1:942451        C       ENSG00000187634 ENST00000342066 Transcript      missense_variant        1117    1027    343     W/R     Tgg/Cgg -       IMPACT=MODERATE;STRAND=1;am_class=likely_benign;am_pathogenicity=0.0292

I noticed you only use $HOME for MaveDB and AlphaMissense files. Can you please re-run VEP without $HOME in the file's path? Instead, point directly to the files as you do for the other plugins.

bwubb commented 10 months ago

Hi, @dglemos

Thank you for the reply. You can close it if you would like; Im not certain if it is the same issue, but I am the same author.

I put the full path to all of the files. I still receive no am_class or am_pathogenicity from VEP. In the two line example output you provided back, are you able to output in vcf?

Perhaps the annotations are there and its the way Im parsing them from vcf? Im looking for the 'am_class' in each line as a key to a key_value pair.

bwubb commented 10 months ago

In regards to MaveDB, I did not realize it was reporting the specific lines of the vcf input. I only get a few lines of warning but they are:

10      48005   .       G       A       12      PASS    .       GT:GQ:DP:AD:VAF:PL      1/1:5:117:32,85:0.726496:10,4,0
12      138734  .       G       A       11.6    PASS    .       GT:GQ:DP:AD:VAF:PL      0/1:9:82:47,35:0.426829:11,0,11
18      252559  .       G       A       30.8    PASS    .       GT:GQ:DP:AD:VAF:PL      1/1:17:66:0,66:1:30,16,0
20      145669  .       ACC     A       12.9    PASS    .       GT:GQ:DP:AD:VAF:PL      0/1:12:84:44,40:0.47619:12,0,20

These were pulled using 'zcat file.vcf.gz | head -#' so Im not really sure if the plugin is complaining about a specific line or a specific variant.

I am certainly willing to provide the full vcf for troubleshooting purposes. Or rerun altered commands to mimic what you are able to produce.

dglemos commented 10 months ago

Yes, the AlphaMissense data is in my output. The VCF format does not have the key-value pair in the output line - only in the header.

This is how the data is in the VCF output: 1 939460 . G A 7.9 PASS CSQ=A|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|Transcript|ENST00000327044|protein_coding|||||||||||4743|-1||HGNC|HGNC:24517||||||,A|intron_variant|MODIFIER|SAMD11|ENSG00000187634|Transcript|ENST00000341065|protein_coding||5/11||||||||||1|cds_start_NF|HGNC|HGNC:28706||||||,A|missense_variant&splice_region_variant|MODERATE|SAMD11|ENSG00000187634|Transcript|ENST00000342066|protein_coding|7/14||||796|706|236|D/N|Gat/Aat|||1||HGNC|HGNC:28706|||||likely_benign|0.2471,A|downstream_gene_variant|MODIFIER|SAMD11|ENSG00000187634|Transcript|ENST00000437963|protein_coding|||||||||||3667|1|cds_end_NF|HGNC|HGNC:28706||||||,A|missense_variant&splice_region_variant|MODERATE|SAMD11|ENSG00000187634|Transcript|ENST00000455979|protein_coding|1/7||||186|187|63|D/N|Gat/Aat|||1|cds_start_NF|HGNC|HGNC:28706||||||,A|upstream_gene_variant|MODIFIER|SAMD11|ENSG00000187634|Transcript|ENST00000464948|retained_intron|||||||||||2706|1||HGNC|HGNC:28706||||||,A|upstream_gene_variant|MODIFIER|SAMD11|ENSG00000187634|Transcript|ENST00000466827|retained_intron|||||||||||2643|1||HGNC|HGNC:28706||||||,A|upstream_gene_variant|MODIFIER|SAMD11|ENSG00000187634|Transcript|ENST00000474461|retained_intron|||||||||||1616|1||HGNC|HGNC:28706||||||,A|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|Transcript|ENST00000477976|retained_intron|||||||||||4745|-1||HGNC|HGNC:24517||||||,A|upstream_gene_variant|MODIFIER|SAMD11|ENSG00000187634|Transcript|ENST00000478729|protein_coding_CDS_not_defined|||||||||||886|1||HGNC|HGNC:28706||||||,A|downstream_gene_variant|MODIFIER|NOC2L|ENSG00000188976|Transcript|ENST00000483767|retained_intron|||||||||||4744|-1||HGNC|HGNC:24517||||||,A|intron_variant|MODIFIER|SAMD11|ENSG00000187634|Transcript|ENST00000616016|protein_coding||7/13||||||||||1||HGNC|HGNC:28706||||||,A|intron_variant|MODIFIER|SAMD11|ENSG00000187634|Transcript|ENST00000616125|protein_coding||6/10||||||||||1|cds_start_NF|HGNC|HGNC:28706||||||,A|missense_variant&splice_region_variant|MODERATE|SAMD11|ENSG00000187634|Transcript|ENST00000617307|protein_coding|6/13||||709|709|237|D/N|Gat/Aat|||1|cds_start_NF|HGNC|HGNC:28706||||||,A|intron_variant|MODIFIER|SAMD11|ENSG00000187634|Transcript|ENST00000618181|protein_coding||5/9||||||||||1|cds_start_NF|HGNC|HGNC:28706||||||,A|intron_variant|MODIFIER|SAMD11|ENSG00000187634|Transcript|ENST00000618323|protein_coding||7/13||||||||||1||HGNC|HGNC:28706||||||,A|intron_variant|MODIFIER|SAMD11|ENSG00000187634|Transcript|ENST00000618779|protein_coding||5/11||||||||||1|cds_start_NF|HGNC|HGNC:28706||||||,A|missense_variant&splice_region_variant|MODERATE|SAMD11|ENSG00000187634|Transcript|ENST00000622503|protein_coding|6/13||||709|709|237|D/N|Gat/Aat|||1|cds_start_NF|HGNC|HGNC:28706|||||| GT:GQ:DP:AD:VAF:PL 0/1:6:110:48,61:0.554545:6,0,10

To be able to parse the data you follow the order from the header: ##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID|MaveDB_nt|MaveDB_pro|MaveDB_score|MaveDB_urn|am_class|am_pathogenicity">

The variant above has MaveDB output likely_benign|0.2471 for transcript ENST00000342066.

dglemos commented 10 months ago

For the MaveDB plugin, can you just run the variants you posted in the comment above? VEP does not throw any warning/error when I run only those.

bwubb commented 10 months ago

Sorry, my previous comment about key value pairs was incorrect. I reviewed my python code and I am accurately pulling the order of annotations from the CSQ field. Im using vcfpy methods so Im fairly confident it is working as intended... but I will make sure of this, its suspicious the two "newer" are the only ones not working.

I will make a test vcf of the variants I posted above and work on them to see if I match your output. Thank you for the help. I will report back later.

bwubb commented 10 months ago

Ugh. This was all user error on me. While Im still not certain what is causing the warning (the line number seems to change), but it is definitely not the reason my output values were appearing blank. Im sorry for the waste of time and I appreciate the help troubleshooting.