Ensembl / VEP_plugins

Plugins for the Ensembl Variant Effect Predictor (VEP)
Apache License 2.0

no results from mutfunc plugin #727

Closed: mathob closed this issue 4 months ago

mathob commented 4 months ago

I am running VEP with a Singularity image built from docker://ensemblorg/ensembl-vep:release_112.0. Using this image I have set up multiple plugins, which work as advertised. The mutfunc plugin, however, does not seem to be running properly: I do not get any mutfunc annotations. The tab-separated output file does have header lines like these:

## mutfunc_exp : Impact on protein structure (experimental). (ddG >= 2 deleterious)
## mutfunc_int : Impact on protein interaction interface. (ddG >= 2 deleterious)
## mutfunc_mod : Impact on protein structure. (ddG >= 2 deleterious)
## mutfunc_motif : Impact on linear motif. (1 = lost)

However, none of these fields appear on any result line. This happens both with the small GRCh38 test input file (/opt/vep/src/ensembl-vep/examples/homo_sapiens_GRCh38.vcf in the container) and with larger VCF files of my own.
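To confirm this across a whole output file rather than by eye, a small script can count how many result lines carry any `mutfunc_*` key. This is a minimal sketch, assuming VEP's default tab-delimited output, in which lines starting with `#` are headers and the last column (`Extra`) holds `KEY=VALUE` pairs separated by `;`. The function name `count_mutfunc_lines` is hypothetical.

```python
def count_mutfunc_lines(lines):
    """Return (result_lines, lines_with_mutfunc) for VEP default-format output.

    Assumes the last tab-separated column is the Extra field of KEY=VALUE pairs.
    """
    total = 0
    with_mutfunc = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip "##" metadata lines and the "#Uploaded_variation" header
        total += 1
        extra = line.split("\t")[-1]
        keys = [pair.split("=", 1)[0] for pair in extra.split(";")]
        if any(key.startswith("mutfunc_") for key in keys):
            with_mutfunc += 1
    return total, with_mutfunc
```

For example, opening `mutfunc_test.GRCh38.vep.txt` and passing the file handle to `count_mutfunc_lines` would report a second value of 0 if the symptom described here is present.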

I am using singularity-ce version 3.11.3 on Linux.

The SQLite file (mutfunc_data.db) passes the md5sum check.

An example command is:

singularity exec \
--bind ${VEPDIR}/cache:/cache \
--bind ${VEPDIR}/plugin_data:/plugin_data \
${VEPDIR}/112.0/ensembl-vep.sif vep \
-dir /cache \
-assembly GRCh38 \
--plugin mutfunc,db=/plugin_data/mutfunc_data.db \
--cache \
--offline \
-i /opt/vep/src/ensembl-vep/examples/homo_sapiens_GRCh38.vcf \
-o mutfunc_test.GRCh38.vep.txt

This runs without errors or warnings.

My questions:

and my supplementary question (which is unimportant, just curiosity):

sqlite> select * from consequences limit 1;
homo_sapiens|984379124b89450a146ca532f7709414|motif|�

Thanks

Matthew

nakib103 commented 4 months ago

Hi @mathob,

Thanks for your query and using the mutfunc plugin!

Unfortunately, I cannot reproduce the issue you are having. If I run the mutfunc plugin on the homo_sapiens_GRCh38.vcf input file, I get results, for example:

rs116555717 22:39231644 C   ENSG00000100311 ENST00000331163 Transcript  missense_variant    1453    434 145 Q/R cAg/cGg -   IMPACT=MODERATE;STRAND=-1;mutfunc_exp=-0.30581;mutfunc_mod=-0.37622

The plugin should work the same way inside the container, and GRCh38 is the correct assembly. Can you check this specific variant in your output file and let me know whether you see mutfunc output for it?

About exploring data in mutfunc DB

If you check the table info, you will see it has four columns:

sqlite> pragma table_info([consequences]);
0|species||0||0
1|md5||0||0
2|item||0||0
3|matrix||0||0
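The `md5` column holds a 32-character hex digest. Assuming it is the MD5 of the protein (translation) sequence, which is the convention Ensembl uses for its SIFT and PolyPhen-2 prediction matrices but is not confirmed in this thread, a record could be fetched with Python's standard `sqlite3` and `hashlib` modules. `fetch_matrix` is a hypothetical helper; the table and column names are the ones shown above. If the assumption holds, it would also mean no lookup is possible without the protein sequence.

```python
import hashlib
import sqlite3


def fetch_matrix(db_path, protein_seq, item, species="homo_sapiens"):
    """Return the raw (still compressed) matrix BLOB for one protein and item.

    ASSUMPTION: the md5 column is the hex MD5 digest of the protein sequence.
    item is a value of the item column, e.g. "motif".
    """
    digest = hashlib.md5(protein_seq.encode("ascii")).hexdigest()
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT matrix FROM consequences WHERE species = ? AND md5 = ? AND item = ?",
            (species, digest, item),
        ).fetchone()
    finally:
        conn.close()
    # None means no record for this protein/item combination.
    return None if row is None else row[0]
```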

So, to check the motif analysis value stored in a specific record, fetch that record from the table, uncompress its matrix, and parse it to extract the value you want. The plugin has three functions that do this, which you can emulate:

  1. expand_matrix: uncompress the matrix for a record in the database
  2. retrieve_item_value: retrieve the entry stored in the matrix for a specific position and amino acid. The value of $tot_packed_len should be 26 for motif, 42 for int, and 40 for mod and exp.
  3. parse_destabilizers: parse the retrieved entry into the individual values stored for each item.
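The first two steps can be sketched in Python under explicit assumptions: that the matrix BLOB is gzip- or zlib-compressed (the `zlib.MAX_WBITS | 32` setting accepts either framing), and that the uncompressed matrix is a flat array of fixed-width records, one per protein position and amino acid, in a hypothetical alphabetical amino-acid order after an optional header. The record widths are the `$tot_packed_len` values listed above; the function names echo the plugin's but this is not its code.

```python
import zlib

# Packed record width per item, taken from the $tot_packed_len values above.
PACKED_LEN = {"motif": 26, "int": 42, "mod": 40, "exp": 40}

# ASSUMPTION: 20 amino acids per position, stored in this (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def expand_matrix(blob):
    """Decompress a matrix BLOB, accepting gzip or zlib framing (wbits = 32 + 15)."""
    return zlib.decompress(blob, zlib.MAX_WBITS | 32)


def retrieve_record(matrix, item, position, amino_acid, header_len=0):
    """Return the raw packed bytes for one (position, amino acid) entry.

    ASSUMPTION: after header_len bytes, records are ordered by 1-based protein
    position, then by AMINO_ACIDS, each PACKED_LEN[item] bytes wide.
    """
    width = PACKED_LEN[item]
    index = (position - 1) * len(AMINO_ACIDS) + AMINO_ACIDS.index(amino_acid)
    start = header_len + index * width
    record = matrix[start:start + width]
    if len(record) != width:
        raise IndexError(f"position {position} is beyond the end of the matrix")
    return record
```

Unpacking a returned record into individual values (the job of `parse_destabilizers`) is left out, because the packed field layout is not described here.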

Hope that helps.

Best regards, Nakib

mathob commented 4 months ago

Thanks for your prompt reply.

I find that the plugin runs as expected only if "--offline" is not specified. From this help page (https://asia.ensembl.org/info/docs/tools/vep/script/vep_cache.html#offline), I had expected to be able to run plugins in offline mode, but perhaps mutfunc is an exception.

Matthew

nakib103 commented 4 months ago

It should work with or without --offline. Can you give me an example of a variant line that gets mutfunc output without --offline, and the same variant line from the output when --offline is used?

mathob commented 4 months ago

The same variant that you mentioned previously can be used as an example.

With --offline:

singularity exec \
 --bind ${VEPDIR}/cache:/cache \
 --bind ${VEPDIR}/plugin_data:/plugin_data \
 ${VEPDIR}/112.0/ensembl-vep.sif vep \
 -dir /cache \
 -assembly GRCh38 \
 --plugin mutfunc,db=/plugin_data/mutfunc_data.db \
 --cache \
 --offline \
 -i /opt/vep/src/ensembl-vep/examples/homo_sapiens_GRCh38.vcf \
 -o mutfunc.offline.vep.txt

grep rs116555717 mutfunc.offline.vep.txt
rs116555717 22:39231644 C   ENSG00000100311 ENST00000331163 Transcript  missense_variant    1453    434 145 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1
rs116555717 22:39231644 C   ENSG00000100311 ENST00000381551 Transcript  missense_variant    426 389 130 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1
rs116555717 22:39231644 C   ENSG00000100311 ENST00000440375 Transcript  missense_variant    489 341 114 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1;FLAGS=cds_end_NF
rs116555717 22:39231644 C   ENSG00000100311 ENST00000455790 Transcript  missense_variant    462 341 114 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1;FLAGS=cds_end_NF

and without:

singularity exec \
 --bind ${VEPDIR}/cache:/cache \
 --bind ${VEPDIR}/plugin_data:/plugin_data \
 ${VEPDIR}/112.0/ensembl-vep.sif vep \
 -dir /cache \
 -assembly GRCh38 \
 --plugin mutfunc,db=/plugin_data/mutfunc_data.db \
 --cache \
 -i /opt/vep/src/ensembl-vep/examples/homo_sapiens_GRCh38.vcf \
 -o mutfunc.online.vep.txt

grep rs116555717 mutfunc.online.vep.txt
rs116555717 22:39231644 C   ENSG00000100311 ENST00000331163 Transcript  missense_variant    1453    434 145 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1;mutfunc_exp=-0.30581;mutfunc_mod=-0.37622
rs116555717 22:39231644 C   ENSG00000100311 ENST00000381551 Transcript  missense_variant    426 389 130 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1;mutfunc_exp=5.59046;mutfunc_int=-0.39519;mutfunc_mod=7.36878
rs116555717 22:39231644 C   ENSG00000100311 ENST00000440375 Transcript  missense_variant    489 341 114 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1;FLAGS=cds_end_NF
rs116555717 22:39231644 C   ENSG00000100311 ENST00000455790 Transcript  missense_variant    462 341 114 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1;FLAGS=cds_end_NF

mathob commented 4 months ago

That second command (without --offline) ran ten times slower than the first, which makes me wonder how the online Ensembl database is being used. When --offline is not specified, is --cache disregarded? That is not my understanding of what is said here: https://asia.ensembl.org/info/docs/tools/vep/script/vep_cache.html#limitations

nakib103 commented 4 months ago

I was able to reproduce the issue you are having. The mutfunc plugin requires the translation (protein) sequence, which is not stored in the cache (see the URL you mentioned above). With --offline, VEP uses only the cache, so no database connection can be made to retrieve that sequence. To resolve this, either run without --offline as you were doing (which, as you noted, is slower) or, better, provide a human FASTA file with --fasta.

Slower run without --offline

As mentioned above, --offline tells VEP not to use the Ensembl database at all. Database connections are generally slower. Running without --offline does not mean the cache is disregarded altogether; VEP still uses it, but fetches information such as the translation sequence from the database.

Hope that answers your question.

mathob commented 4 months ago

OK, that has answered my question, and I can run the plugin successfully with --offline now.

Perhaps it wouldn't hurt to update the plugin help string to advise that "--fasta" is required in offline mode.

Thanks very much for your help.

nakib103 commented 4 months ago

Glad to know it worked for you. I will add a warning to the plugin explaining this case.
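Until such a warning exists, a wrapper script could catch the problem before a long run. This is a hypothetical sketch, not the plugin's code: `mutfunc_offline_problem` inspects a VEP argument list and flags the combination identified in this thread (`--offline` with the mutfunc plugin but no `--fasta`). It only checks the long `--fasta` spelling.

```python
def mutfunc_offline_problem(argv):
    """Return a warning string if mutfunc would lack protein sequences, else None.

    Hypothetical pre-flight check over a list of VEP command-line arguments.
    """
    # VEP options appear with one or two dashes (e.g. -dir, --cache); normalise them.
    flags = {arg.lstrip("-") for arg in argv if arg.startswith("-")}
    # The plugin name is the first comma-separated field after --plugin.
    uses_mutfunc = any(
        arg.split(",", 1)[0] == "mutfunc"
        for prev, arg in zip(argv, argv[1:])
        if prev.lstrip("-") == "plugin"
    )
    if uses_mutfunc and "offline" in flags and "fasta" not in flags:
        return ("mutfunc needs translation sequences, which the cache does not store; "
                "add --fasta when running with --offline")
    return None
```

For instance, the offline command from this thread (`-dir /cache --plugin mutfunc,db=... --cache --offline`) would produce the warning, while the same arguments plus `--fasta` followed by a FASTA path would not.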

I will close this issue. If you face any further issues in the future, please feel free to open a new one.