Closed mathob closed 4 months ago
Hi @mathob,
Thanks for your query and for using the mutfunc plugin!
Unfortunately, I cannot reproduce the issue you are having. If I run the homo_sapiens_GRCh38.vcf
input file against the mutfunc plugin I can get results, for example -
rs116555717 22:39231644 C ENSG00000100311 ENST00000331163 Transcript missense_variant 1453 434 145 Q/R cAg/cGg - IMPACT=MODERATE;STRAND=-1;mutfunc_exp=-0.30581;mutfunc_mod=-0.37622
With a container, the plugin should work the same way, and the GRCh38 assembly is correct. Can you check this specific variant in your output file and let me know if you see mutfunc output against it?
If you check the table info you will see it has 4 fields -
sqlite> pragma table_info([consequences]);
0|species||0||0
1|md5||0||0
2|item||0||0
3|matrix||0||0
The item field holds one of: motif, int, mod, exp
So, if you want to check the motif analysis value stored in a specific record, you can fetch that record from the table, uncompress the matrix, and parse it to find the value. The plugin has 3 functions that do this, which you can emulate -
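The plugin's own functions are not reproduced here. As a rough Python sketch of the first two steps (fetch the record, uncompress the matrix), something like the following might work. It assumes the blob is gzip-compressed, which is not confirmed in this thread, and the helper names fetch_matrix and expand_matrix are made up for illustration. Parsing the expanded bytes depends on the plugin's matrix layout and is left out.

```python
import gzip
import sqlite3


def fetch_matrix(conn, md5, item, species="homo_sapiens"):
    """Return the raw (still compressed) matrix blob for one record, or None."""
    row = conn.execute(
        "SELECT matrix FROM consequences "
        "WHERE species = ? AND md5 = ? AND item = ?",
        (species, md5, item),
    ).fetchone()
    return row[0] if row else None


def expand_matrix(blob):
    """Decompress a blob. ASSUMPTION: the blob uses gzip framing."""
    return gzip.decompress(blob)


# Demo against an in-memory stand-in for mutfunc_data.db. The table has the
# four columns shown by pragma table_info; the matrix bytes are placeholder
# data, not a real mutfunc matrix.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE consequences (species, md5, item, matrix)")
demo.execute(
    "INSERT INTO consequences VALUES (?, ?, ?, ?)",
    ("homo_sapiens", "984379124b89450a146ca532f7709414", "motif",
     gzip.compress(b"placeholder matrix bytes")),
)
blob = fetch_matrix(demo, "984379124b89450a146ca532f7709414", "motif")
print(len(expand_matrix(blob)), "bytes after decompression")
```

To try it on the real file, open it with sqlite3.connect and pass that connection instead of the in-memory one.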
Hope that helps.
Best regards, Nakib
Thanks for your prompt reply.
I find that the plugin runs as expected only if "--offline" is not specified. From information on this help page (https://asia.ensembl.org/info/docs/tools/vep/script/vep_cache.html#offline) I had expected to be able to run plugins in offline mode but perhaps mutfunc is an exception.
Matthew
It should work with or without --offline. Can you give me an example of a variant line from the output that is working without --offline and the same variant line from the output when --offline is used?
The same variant that you mentioned previously can be used as an example.
With --offline:
singularity exec \
--bind ${VEPDIR}/cache:/cache \
--bind ${VEPDIR}/plugin_data:/plugin_data \
${VEPDIR}/112.0/ensembl-vep.sif vep \
-dir /cache \
-assembly GRCh38 \
--plugin mutfunc,db=/plugin_data/mutfunc_data.db \
--cache \
--offline \
-i /opt/vep/src/ensembl-vep/examples/homo_sapiens_GRCh38.vcf \
-o mutfunc.offline.vep.txt
grep rs116555717 mutfunc.offline.vep.txt
rs116555717 22:39231644 C ENSG00000100311 ENST00000331163 Transcript missense_variant 1453 434 145 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1
rs116555717 22:39231644 C ENSG00000100311 ENST00000381551 Transcript missense_variant 426 389 130 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1
rs116555717 22:39231644 C ENSG00000100311 ENST00000440375 Transcript missense_variant 489 341 114 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1;FLAGS=cds_end_NF
rs116555717 22:39231644 C ENSG00000100311 ENST00000455790 Transcript missense_variant 462 341 114 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1;FLAGS=cds_end_NF
and without:
singularity exec \
--bind ${VEPDIR}/cache:/cache \
--bind ${VEPDIR}/plugin_data:/plugin_data \
${VEPDIR}/112.0/ensembl-vep.sif vep \
-dir /cache \
-assembly GRCh38 \
--plugin mutfunc,db=/plugin_data/mutfunc_data.db \
--cache \
-i /opt/vep/src/ensembl-vep/examples/homo_sapiens_GRCh38.vcf \
-o mutfunc.online.vep.txt
grep rs116555717 mutfunc.online.vep.txt
rs116555717 22:39231644 C ENSG00000100311 ENST00000331163 Transcript missense_variant 1453 434 145 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1;mutfunc_exp=-0.30581;mutfunc_mod=-0.37622
rs116555717 22:39231644 C ENSG00000100311 ENST00000381551 Transcript missense_variant 426 389 130 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1;mutfunc_exp=5.59046;mutfunc_int=-0.39519;mutfunc_mod=7.36878
rs116555717 22:39231644 C ENSG00000100311 ENST00000440375 Transcript missense_variant 489 341 114 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1;FLAGS=cds_end_NF
rs116555717 22:39231644 C ENSG00000100311 ENST00000455790 Transcript missense_variant 462 341 114 Q/R cAg/cGg -IMPACT=MODERATE;STRAND=-1;FLAGS=cds_end_NF
That second command (without --offline) ran tenfold slower than the first, which makes me wonder how the online Ensembl database is being used. When --offline is not specified, does that mean --cache is disregarded? That is not my understanding of what is said here: https://asia.ensembl.org/info/docs/tools/vep/script/vep_cache.html#limitations
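For comparing the two runs on larger inputs, a small helper along these lines could pull out just the mutfunc fields from each result line. This is a minimal sketch that assumes the output is tab-delimited with the key=value pairs in the last column, as in VEP's default text output; the function names are made up for illustration.

```python
def parse_extra(line):
    """Split the last tab-separated column into a dict of key=value pairs."""
    extra = line.rstrip("\n").split("\t")[-1]
    fields = {}
    for pair in extra.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            fields[key] = value
    return fields


def mutfunc_fields(line):
    """Keep only the keys written by the mutfunc plugin (mutfunc_ prefix)."""
    return {k: v for k, v in parse_extra(line).items()
            if k.startswith("mutfunc_")}


def lines_with_mutfunc(path):
    """Count result lines (ignoring # header lines) that carry mutfunc fields."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith("#") and mutfunc_fields(line):
                count += 1
    return count
```

Calling lines_with_mutfunc on mutfunc.offline.vep.txt and on mutfunc.online.vep.txt would show at a glance whether either run produced any mutfunc annotations.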
I was able to reproduce the issue you are having. The mutfunc plugin requires the translation sequence, which is not stored in the cache (see the URL you mentioned above). When you use --offline, VEP only uses the cache and no database connection can be made to retrieve the required sequence. To resolve this, either do not use --offline, as you were doing (which is slower, as you mentioned), or better, provide a FASTA file for human with --fasta.
As mentioned above, --offline tells VEP not to use the Ensembl database at all. Generally, database connections are slower. Omitting --offline does not mean the cache is disregarded altogether; VEP still uses the cache but tries to get information such as the translation sequence from the database.
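For anyone wrapping VEP in a script, a pre-flight check along these lines could flag the combination described above before a long run. This is a hypothetical helper, not part of VEP or the plugin. It only looks at the two flags named in this thread (--offline and --fasta) and accepts single or double leading dashes, since both styles appear in the commands above.

```python
def offline_without_fasta(args):
    """Return True if the argument list has --offline but no --fasta.

    Flags are normalised by stripping leading dashes and any "=value"
    suffix, so "-offline", "--offline" and "--fasta=ref.fa" are all handled.
    """
    names = {a.lstrip("-").split("=", 1)[0] for a in args if a.startswith("-")}
    return "offline" in names and "fasta" not in names


# Example argument list modelled on the command in this thread.
vep_args = [
    "-dir", "/cache",
    "-assembly", "GRCh38",
    "--plugin", "mutfunc,db=/plugin_data/mutfunc_data.db",
    "--cache",
    "--offline",
]
if offline_without_fasta(vep_args):
    print("warning: --offline without --fasta; mutfunc may produce no output")
```

The check is deliberately narrow: it does not know about other flag aliases or about which plugins actually need the translation sequence.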
Hope that answers your question.
OK, that has answered my question, and I can run the plugin successfully with --offline now.
Perhaps it wouldn't hurt to update the plugin help string to advise that "--fasta" is required in offline mode.
Thanks very much for your help.
Glad to know it worked for you. I will add a warning to the plugin explaining this case.
I will close this issue. If you face any further issues in future, please feel free to open a new one.
I am running VEP with a Singularity image built from docker://ensemblorg/ensembl-vep:release_112.0. Using this image I have set up multiple plugins which work as advertised. However, the mutfunc plugin does not seem to be running properly. I do not get any mutfunc annotations, i.e. although the TSV output file has header lines like this:
none of these fields appear on any result lines. This is the case when I use as input the little GRCh38 test input file (/opt/vep/src/ensembl-vep/examples/homo_sapiens_GRCh38.vcf in the container) and also when I try other bigger input VCF files of my own.
I am using singularity-ce version 3.11.3 on a Linux OS.
The sqlite file (mutfunc_data.db) md5sum check is OK.
An example command is:
This runs without errors or warnings.
My questions:
and my supplementary question (which is unimportant, just curiosity):
sqlite> select * from consequences limit 1;
homo_sapiens|984379124b89450a146ca532f7709414|motif|�
Thanks
Matthew