Open Ananya-swi opened 6 months ago
Hi Ananya,
Hope you are having a nice day. Sorry for the inconvenience.
Could you show the command that you are using to run VEP? Thank you.
Cheers, Nuno
Hi Nuno,
Thank you for your response.
We conducted testing using ARGO.
VEP versions used: 111 and 109.3
Because we depend on AKS and ARGO, only part of each node is effectively usable: about 5 vCPUs and 25 GB of RAM on configurations with 8 vCPUs and 32 GB RAM, and about 13 vCPUs and 55 GB of RAM on configurations with 16 vCPUs and 64 GB RAM.
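Since the usable CPU count is what determines a sensible --fork value, it can help to confirm what the container is actually allowed to use. The sketch below is only an illustration: the helper name cpus_from_cpu_max is hypothetical, and it assumes the node uses cgroup v2, where the CPU quota is exposed as "<quota> <period>" (or "max <period>") in /sys/fs/cgroup/cpu.max. That may not match every AKS setup.

```shell
# Hypothetical helper (not part of VEP): turn the contents of a cgroup v2
# cpu.max file into a whole number of usable CPUs, rounded down, minimum 1.
cpus_from_cpu_max() {
  quota=${1%% *}     # text before the first space
  period=${1##* }    # text after the last space
  if [ "$quota" = "max" ]; then
    nproc            # no quota set: fall back to the CPUs visible to the process
    return
  fi
  n=$(( quota / period ))
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}
```

Inside the container it could be called as cpus_from_cpu_max "$(cat /sys/fs/cgroup/cpu.max)", and the result compared with the value passed to --fork.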
Here is the command we are using to run VEP:
vep \
--cache --refseq \
--CACHE_VERSION 109 \
--dir_plugins /opt/vep/.vep/Plugins \
--no_stats \
-i "/home/admin/test/Test.vcf.gz" \
-o "/home/admin/test/Test.txt" \
--symbol --hgvs --hgvsg --variant_class --gene_phenotype \
--flag_pick_allele_gene --canonical --appris --ccds --numbers --total_length --mane \
--sift p --polyphen p \
--fasta /opt/vep/.vep/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz \
--species homo_sapiens --assembly GRCh37 \
--af --af_gnomad \
--no_escape \
--plugin SpliceAI,snv=/opt/vep/.vep/Grch37/spliceai_scores.raw.snv.hg19.vcf.gz,indel=/opt/vep/.vep/Grch37/spliceai_scores.raw.indel.hg19.vcf.gz \
--plugin NMD \
--dir_plugins /opt/vep/.vep/Plugins \
--plugin dbNSFP,/opt/vep/.vep/Grch37/dbNSFP4.5a_grch37.gz,PROVEAN_pred,LRT_pred,MutationTaster_pred,\
MutationAssessor_pred,FATHMM_pred,fathmm-MKL_coding_pred,M-CAP_pred,fathmm-XF_coding_pred,\
DANN_score,MutPred_score,PrimateAI_pred,Aloft_pred,BayesDel_addAF_pred,LIST-S2_pred,\
MVP_score,Eigen-phred_coding,SiPhy_29way_logOdds,bStatistic,Interpro_domain,MetaLR_pred,\
GTEx_V8_gene,GTEx_V8_tissue,VEST4_score,REVEL_score,AlphaMissense_score \
--offline --tab --fork 5 --force_overwrite ;
In addition, we used nine custom files and the plugins pLI, CADD, and dbscSNV for our annotation.
We encountered the same error with VEP versions 109.3 and 111, whereas VEP version 106.1 completed the annotation in 7 minutes for the same file.
Looking forward to your assistance. Thank you.
Best regards, Ananya
Hi @Ananya-swi,
Thanks for sending more information. You seem to be using VEP as expected, so I am not really sure why it is taking so much time.
Some ideas/questions about the performance issues:
Looking forward to your reply.
Best, Nuno
Hi @nuno-agostinho,
Thanks for your response.
I've tried the method you suggested, removing the plugins, but I am still encountering the same issue. When I use --buffer_size 50 --fork 8, the script runs but takes a long time to complete. The VCF file used as input contains only SNVs.
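For anyone reproducing this, the statement that the input contains only SNVs can be checked before tuning the buffer size. The following is a minimal sketch, not something from the thread: the function name count_snvs is hypothetical, and it treats a record as an SNV only when REF and every ALT allele are a single base.

```shell
# Hypothetical helper: read an uncompressed VCF on stdin and report how many
# records are SNVs (single-base REF and single-base ALT alleles) versus other.
count_snvs() {
  awk -F'\t' '
    /^#/ { next }                                  # skip header lines
    {
      snv = ($4 ~ /^[ACGTNacgtn]$/)                # REF must be one base
      n = split($5, alts, ",")                     # ALT may list several alleles
      for (i = 1; i <= n; i++)
        if (alts[i] !~ /^[ACGTNacgtn]$/) snv = 0   # any longer or symbolic ALT
      if (snv) s++; else o++
    }
    END { printf "snv=%d other=%d\n", s, o }
  '
}
```

With the input file from this thread, the call would be gzip -dc /home/admin/test/Test.vcf.gz | count_snvs. The total also shows how many variants each buffer of 50 has to cover.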
Using --fork alone
I used the VEP command without the --buffer_size flag. Here is the command:
vep \
--cache --refseq \
--CACHE_VERSION 109 \
--dir_plugins /opt/vep/.vep/Plugins \
--no_stats \
-i "/home/admin/test/Test.vcf.gz" \
-o "/home/admin/test/Test.txt" \
--symbol --hgvs --hgvsg --variant_class --gene_phenotype \
--flag_pick_allele_gene --canonical --appris --ccds --numbers --total_length --mane \
--sift p --polyphen p \
--fasta /opt/vep/.vep/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz \
--species homo_sapiens --assembly GRCh37 \
--af --af_gnomad \
--no_escape \
--plugin SpliceAI,snv=/opt/vep/.vep/Grch37/spliceai_scores.raw.snv.hg19.vcf.gz,indel=/opt/vep/.vep/Grch37/spliceai_scores.raw.indel.hg19.vcf.gz \
--plugin NMD \
--dir_plugins /opt/vep/.vep/Plugins \
--plugin dbNSFP,/opt/vep/.vep/Grch37/dbNSFP4.5a_grch37.gz,PROVEAN_pred,LRT_pred,MutationTaster_pred,\
MutationAssessor_pred,FATHMM_pred,fathmm-MKL_coding_pred,M-CAP_pred,fathmm-XF_coding_pred,\
DANN_score,MutPred_score,PrimateAI_pred,Aloft_pred,BayesDel_addAF_pred,LIST-S2_pred,\
MVP_score,Eigen-phred_coding,SiPhy_29way_logOdds,bStatistic,Interpro_domain,MetaLR_pred,\
GTEx_V8_gene,GTEx_V8_tissue,VEST4_score,REVEL_score,AlphaMissense_score \
--offline --tab --fork 5 --force_overwrite ;
Using this command, I received the following error:
Using --buffer_size
When I added the --buffer_size 50 flag, the script ran but took a long time to execute. Here is the command I used:
vep \
--cache --refseq \
--CACHE_VERSION 109 \
--dir_plugins /opt/vep/.vep/Plugins \
--no_stats \
-i "/home/admin/test/Test.vcf.gz" \
-o "/home/admin/test/Test.txt" \
--symbol --hgvs --hgvsg --variant_class --gene_phenotype \
--flag_pick_allele_gene --canonical --appris --ccds --numbers --total_length --mane \
--sift p --polyphen p \
--fasta /opt/vep/.vep/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz \
--species homo_sapiens --assembly GRCh37 \
--af --af_gnomad \
--no_escape \
--plugin SpliceAI,snv=/opt/vep/.vep/Grch37/spliceai_scores.raw.snv.hg19.vcf.gz,indel=/opt/vep/.vep/Grch37/spliceai_scores.raw.indel.hg19.vcf.gz \
--plugin NMD \
--dir_plugins /opt/vep/.vep/Plugins \
--plugin dbNSFP,/opt/vep/.vep/Grch37/dbNSFP4.5a_grch37.gz,PROVEAN_pred,LRT_pred,MutationTaster_pred,\
MutationAssessor_pred,FATHMM_pred,fathmm-MKL_coding_pred,M-CAP_pred,fathmm-XF_coding_pred,\
DANN_score,MutPred_score,PrimateAI_pred,Aloft_pred,BayesDel_addAF_pred,LIST-S2_pred,\
MVP_score,Eigen-phred_coding,SiPhy_29way_logOdds,bStatistic,Interpro_domain,MetaLR_pred,\
GTEx_V8_gene,GTEx_V8_tissue,VEST4_score,REVEL_score,AlphaMissense_score \
--offline --tab --buffer_size 50 --fork 5 --force_overwrite ;
Despite following the suggestions, the issue persists. With a smaller buffer size the script runs, but it takes significantly longer to complete. It appears that higher buffer sizes and fork counts lead to process communication issues.
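To narrow down where the slowdown starts, the same command can be timed across a few --buffer_size and --fork combinations. The sketch below is an assumption-laden illustration: time_run and sweep are hypothetical helper names, the values 50, 500 and 5000 and the fork counts 1 and 5 are arbitrary examples, and the base command is passed in without its own --buffer_size or --fork.

```shell
# Hypothetical helper: run one command, send its output to stderr, and print
# "<label><TAB><elapsed seconds><TAB><exit status>" on stdout.
time_run() {
  local label=$1; shift
  local start end status
  start=$(date +%s)
  if "$@" 1>&2; then status=0; else status=$?; fi
  end=$(date +%s)
  printf '%s\t%s\t%s\n' "$label" "$(( end - start ))" "$status"
}

# Hypothetical sweep: append each --buffer_size / --fork pair to the base
# command given as arguments and time every run.
sweep() {
  local bs fk
  for bs in 50 500 5000; do
    for fk in 1 5; do
      time_run "buffer_size=${bs},fork=${fk}" "$@" --buffer_size "$bs" --fork "$fk"
    done
  done
}
```

A call would look like sweep vep --cache --refseq ... --offline --tab --force_overwrite, using the options already shown above. Each output line then records whether that combination finished and how long it took, which makes the results from versions 106.1, 109.3 and 111 easier to compare.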
I tried these commands with VEP versions 111 and 109.3, and the same error occurs. However, when using versions 106.0 or 106.1, it works without any issues.
Could you provide further insights or additional configurations that might help resolve this problem?
I look forward to your guidance on this issue.
Thanks, Ananya
Hi,
We are currently using the VEP v111 Docker container to annotate VCF files. However, we are facing the following issues:
Previously we were using v106.1, which takes only 7 minutes to annotate the same file. We also tried v109.3, which leads to the same error.
We would appreciate any guidance or suggestions to resolve these issues. Looking forward to your reply. Thank you in advance for your assistance.
Regards, Ananya Saji Data Engineer (Bioinformatics) Semantic Web Tech Pvt. Ltd.