Hi,
Thanks for your query. I’ve had a little look at this issue and I can only reproduce it when I set the number of forks to be greater than the number of cores on the machine. I have a couple of thoughts:
1) How many cores are on the machine that you're using? If you're using `--fork 4` then this is unlikely to be the issue, but it's worth ruling out; a quick way to check is shown after these questions. Does it crash if you use `--fork 2`?
2) Could you please send me, or link me to, the BED file you're using for your custom annotations? Just to be sure: if you run it without the ACMG-SF.bed.gz custom annotation it runs happily, but if you remove the other three custom annotations and run with only this BED file, it crashes?
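If it's helpful, the following should show how many CPUs are actually visible on the host and inside the container (a sketch, assuming the standard Ubuntu-based VEP image, which should ship coreutils' `nproc`):

```sh
# CPUs visible on the host machine
nproc

# CPUs visible inside the VEP container (can differ if Docker applies CPU limits)
docker run --rm ensemblorg/ensembl-vep:release_93 nproc
```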
Thanks for your help with this.
Kind Regards, Andrew
On 8 Aug 2018, at 10:03, bb-so notifications@github.com wrote:
Hi,
I am running the following command using Docker (ensemblorg/ensembl-vep:release_93):

vep -i input.vcf.gz -o output.ann.vcf \
  --fasta genome.fa --format vcf --cache --dir_cache /path/to --cache_version 93 --offline \
  --fork 4 --refseq --exclude_predicted --vcf_info_field ANN --vcf \
  --pick --pick_order rank,biotype,ccds,canonical,appris,tsl,length \
  --hgvs --symbol --biotype \
  --custom 1000g.vcf.gz,1000G,vcf,exact,0,AF \
  --custom gnomadWGS.vcf.gz,gnomadWGS,vcf,exact,0,AF \
  --custom gnomadWES.vcf.gz,gnomadWES,vcf,exact,0,AF \
  --custom ClinVar.vcf.gz,ClinVar,vcf,exact,0,CLNSIG \
  --custom ACMG-SF.bed.gz,ACMG-SF,bed,exact \
  --plugin dbNSFP,dbNSFP.txt.gz,GERP++_RS \
  --fields SYMBOL,BIOTYPE,HGVSc,HGVSp,IMPACT,Consequence,1000G_AF,gnomadWGS_AF,gnomadWES_AF,ClinVar_CLNSIG,ACMG-SF,GERP++_RS
but I am getting the same error reported in #150 (https://github.com/Ensembl/ensembl-vep/issues/150):

-------------------- EXCEPTION --------------------
MSG: ERROR: Forked process(es) died: read-through of cross-process communication detected
STACK Bio::EnsEMBL::VEP::Runner::_forked_buffer_to_output /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:554
STACK Bio::EnsEMBL::VEP::Runner::next_output_line /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:361
STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:202
STACK toplevel /opt/vep/src/ensembl-vep/vep:224
Date (localtime) = Wed Aug 8 08:20:01 2018
Ensembl API version = 93
---------------------------------------------------
The issue disappears in two cases:
- I remove `--fork`, disabling forking (too slow for my use case);
- I remove the `--custom` annotation that uses a BED file.
Note that:
- the exact same command works with previous versions of VEP (tested with ensemblorg/ensembl-vep:release_89.5 on the same machine);
- I have already tried lower `--fork` values, but that did not solve the issue;
- I have already tried setting `--buffer_size 50` as suggested in #150 (https://github.com/Ensembl/ensembl-vep/issues/150), but that did not solve the issue.

Any suggestions?
Thank you!
Hi Andrew,
thank you for the swift reply.
1) The machine I am using has 48 cores and 128 GB of memory. Yes, it crashes with `--fork 2` as well.
2) OK, I will send you the BED file. Correct: it works well if I remove the BED annotation, but the BED annotation alone is enough to cause a crash. I could reproduce it with the following:
vep -i input.vcf.gz -o output.vcf --fork 4 --format vcf --cache --dir_cache /path/to --cache_version 93 --offline --refseq --use_given_ref --vcf_info_field ANN --custom ACMG-SF.bed.gz,ACMG-SF,bed
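For what it's worth, the file is bgzip-compressed and tabix-indexed as `--custom` requires; a typical preparation looks roughly like this (filenames illustrative; `bgzip` and `tabix` come with HTSlib):

```sh
# Sort by chromosome, then by start position
sort -k1,1 -k2,2n ACMG-SF.bed > ACMG-SF.sorted.bed

# Block-compress and build the tabix index that VEP's --custom needs
bgzip ACMG-SF.sorted.bed            # produces ACMG-SF.sorted.bed.gz
tabix -p bed ACMG-SF.sorted.bed.gz  # produces ACMG-SF.sorted.bed.gz.tbi
```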
Best,
Simone
Hi Simone,
Just to let you know, we're still looking into this issue. I'll let you know when we come up with anything more concrete.
Kind Regards, Andrew
Thanks Andrew, much appreciated! Best regards,
Simone
Any news on this issue? We are experiencing this twice a month while running VEP in AWS Batch.
Hi @007vasy,
Unfortunately we have nothing concrete to report yet on fixing this issue. The only time we're able to reproduce it is when VEP tries to set up more forks than there are cores on the machine.
Could you please tell us more about how you're running VEP? What input command are you using? How many forks are you requesting, and on which type of AWS infrastructure? There may be some small changes we can make to minimise the chances of this happening.
Kind Regards, Andrew
Hello @aparton, thanks for responding. I will provide the context for @007vasy's questions.
We run the `vep` command using the Ensembl Docker image via the Cromwell workflow engine, with AWS Batch as the compute backend, querying our own mirror of the VEP database running as an AWS RDS service. The command line is as follows:
tar -xf ${vep_plugins_tar} -C /opt/vep/.vep -P
vep --database \
--host ${db_host} \
--user ${db_user} \
--password ${db_password} \
--port ${db_port} \
--species homo_sapiens \
--assembly GRCh${reference_assembly} \
--fasta ${fasta} \
-i ${input_vcf_file} \
-o ${vep_output_filename} \
--plugin MaxEntScan,/opt/vep/.vep/vep_data/Plugins_data/MaxEntScan_data/,SWA,NCSS \
--plugin dbscSNV,/opt/vep/.vep/vep_data/Plugins_data/dbscSNV_data/dbscSNV1.1_GRCh${reference_assembly}.txt.gz \
--fork ${cpu} \
--dir_plugins /opt/vep/.vep/vep_data/Plugins/ \
--vcf \
--no_stats \
--buffer_size ${buffer_size} \
--allele_number \
--fields "Allele,ALLELE_NUM,Feature,Consequence,MaxEntScan_diff,MES-SWA_acceptor_diff,MES-SWA_donor_diff,MaxEntScan_alt,MaxEntScan_ref,MES-SWA_acceptor_alt,MES-SWA_donor_alt,MES-SWA_acceptor_ref,MES-SWA_donor_ref,MES-SWA_acceptor_ref_comp,MES-SWA_donor_ref_comp,MES-NCSS_downstream_acceptor,MES-NCSS_downstream_donor,MES-NCSS_upstream_acceptor,MES-NCSS_upstream_donor,ada_score,rf_score"
The error has occurred only once in the few months since the system was productionised, and we haven't been able to reproduce it again. The error also cascaded outside the `vep` Unix process itself: it disabled AWS Batch's own ability to stop the Docker container, and also its ability to kill the batch job. I am not sure if this extra information will help with pinning down the issue.
At the time the error occurred, I believe the parameter was `cpu = 4`, i.e. we requested 4 CPUs from AWS Batch, which is also the number of forks the `vep` process was given on the command line.
Docker image: ensemblorg/ensembl-vep@sha256:0d3ff994142cf9a2a1c7be8ae4f2b96227848ea470e3f4753bd0be3a1fed5ceb
VEP version: 98
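In case it's useful, one defensive tweak we are considering is deriving the fork count from the CPUs the container actually sees, rather than trusting the requested value; a sketch, assuming coreutils' `nproc` is available in the image (`cpu` is the value Cromwell passes in):

```sh
# Clamp the requested fork count to the CPUs actually visible in the container
requested=${cpu}
available=$(nproc)
forks=$(( available < requested ? available : requested ))

vep --fork "$forks" ...   # remaining options as in the command above
```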
Thanks for your help! :)
Hi @edmundlth,
Thanks for providing this information. My AWS knowledge is slim but workable, so please correct me if I'm wrong, but I think your experience of this bug fits with what we already know about the issue.
I think this is related to the Docker image and the AWS setup, and is independent of the Cromwell workflow system. We are able to reproduce this issue when we ask VEP to fork into more processes than there are CPUs available on the host machine.
I assume that you're using the default VEP Docker image, and that you haven't made any edits to it? Multi-threading within VEP uses Perl's fork functionality, as described here: https://perldoc.perl.org/functions/fork.html. That has its own collection of limitations, but within a Docker image it should default to being able to use all available CPUs on a particular machine. So as long as AWS Batch has given you a machine with 4 CPUs, this should work smoothly.
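If it would help to verify that next time, a quick check from inside the container might look like this (a sketch, assuming a cgroup-v1 host, which ECS-backed Batch instances typically are):

```sh
# CPUs the container believes it has
nproc

# Docker's CPU quota under cgroup v1: quota / period = CPUs allotted
# (a quota of -1 means no limit)
cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
```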
However, when you say that this error disabled AWS Batch's ability to stop the container and kill the batch job, how exactly did it manifest itself? I'm speculating a little here, but if AWS Batch was unable to provide you with sufficient CPU resources, then I believe the job would have sat in RUNNABLE status, which would have led to all of the issues that you describe.
Either way, unfortunately I don't have any great suggestions for preventing this from happening again with your current setup. While I suspect your one-time error is due to an issue in allocating AWS resources, I may be entirely off base with this. If you manage to reproduce it more reliably or have any other information you can provide related to this issue, I'm happy to take a closer look.
Kind Regards, Andrew
Hi @bb-so, @edmundlth and @007vasy,
Because of the age of this ticket, and because we are not able to reliably reproduce the issue internally, I am going to close it off. However, if you have updates or are still encountering the issue, please feel free to reopen, or open a new ticket and we can review.
Cheers, Jamie.