Closed matthdsm closed 7 years ago
Hi Brad,
We're experiencing a somewhat weird issue. We run bcbio with IPython on a torque cluster, where the bcbio_nextgen.py command runs as a single-core job that then spawns the IPython worker jobs. This "master" job seems to use an excessive amount of memory, which keeps rising until the worker node runs out and the job is killed.
Any idea what could be causing this? Currently, the memory usage is stable at about 8 GB, which seems a tad too much to me.
Thanks M
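P.S. In case it helps with debugging, here is a minimal sketch of how one could log the master job's resident memory over time. It assumes psutil is available on the node and that the controller PID is passed in by hand (e.g. the PID of the bcbio_nextgen.py process); it is not part of bcbio itself.

# Minimal sketch: log the resident memory of a given PID once a minute.
# Assumes psutil is installed; the PID is supplied on the command line.
import sys
import time

import psutil

def watch(pid, interval=60):
    proc = psutil.Process(pid)
    while True:
        rss_gb = proc.memory_info().rss / float(1024 ** 3)
        print("%s rss=%.2f GB" % (time.strftime("%H:%M:%S"), rss_gb))
        time.sleep(interval)

if __name__ == "__main__":
    watch(int(sys.argv[1]))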
Hi Matthias,
Which pipeline are you seeing this behavior with?
Just the default "variant2" pipeline. We're using v1.0.2 with the following config:
# bcbio-nextgen v1.0.2
---
# include an experiment name here
fc_name:
upload:
  dir: ../final
globals:
  analysis_regions: RefSeq_allexons_20bp.sorted.merged.bed
resources:
  tmp:
    dir: /tmp/bcbio
details:
  - analysis: variant2
    genome_build: hg38
    description:
    metadata:
      batch:
    algorithm:
      aligner: bwa
      save_diskspace: true
      coverage_interval: regional
      mark_duplicates: true
      recalibrate: false
      realign: false
      variantcaller: gatk-haplotype
      variant_regions: analysis_regions
      jointcaller: gatk-haplotype-joint
      effects: vep
      effects_transcripts: all
      vcfanno: [gemini, ../config/eog.conf, ../config/jpopgen.conf]
      tools_on:
        - vep_splicesite_annotations
    # add the path to your files here
    files:
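For clarity, the globals block is just a named value that gets referenced elsewhere: variant_regions: analysis_regions in the algorithm section resolves to the RefSeq BED defined once at the top. A rough illustration of that substitution (this is not bcbio's own resolution code; it assumes PyYAML and that the config above is saved as project.yaml):

# Illustration only, not bcbio's implementation: substitute a name defined
# under `globals` (analysis_regions) into the algorithm section.
import yaml

with open("project.yaml") as handle:
    config = yaml.safe_load(handle)

globals_map = config.get("globals", {})
for sample in config["details"]:
    algorithm = sample.get("algorithm", {})
    for key, value in algorithm.items():
        if isinstance(value, str) and value in globals_map:
            algorithm[key] = globals_map[value]

print(config["details"][0]["algorithm"]["variant_regions"])
# -> RefSeq_allexons_20bp.sorted.merged.bed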
Thanks for looking into this. M
Matthias; Thanks for reporting the issue and for the details. How many samples are you running concurrently? Memory usage will depend on that, since bcbio builds record objects to pass around for parallelization. During highly parallel steps like variant calling this can mean a lot of objects, and the memory usage can get high. Could that explain what you're seeing?
Apologies, I know this isn't ideal for continuing to scale up. This is one of the motivations for moving to CWL where we can use more scalable infrastructures for handling these sorts of issues.
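To make the scaling concrete, here is a very rough back-of-the-envelope sketch. The per-record size and region count are made up and this is not bcbio's actual data structure; the point is only that the records held by the controller grow with samples times regions, so memory grows roughly linearly with sample count.

# Back-of-the-envelope illustration only, not bcbio's internals.
# Assume the controller holds one record per sample per genomic region
# queued for a parallel step such as variant calling.

def controller_memory_gb(n_samples, n_regions, bytes_per_record=50000):
    """Crude estimate of controller-side memory for queued records."""
    return n_samples * n_regions * bytes_per_record / float(1024 ** 3)

# Made-up region count; the takeaway is the linear growth with sample number.
for n_samples in (48, 120):
    print("%3d samples -> ~%.1f GB" % (n_samples,
                                       controller_memory_gb(n_samples, 2000)))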
Hi Brad, This is a run containing 120 exomes (the first time we're doing a run this big). During our previous runs (about 48 samples) we've had no issues, so it could very well be the number of samples that's throwing things off.
It's not much of a problem right now, but it's good to know for future use. If we have bigger sample sets, we can batch them; a rough sketch of what that could look like is below. I don't know if this limitation is mentioned anywhere in the docs, but it might be worth adding.
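Batching sketch (hypothetical, not a bcbio feature): split the per-sample entries into chunks of 48 and write one project config per chunk. It assumes PyYAML, a template.yaml holding everything except the details section, and simplified, made-up sample entries.

# Hypothetical batching helper, not part of bcbio. Splits per-sample
# entries into chunks and writes one project config per chunk.
# Assumes PyYAML; template.yaml holds everything except `details`.
import copy

import yaml

BATCH_SIZE = 48  # roughly the size that worked for us before

def write_batches(template_path, sample_entries, batch_size=BATCH_SIZE):
    with open(template_path) as handle:
        template = yaml.safe_load(handle)
    for i in range(0, len(sample_entries), batch_size):
        config = copy.deepcopy(template)
        config["details"] = sample_entries[i:i + batch_size]
        out_file = "project_batch%02d.yaml" % (i // batch_size + 1)
        with open(out_file, "w") as out_handle:
            yaml.safe_dump(config, out_handle, default_flow_style=False)

# Simplified, made-up sample entries: 120 exomes -> three configs of <= 48.
samples = [{"description": "sample%03d" % n,
            "files": ["sample%03d_R1.fastq.gz" % n, "sample%03d_R2.fastq.gz" % n]}
           for n in range(1, 121)]
write_batches("template.yaml", samples)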
Thanks for looking into it! M
Hi Brad,
Just a little follow-up on this. Suppose we're looking into transitioning to CWL: which runner would you advise? We have a cluster running torque/PBS, so Arvados is out of the question, and Toil doesn't support torque. Any other suggestions?
Thanks M
Matthias; We're currently working on torque/PBS support for Toil but it is not quite there yet. Apologies, the CWL work is still under active development and not quite ready for production use right now. Thanks for checking in on it and we'll keep moving things forward.
Hi Brad, no problem, I just wanted to know how things stand as of now. I'll close this issue, since the problem was clearly caused by the number of samples.
Cheers, M