etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

cnvkit.py batch hangs on very large WGS dataset #867

Open gtollefson opened 4 months ago

gtollefson commented 4 months ago

Hi @etal,

I think I'm hitting a memory bottleneck on a large sample set while generating reference.cnn, and I'm hoping you can help me identify a way past it.

I'm running the following command to generate a reference.cnn from 5 normal samples. Each of the normal BAM files is between 150 and 200 GB (around 1 TB total). I'm curious whether the batch command attempts to open and process all files simultaneously. I'm allocating my institution's maximum per-user memory allowance (1200 GB) for this job, but it appears to hang indefinitely (over several days), and the process shows no CPU usage.

```
cnvkit.py batch -n -m wgs -f <fasta_reference> --processes 40 -d output_male_new/ $sample_paths
```

Is there a way to generate the reference.cnn with all 5 of my normal samples by running each individually and then merging them?
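To make the question concrete, here is the decomposition I have in mind, based on the subcommands that `batch` wraps (`cnvkit.py coverage` and `cnvkit.py reference`). The BAM names are placeholders, and I'm assuming `targets.bed` is the genome-wide bin BED that `batch` would otherwise generate in WGS mode:

```shell
# Sketch: build the pooled reference one sample at a time, so peak
# memory is bounded by a single BAM rather than all five at once.
# BAM names and targets.bed are placeholders; <fasta_reference> as above.

# 1. Compute bin coverage for each normal sample separately:
for bam in normal1.bam normal2.bam normal3.bam normal4.bam normal5.bam; do
    cnvkit.py coverage "$bam" targets.bed -p 8 \
        -o "${bam%.bam}.targetcoverage.cnn"
done

# 2. Pool the per-sample .cnn files into a single copy-number reference:
cnvkit.py reference *.targetcoverage.cnn -f <fasta_reference> \
    -o reference.cnn
```

Would the reference.cnn produced this way be equivalent to the one `batch -n` builds?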

Are there any other parameters you recommend setting for both creating the normal panel and also running the CNV calling on tumor samples using WGS bam files of this magnitude?
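For context, this is the tumor-calling step I plan to run once the reference exists, reusing it with `-r` so the normals are not re-processed (file names are placeholders):

```shell
# Call CNVs on a tumor sample against the prebuilt pooled reference:
cnvkit.py batch tumor.bam -r reference.cnn -m wgs \
    --processes 8 -d output_tumor/
```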

Thank you, George