We are running cnvkit on a large set of WGBS samples (~100 tumor and ~100 normal) and find batch mode very time-consuming, even with --processes set to 32 or 64. In addition, it sometimes crashes on a single sample (see the error message below, where it fails to open a BAM file that exists and is accessible), and re-running cnvkit does not seem to resume from the intermediate results of the crashed run. We would therefore like to find a more robust and efficient way of running cnvkit on this sample set.
Traceback (most recent call last):
  File "....../Python-3.11.2/lib/python3.11/site-packages/cnvlib/coverage.py", line 223, in bedcov
    raw = pysam.bedcov(*cmd, split_lines=False)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "....../Python-3.11.2/lib/python3.11/site-packages/pysam/utils.py", line 83, in __call__
    raise SamtoolsError(
pysam.utils.SamtoolsError: "samtools returned with error 2: stdout=, stderr=ERROR: fail to open index BAM file '...../path/to/bamfile.bam'\n"
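Since the message mentions the index, one thing we can rule out before each run is a missing or unreadable index file next to the BAM. The sketch below is only a pre-flight check under that assumption; we have not confirmed that this is the cause of the crash, and the helper names find_bam_index and check_bam are our own, not part of cnvkit or pysam.

```python
import os
from pathlib import Path


def find_bam_index(bam_path):
    """Return the first readable index file found next to bam_path, or None."""
    bam = Path(bam_path)
    candidates = [
        Path(str(bam) + ".bai"),  # sample.bam.bai
        bam.with_suffix(".bai"),  # sample.bai
        Path(str(bam) + ".csi"),  # sample.bam.csi
    ]
    for candidate in candidates:
        if candidate.is_file() and os.access(candidate, os.R_OK):
            return candidate
    return None


def check_bam(bam_path):
    """Return a list of problems found for this BAM; an empty list means none."""
    problems = []
    bam = Path(bam_path)
    if not bam.is_file():
        problems.append(f"BAM not found: {bam}")
    elif not os.access(bam, os.R_OK):
        problems.append(f"BAM not readable: {bam}")
    if find_bam_index(bam) is None:
        problems.append(f"no readable .bai/.csi index next to: {bam}")
    return problems
```

Running check_bam on every input before submitting jobs would at least separate file-system problems from problems inside cnvkit itself.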
Because we have access to a computing cluster, we would like to speed up the computation by dispatching each single-sample task to a separate node. However, we could not find documentation of the specific steps that cnvkit.py batch -m wgs runs (we only found a description of the default cnvkit.py batch settings at https://cnvkit.readthedocs.io/en/stable/pipeline.html#batch ).
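To make the idea concrete, here is a minimal sketch of the per-sample dispatch we have in mind. It assumes a pooled reference has already been built once from the normal samples (the file name reference.cnn and all paths are hypothetical), that batch accepts -m wgs together with -r, and that a finished sample leaves a <sample>.cns file in the output directory. We have not verified the last two points, which is part of what we are asking. The script only builds and prints the commands; each line would then be wrapped in the cluster's own submit command.

```python
import shlex
from pathlib import Path


def per_sample_command(bam, reference_cnn, out_dir, processes=1):
    """Build the cnvkit.py command line for one sample (nothing is executed)."""
    return [
        "cnvkit.py", "batch", str(bam),
        "-m", "wgs",               # assumption: still accepted when -r is given
        "-r", str(reference_cnn),  # pre-built pooled reference (hypothetical path)
        "-d", str(out_dir),
        "-p", str(processes),
    ]


def pending_samples(bams, out_dir):
    """Keep only BAMs with no <sample>.cns in out_dir yet (assumed output name)."""
    out = Path(out_dir)
    return [b for b in bams if not (out / (Path(b).stem + ".cns")).is_file()]


if __name__ == "__main__":
    # Hypothetical inputs; replace with the real tumor BAM paths.
    tumor_bams = ["/data/T001.bam", "/data/T002.bam"]
    for bam in pending_samples(tumor_bams, "cnvkit_out"):
        print(shlex.join(per_sample_command(bam, "reference.cnn", "cnvkit_out")))
```

Skipping samples whose output already exists gives a crude form of resuming: after a crash, only the unfinished samples would be resubmitted.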
Could you help elaborate on what cnvkit.py batch -m wgs runs? Thanks!