etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

What are the specific steps to run for cnvkit.py batch -m wgs? #823

Open yangdingyangding opened 1 year ago

yangdingyangding commented 1 year ago

Hi,

We are trying to run cnvkit on many WGBS samples (~100 tumor and ~100 normal) and find batch mode very time-consuming, even with --processes set to 32 or 64. It also sometimes crashes on a single sample (see the error message below, where it fails to open a BAM file that exists and is accessible), and re-running cnvkit does not seem to resume from the intermediate results of the crashed run. We would therefore like to find a more robust and efficient way to run cnvkit on this large sample set.

Traceback (most recent call last):
File "....../Python-3.11.2/lib/python3.11/site-packages/cnvlib/coverage.py", line 223, in bedcov
   raw = pysam.bedcov(*cmd, split_lines=False)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "....../Python-3.11.2/lib/python3.11/site-packages/pysam/utils.py", line 83, in __call__
   raise SamtoolsError(
pysam.utils.SamtoolsError: "samtools returned with error 2: stdout=, stderr=ERROR: fail to open index BAM file '...../path/to/bamfile.bam'\n"
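
The traceback above reports that samtools could not open the index for the BAM file. One possible cause (an assumption, not confirmed in this report) is an index file that is missing next to the BAM. A minimal pre-flight sketch that checks for the common index file names before any sample is submitted is shown here; the helper names `find_bam_index` and `split_by_index` are hypothetical, and the check does not detect stale indexes or transient filesystem errors.

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the path of an existing index for bam_path, or None.

    Checks the usual naming conventions: sample.bam.bai, sample.bai
    and sample.bam.csi.
    """
    bam = Path(bam_path)
    candidates = [
        Path(str(bam) + ".bai"),   # sample.bam.bai
        bam.with_suffix(".bai"),   # sample.bai
        Path(str(bam) + ".csi"),   # sample.bam.csi
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def split_by_index(bam_paths):
    """Split BAM paths into (indexed, missing_index) lists."""
    indexed, missing = [], []
    for path in bam_paths:
        if find_bam_index(path) is not None:
            indexed.append(path)
        else:
            missing.append(path)
    return indexed, missing
```

Samples returned in the second list can be indexed first (for example with samtools index) instead of failing partway through a long run.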

Because we have access to a computing cluster, we'd like to speed up the computation by dispatching each single-sample task to a separate node. However, we could not find documentation of the specific steps that cnvkit.py batch -m wgs runs; we only found a description of the default cnvkit.py batch at https://cnvkit.readthedocs.io/en/stable/pipeline.html#batch .
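
To illustrate the kind of resumable, per-sample dispatch we have in mind, here is a rough sketch. It is only an outline under stated assumptions: the per-sample command is passed in as a parameter because we do not know which steps batch -m wgs runs, and the marker-file convention and the names `plan_tasks` and `run_task` are hypothetical.

```python
import subprocess
from pathlib import Path


def plan_tasks(bam_paths, out_dir, suffix=".done"):
    """Return (bam, marker) pairs for samples that have no completion marker.

    A marker file named <sample><suffix> in out_dir means the sample
    finished in an earlier run and can be skipped.
    """
    out = Path(out_dir)
    tasks = []
    for bam in map(Path, bam_paths):
        marker = out / (bam.stem + suffix)
        if not marker.exists():
            tasks.append((bam, marker))
    return tasks


def run_task(cmd, marker):
    """Run one per-sample command; write the marker only on success.

    cmd is a list of arguments. Its contents are left to the caller,
    since the exact per-sample steps are what this issue asks about.
    """
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    Path(marker).touch()
```

With this layout, a crash on one sample leaves no marker for it, so a second pass only retries the samples that did not finish, and each `run_task` call can be submitted as its own cluster job.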

Could you elaborate on this for us? Thanks!