etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org
Other
502 stars 163 forks source link

CNVkit can't read its own target.bed during batch command #750

Closed bozbezbozzel closed 1 year ago

bozbezbozzel commented 1 year ago

CNVkit is able to read the .bed file with targets that I give it, it makes a target.bed, it processes my normal samples which appears to go well. Then I get the following error:

  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 199, in bedcov
    raw = pysam.bedcov(*cmd, split_lines=False)
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/pysam/utils.py", line 69, in __call__
    raise SamtoolsError(
pysam.utils.SamtoolsError: "samtools returned with error 2: stdout=, stderr=Errors in BED line 'chr1\t14620\t14712\t-'

followed by "Errors in BED line" for every single line in the file, followed by


Traceback (most recent call last):
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/batch.py", line 157, in batch_write_coverage
    cnarr = coverage.do_coverage(bed_fname, bam_fname, by_count, 0, processes, fasta)
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 27, in do_coverage
    cnarr = interval_coverages(bed_fname, bam_fname, by_count, min_mapq,
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 57, in interval_coverages
    table = interval_coverages_pileup(bed_fname, bam_fname, min_mapq,
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 154, in interval_coverages_pileup
    table = bedcov(bed_fname, bam_fname, min_mapq, fasta)
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 201, in bedcov
    raise ValueError("Failed processing %r coverages in %r regions. "
ValueError: Failed processing '/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/output/mapped/PD52898b_merged_hg38_clean_dedup_recal.bam' coverages in '/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/output/CNVkit/results_normal/targets_exome_agilent_v5_chromnum.target.bed' regions. PySAM error: "samtools returned with error 2: stdout=, stderr=Errors in BED line 'chr1\t14620\t14712\t-'\n

with again every.single.line.

I can run samtools bedcov just fine on my original bed file. CNVkit can read, process, and write out this target bed. So what's throwing me for a loop is that CNVkit cannot adequately process the file it wrote out itself.

What I've done:

I'm out of ideas. I have no idea what's causing this-- surely if my original bed had something wrong with it, CNVkit would fail on the first step of trying to write out the target.bed file?

I'm running CNVkit 0.9.9 in python 3.9 with snakemake 7.8.5. pysam 0.19.1 and samtools 1.10.

bozbezbozzel commented 1 year ago

will try this suggestion: https://github.com/etal/cnvkit/issues/678 and update

bozbezbozzel commented 1 year ago

My reference genome uses 1, 2, 3 etc... for the chromosomes and so do my sample bams:

less Homo_sapiens.GRCh38.dna.primary_assembly.fa | grep ">"
>1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF
>10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF
>11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF
@HD     VN:1.6  GO:none SO:coordinate
@SQ     SN:1    LN:248956422    M5:2648ae1bacce4ec4b6cf337dcae37816  
@SQ     SN:10   LN:133797422    M5:907112d17fcb73bcab1ed1c72b97ce68  
@SQ     SN:11   LN:135086622    M5:1511375dc2dd1b633af8cf439ae90cec   

I used the exact same reference genome file for every step and I'm coming from unmapped BAMs.

bozbezbozzel commented 1 year ago

Solved-- had to remove contigs from my antitarget and target input files.

JiayangZhou commented 1 year ago

got the same error, and my reference genome uses chr1, chr2..would you suggest I replace all with 1, 2, 3, and try again?