CNVkit can't read its own target.bed during batch command

bozbezbozzel commented 1 year ago

CNVkit is able to read the .bed file with targets that I give it, it makes a target.bed, it processes my normal samples which appears to go well. Then I get the following error:

  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 199, in bedcov
    raw = pysam.bedcov(*cmd, split_lines=False)
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/pysam/utils.py", line 69, in __call__
    raise SamtoolsError(
pysam.utils.SamtoolsError: "samtools returned with error 2: stdout=, stderr=Errors in BED line 'chr1\t14620\t14712\t-'

followed by "Errors in BED line" for every single line in the file, followed by


Traceback (most recent call last):
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/batch.py", line 157, in batch_write_coverage
    cnarr = coverage.do_coverage(bed_fname, bam_fname, by_count, 0, processes, fasta)
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 27, in do_coverage
    cnarr = interval_coverages(bed_fname, bam_fname, by_count, min_mapq,
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 57, in interval_coverages
    table = interval_coverages_pileup(bed_fname, bam_fname, min_mapq,
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 154, in interval_coverages_pileup
    table = bedcov(bed_fname, bam_fname, min_mapq, fasta)
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 201, in bedcov
    raise ValueError("Failed processing %r coverages in %r regions. "
ValueError: Failed processing '/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/output/mapped/PD52898b_merged_hg38_clean_dedup_recal.bam' coverages in '/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/output/CNVkit/results_normal/targets_exome_agilent_v5_chromnum.target.bed' regions. PySAM error: "samtools returned with error 2: stdout=, stderr=Errors in BED line 'chr1\t14620\t14712\t-'\n

with again every.single.line.

I can run samtools bedcov just fine on my original bed file. CNVkit can read, process, and write out this target bed. So what's throwing me for a loop is that CNVkit cannot adequately process the file it wrote out itself.

What I've done:

used bed from a different source (USCS genomebrowser vs AstraZeneca github)
changed chromosome names in the bed file to match the chromosome names in my bams (removed "chr")

checked if the bed was purely tab-separated with cat -t

1^I65509^I65625^I.
1^I65831^I65973^I.
1^I69481^I69600^IOR4F5
1^I786001^I786139^IRP11-206L10.9
1^I786150^I786426^IRP11-206L10.9
1^I786471^I786562^IRP11-206L10.9
1^I817536^I817655^IFAM87B
1^I826715^I826895^ILINC00115
1^I826900^I827034^ILINC00115
1^I827040^I827185^ILINC00115

verified that my file had no carriage returns:

(snakemake) a.vliet@pasteur:/DATA/a.vliet/reference_files/cnvkit_beds$ cat -e targets_exome_agilent_v5_chromnum.bed | head
1   65509   65625   .$
1   65831   65973   .$
1   69481   69600   OR4F5$
1   786001  786139  RP11-206L10.9$
1   786150  786426  RP11-206L10.9$
1   786471  786562  RP11-206L10.9$
1   817536  817655  FAM87B$
1   826715  826895  LINC00115$
1   826900  827034  LINC00115$
1   827040  827185  LINC00115$

I'm out of ideas. I have no idea what's causing this-- surely if my original bed had something wrong with it, CNVkit would fail on the first step of trying to write out the target.bed file?

I'm running CNVkit 0.9.9 in python 3.9 with snakemake 7.8.5. pysam 0.19.1 and samtools 1.10.

bozbezbozzel commented 1 year ago

will try this suggestion: https://github.com/etal/cnvkit/issues/678 and update

bozbezbozzel commented 1 year ago

My reference genome uses 1, 2, 3 etc... for the chromosomes and so do my sample bams:

less Homo_sapiens.GRCh38.dna.primary_assembly.fa | grep ">"
>1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF
>10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF
>11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF

@HD     VN:1.6  GO:none SO:coordinate
@SQ     SN:1    LN:248956422    M5:2648ae1bacce4ec4b6cf337dcae37816  
@SQ     SN:10   LN:133797422    M5:907112d17fcb73bcab1ed1c72b97ce68  
@SQ     SN:11   LN:135086622    M5:1511375dc2dd1b633af8cf439ae90cec

I used the exact same reference genome file for every step and I'm coming from unmapped BAMs.

bozbezbozzel commented 1 year ago

Solved-- had to remove contigs from my antitarget and target input files.

JiayangZhou commented 1 year ago

got the same error, and my reference genome uses chr1, chr2..would you suggest I replace all with 1, 2, 3, and try again?

etal / cnvkit

CNVkit can't read its own target.bed during batch command #750