etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org
Other
520 stars 163 forks source link

pysam.utils.SamtoolsError: "samtools returned with error 2 #678

Closed philtat closed 2 years ago

philtat commented 2 years ago

Im running cnvkit for the first time. The test went fine. However, I am trying to run it on a WGS sample I have without a control using the following command:

cnvkit.py batch /media/phil/mySample.bam -n -m wgs -f /media/phil/hg38MainChroms.fa --annotate /media/phil/hg38.refGene.gtf

The pipeline generates a genome reference bed, a target bed, antitarget bed, reference cnn, and reference cnn1 files and then produces the following error:

Running the CNVkit pipeline on /media/phil/mySample.bam ... Processing reads in mySample.bam Traceback (most recent call last): File "/home/phil/cnvkit/cnvlib/coverage.py", line 199, in bedcov raw = pysam.bedcov(*cmd, split_lines=False) File "/home/phil/.local/lib/python3.9/site-packages/pysam/utils.py", line 69, in call raise SamtoolsError( pysam.utils.SamtoolsError: "samtools returned with error 2: stdout=, stderr=Errors in BED line 'chr1\t10000\t14941\tDDX11L1,WASH7P'\nErrors

and every subsequent entry in the reference bed is then flagged for the same error, so the actual error message is many 1000s of lines long. I have pysam 0.18 installed, and I am wondering if maybe the version is off or something? Help troubleshooting would be great.

tetedange13 commented 2 years ago

Hi @philtat ,

Not an author of CNVkit but I will try to help you ! => Thanks for your detailed issue, could you please also tell me:

My wild guess would be a chromosome naming mismatch between your BAM and genome fasta file given (which is used to build on-the-fly the "target" BED mentionned in your error) => If I guessed wrong, could you copy complete error message into a file and upload it here ?

Hope this helps ! Kind regards, Felix

philtat commented 2 years ago

Chromosome mismatched names was exactly the issue. Thank you for your insight.

tetedange13 commented 2 years ago

Glad it helped !

However if I may, something still looks strange to me because I am almost sure that this scenario should be handled better (I mean better than with this cryptic Samtools error you got) => You should not have 1000s of regions displayed plus at the end of error message (instead of "samtools returned with error 2") a more explicit :

ValueError: BED file './test.target.bed' chromosome names don't match 
any in BAM file 'na12878-chrM-Y-trunc.GRCh37.bam'

Based on the following snippet of code from coverage module: https://github.com/etal/cnvkit/blob/aff02f9f1f6386753cc61ba64d68a9e35e6aab93/cnvlib/coverage.py#L198-L205 => "Normal" (err well handled) scenario can be partly reproduced with CNVkit's test data: cnvkit.py batch -n -m wgs -f test.fa cnvkit/test/formats/na12878-chrM-Y-trunc.bam (where "test.fa" corresponds to "cnvkit/test/formats/chrM-Y-trunc.hg19.fa" but with chrom named without "chr" prefix)