etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org
Other

cnvkit batch fails for WES data #436

Open flower1996 opened 5 years ago

flower1996 commented 5 years ago

Hello, I am running CNVkit to call CNVs on whole-exome sequencing (WES) samples and get the error below:

Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes
Traceback (most recent call last):
  File "/home/miniconda3/bin/cnvkit.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/biosoftware/cnvkit/cnvkit/cnvkit.py", line 9, in <module>
    args.func(args)
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/commands.py", line 143, in _cmd_batch
    args.cluster)
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/parallel.py", line 19, in submit
    return SerialFuture(func(*args))
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/batch.py", line 192, in batch_run_sample
    else {}))
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/segmentation/__init__.py", line 66, in do_segmentation
    for _, ca in cnarr.by_arm()))))
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/segmentation/__init__.py", line 91, in _ds
    return _do_segmentation(*args)
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/segmentation/__init__.py", line 162, in _do_segmentation
    seg_out = core.call_quiet(rscript_path, '--vanilla', script_fname)
  File "/home/biosoftware/cnvkit/cnvkit/cnvlib/core.py", line 32, in call_quiet
    % (' '.join(args), err))
RuntimeError: Subprocess command failed:
$ Rscript --vanilla /tmp/tmp7lkqejav
Loading probe coverages into a data frame
Warning message:
In CNA(cbind(tbl$log2), tbl$chromosome, tbl$start, data.type = "logratio", :
  markers with missing chrom and/or maploc removed

Segmenting the probe data
Error in segment(cna, weights = tbl$weight, alpha = 1e-04) :
  length of weights should be the same as the number of probes
Execution halted

The command I ran is:

cnvkit.py batch tumor.bam --normal normal.bam \
    --targets hg38.exon.bed \
    --method amplicon \
    --annotate refFlat.txt \
    --fasta Homo_sapiens_assembly38.fasta \
    --access hg38.exon.bed \
    --output-reference my_reference.cnn --output-dir /CNV \
    --diagram --scatter

Any ideas what is going on? Thanks!

etal commented 5 years ago

It looks like you had some NaN-valued weights, or possibly NaN log2 values. Which version of CNVkit are you using? If it's a very recent development version, there could have been a temporary quirk that may be fixed if you pull a fresh copy.

flower1996 commented 5 years ago

I ran this with CNVkit version 0.9.7.dev0.

quentinmiagoux commented 4 years ago

I get a very similar error with CNVkit 0.9.7.b1, which was also reported by another user on Biostars: https://www.biostars.org/p/415994/

zhangyimin40 commented 4 years ago

I ran into the same problem with the following command:

cnvkit.py batch S117.chr1.bam --normal S117F.chr1.bam \
    --targets Genome.bed --annotate refFlat.txt \
    --fasta hg19.fa --access Genome.bed \
    --output-reference my_reference.cnn --output-dir S117_vs_S117F \
    --diagram --scatter -m wgs

The same command ran correctly with CNVkit v0.9.0. To check whether CNVkit 0.9.7 was installed correctly, I ran the Makefile in CNVkit's test directory, and it completed without problems. I would appreciate any solution to this problem.

zhangyimin40 commented 4 years ago

I noticed that the failure was in the segmentation step, so I specified "--segment-method hmm" instead of the default method "cbs", and the batch command then ran successfully. The cbs method depends on the R package "DNAcopy"; I suspect something goes wrong when it reads the input table. Other --segment-method options include flasso, which depends on the R package "cghFLasso", but that package is no longer available on CRAN. The "hmm" method runs fast and depends on the Python package hmmlearn, so it could serve as an alternative to cbs.
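As a minimal sketch of this workaround, the helper below re-runs only the segmentation step on an existing .cnr file with the HMM method instead of CBS. It assumes `cnvkit.py` is on PATH and that the `segment` subcommand accepts `-m hmm` together with `-o` for the output file; the function names and file names are hypothetical, chosen for illustration.

```python
from __future__ import annotations

import importlib.util
import shutil
import subprocess


def hmm_segment_command(cnr_path: str, cns_path: str) -> list[str]:
    """Build a `cnvkit.py segment` call that uses the HMM method instead of CBS."""
    # Assumption: the segment subcommand accepts -m/--method and -o/--output.
    return ["cnvkit.py", "segment", cnr_path, "-m", "hmm", "-o", cns_path]


def run_hmm_segmentation(cnr_path: str, cns_path: str) -> None:
    """Run HMM segmentation, failing early if its prerequisites are missing."""
    # The hmm method relies on the Python package hmmlearn, not on R/DNAcopy.
    if importlib.util.find_spec("hmmlearn") is None:
        raise RuntimeError("hmmlearn is not installed; the hmm method needs it")
    if shutil.which("cnvkit.py") is None:
        raise RuntimeError("cnvkit.py was not found on PATH")
    subprocess.run(hmm_segment_command(cnr_path, cns_path), check=True)


if __name__ == "__main__":
    # Hypothetical file names: print the command rather than running it.
    print(" ".join(hmm_segment_command("S117.chr1.cnr", "S117.chr1.cns")))
```

For a full pipeline run, the equivalent is adding "--segment-method hmm" to the `cnvkit.py batch` command, which is what this comment describes.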

etal commented 4 years ago

Thanks for the details. Are you able to see if any of the input .cnr files contained NaN values? The test files bundled with CNVkit do not have NaNs, but if NaNs are appearing in the .cnr files in practice (either log2 or weight columns) then that would explain the issue.
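One way to answer this question is to load each .cnr file (a tab-separated table with a header row) and count the missing values in the log2 and weight columns. This is a sketch assuming pandas is installed and that the columns are named `log2` and `weight`, as in the R error message; `read_cnr` and `count_missing` are hypothetical helper names.

```python
import sys

import pandas as pd


def read_cnr(path: str) -> pd.DataFrame:
    """Load a .cnr file; empty fields and "NA"/"NaN" strings become NaN."""
    return pd.read_csv(path, sep="\t")


def count_missing(table: pd.DataFrame, columns=("log2", "weight")) -> dict:
    """Return the number of missing entries per column, skipping absent columns."""
    return {col: int(table[col].isna().sum()) for col in columns if col in table.columns}


if __name__ == "__main__":
    # Usage: python count_missing.py sample1.cnr sample2.cnr ...
    for path in sys.argv[1:]:
        print(path, count_missing(read_cnr(path)))
```

Any nonzero count would point to the kind of input described here.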

etal commented 3 years ago

I've merged a PR that should fix this issue. Could anyone try rerunning with the latest development version of CNVkit to see if the bug is resolved?

Tina610 commented 2 years ago

Hi, I used CNVkit versions 0.9.9 and 0.9.8 to call CNVs on WES data and hit a similar problem, with this log:

Loading probe coverages into a data frame
Warning message:
In CNA(cbind(tbl$log2), tbl$chromosome, tbl$start, data.type = "logratio", :
  markers with missing chrom and/or maploc removed

Segmenting the probe data
Error in segment(cna, weights = tbl$weight, alpha = 1e-04) :
  length of weights should be the same as the number of probes
Execution halted

I used a panel of normals (PON) as the reference. My command is:

cnvkit.py segment name.cnr -o name.cns --rscript-path Rscript

My .cnr file has some NA values in the weight column. So I ran the CBS_RSCRIPT myself and found that the R script line "tbl = tbl[tbl$weight > 0,]" should deal with NA values first, and only then filter on tbl$weight > 0.
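The ordering problem described here can be reasoned through step by step. In R, "tbl$weight > 0" is NA wherever the weight is NA, and indexing a data frame with NA returns an all-NA row. CNA() then drops those rows as "markers with missing chrom and/or maploc", while tbl$weight still contains them, which would explain the length mismatch in the error. The sketch below shows the NA-first filter in pandas, assuming the .cnr columns are named chromosome, start, log2 and weight. `rows_for_segmentation` is a hypothetical helper, not part of CNVkit, and dropping missing log2 values as well is an extra assumption.

```python
import pandas as pd


def rows_for_segmentation(cnr: pd.DataFrame) -> pd.DataFrame:
    """Select rows to pass to CBS, removing missing values before the weight filter.

    Every kept row has a chromosome, start, log2 and weight, so the weight
    vector stays exactly as long as the table of probes.
    """
    complete = cnr.dropna(subset=["chromosome", "start", "log2", "weight"])
    # pandas treats NaN > 0 as False (unlike R's NA), but dropping missing
    # values explicitly mirrors the fix suggested for the R script.
    return complete[complete["weight"] > 0].reset_index(drop=True)
```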

AndrewJWicks commented 2 months ago

Hello, I am currently experiencing the same problem running with no control, on versions 0.9.10 and 0.9.11.

etal commented 1 month ago

@Tina610 @AndrewJWicks Could you try the development version from GitHub and see if that works for you? Alternatively, you could remove the rows in your .cnr file that have empty or null values, which seemed to be the immediate source of the errors reported above.

etal commented 1 month ago

Or -- if the weights are all empty/null, try replacing the values with 1.0.
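The two workarounds suggested in these comments can be combined in a short script. If the weight column is entirely empty, it fills the column with 1.0. Otherwise it drops rows whose log2 or weight value is missing, then writes the cleaned .cnr back out for `cnvkit.py segment`. This is a sketch assuming pandas and tab-separated .cnr files with `log2` and `weight` columns; `clean_cnr` and the file paths are hypothetical.

```python
import sys

import pandas as pd


def clean_cnr(table: pd.DataFrame) -> pd.DataFrame:
    """Apply the suggested workarounds to a .cnr table loaded with pandas."""
    table = table.copy()
    if table["weight"].isna().all():
        # No usable weights at all: give every bin an equal weight of 1.0.
        table["weight"] = 1.0
    # Drop any remaining rows with an empty/null log2 or weight value.
    return table.dropna(subset=["log2", "weight"]).reset_index(drop=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical paths): python clean_cnr.py sample.cnr sample.clean.cnr
    src, dst = sys.argv[1], sys.argv[2]
    cleaned = clean_cnr(pd.read_csv(src, sep="\t"))
    cleaned.to_csv(dst, sep="\t", index=False)
```

The cleaned file can then be passed to `cnvkit.py segment` in place of the original .cnr.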

etal commented 1 month ago

I merged a possible fix in #914. Could you try pulling the latest development version and see if the problem is fixed now?