pablo-gar opened this issue 4 years ago
Hi,
I'm not 100% sure it is related, but running `cnvkit.py batch -m amplicon` on a single BAM with a flat reference, I noticed that no .cns file was written, even though this particular line mentions it:
https://github.com/etal/cnvkit/blob/1c8d69d777e590f109ef31db3a2ff34e68a5fdfb/cnvlib/batch.py#L191
(I even checked by adding a `print("hello")` to the source code, and this particular line is never actually reached.)
Trying to run `cnvkit.py segment my_patient.cnr -o my_patient.cns` separately from the batch pipeline to get this expected .cns file (I needed it for a later `cnvkit.py` call), I got this strange Python error, whose complete text was:
```
Traceback (most recent call last):
  File "/home/bioinfo/miniconda3/envs/cnvkit/bin/cnvkit.py", line 13, in <module>
    args.func(args)
  File "/home/bioinfo/miniconda3/envs/cnvkit/lib/python2.7/site-packages/cnvlib/commands.py", line 632, in _cmd_segment
    processes=args.processes)
  File "/home/bioinfo/miniconda3/envs/cnvkit/lib/python2.7/site-packages/cnvlib/segmentation/__init__.py", line 61, in do_segmentation
    for _, ca in cnarr.by_arm())))
  File "/home/bioinfo/miniconda3/envs/cnvkit/lib/python2.7/site-packages/cnvlib/parallel.py", line 26, in map
    return map(func, iterable)
  File "/home/bioinfo/miniconda3/envs/cnvkit/lib/python2.7/site-packages/cnvlib/segmentation/__init__.py", line 86, in _ds
    return _do_segmentation(*args)
  File "/home/bioinfo/miniconda3/envs/cnvkit/lib/python2.7/site-packages/cnvlib/segmentation/__init__.py", line 157, in _do_segmentation
    seg_out = core.call_quiet(rscript_path, '--vanilla', script_fname)
  File "/home/bioinfo/miniconda3/envs/cnvkit/lib/python2.7/site-packages/cnvlib/core.py", line 36, in call_quiet
    % (' '.join(args), err))
RuntimeError: Subprocess command failed:
$ Rscript --vanilla /tmp/tmpCDv57Y
/home/bioinfo/miniconda3/envs/cnvkit/lib/R/bin/exec/R: error while loading shared libraries: libiconv.so.2: cannot open shared object file: No such file or directory
```
For context, I'm running CNVkit (v. 0.9.6) on a CentOS 6 VM, inside a Miniconda3 env (v. 4.8.1). This particular error with `Rscript --vanilla` made me think it could be related.
I then solved it by installing this "libiconv" thing (`conda install -n cnvkit -c conda-forge libiconv`).
After this fix, I had another surprise: the .cns file was now properly written by the batch pipeline I mentioned!
So my guess is that running `core.call_quiet()` through `cnvkit.py batch` actually silences an important R error. I'm not even sure the segmentation really happens in the batch pipeline, since the `cnvkit.py segment` command itself raises an error...
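To illustrate the suspected failure mode (a minimal sketch, not CNVkit's actual code): a "quiet" subprocess wrapper captures stdout/stderr and only surfaces them when the command exits nonzero, so if the calling pipeline then swallows or never checks the resulting exception, the underlying R error never reaches the user.

```python
import subprocess

def call_quiet(*args):
    """Run a command while capturing its output (hypothetical sketch).

    Output is surfaced only if the subprocess exits nonzero; anything
    an apparently successful run prints is discarded silently.
    """
    proc = subprocess.run(args, capture_output=True)
    if proc.returncode != 0:
        raise RuntimeError(
            "Subprocess command failed:\n$ %s\n%s"
            % (" ".join(args), proc.stderr.decode(errors="replace"))
        )
    return proc.stdout
```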
Anyway, thank you for this great tool and for your help.
Best regards,
Felix.
P.S.: I don't know if it's related, but I had to uninstall matplotlib within the env, upgrade pip, and reinstall matplotlib, because `cnvkit.py -h` was raising a PyQt5 error after I installed CNVkit.
That issue looks to me like a missing/not-found dependency preventing R from starting. The likely reason you don't get an R error is that R never starts, so you just get the Python error.
It seems that Miniconda and CentOS can lead to that specific problem for R: https://justinbagley.rbind.io/2018/03/26/installing-r-for-user-on-a-centos-linux-supercomputer-account/
I'm glad that you got it to work.
To specify a non-default R installation, try `cnvkit.py batch --rscript-path ~/my/r/folder/`, where the argument is the R executable indicated by your ~/.Rprofile, wherever that may be.
I'll see about --no-restore and --no-environ -- the idea with --vanilla is to make the user's R execution environment as minimal and predictable as possible, since this subprocess only needs to do one thing.
The problem is not specifying the R installation that I want to use. The issue is that, after doing so, it fails to load the R packages, because I have them installed in a non-default path (which is indicated in the .Rprofile that never loads due to --vanilla).
That makes sense, thanks. I'll switch --vanilla to --no-restore --no-environ and roll a new release.
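The flag swap matters because R's --vanilla bundles --no-save --no-restore --no-site-file --no-init-file --no-environ, and it is --no-init-file that skips ~/.Rprofile (where custom .libPaths() entries usually live). A minimal sketch of building the call either way (hypothetical helper names, not CNVkit's actual code):

```python
import subprocess

def rscript_argv(script_fname, rscript_path="Rscript", vanilla=False):
    """Build the Rscript command line (hypothetical helper).

    --no-restore --no-environ still sources ~/.Rprofile, so library
    paths set via .libPaths() are honored; --vanilla skips it entirely.
    """
    flags = ["--vanilla"] if vanilla else ["--no-restore", "--no-environ"]
    return [rscript_path] + flags + [script_fname]

def run_r_script(script_fname, **kwargs):
    """Run the generated R script, raising on a nonzero exit."""
    return subprocess.run(rscript_argv(script_fname, **kwargs),
                          capture_output=True, check=True)
```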
Just re-opening the post, I'm having the same issue... I'm using cnvkit 0.9.7 on a CentOS 8 system. Installation was performed using pip! The error appears when running `batch`: it writes the .cnr file for the first sample, then it crashes with:
```
RuntimeError: Subprocess command failed:
$ Rscript --vanilla /tmp/tmp69gqw66v
```
All tests were successfully performed.
Tried changing --vanilla to --no-restore --no-environ; it did not work!
Edit: Figured out that the problem was with R installed using dnf. I had to remove the current R installation and install it again using yum! I also downgraded to cnvkit 0.9.6.
Also encountering the same issue ( cnvkit 0.9.7 installed with Conda on CentOS 7.8 ) :
RuntimeError: Subprocess command failed: $ Rscript --vanilla /scratch/amelinba/slurm-job.15746118/tmp5t3mgy05
b"Segmenting 1\nSegmenting 10\nSegmenting 11\nSegmenting 12\nSegmenting 13\nSegmenting 14\nSegmenting 15\nSegmenting 16\nSegmenting 17\nSegmenting 18\nSegmenting 19\nSegmenting 2\nSegmenting 20\nSegmenting 21\nSegmenting 22\nSegmenting 3\nSegmenting 4\nSegmenting 5\nSegmenting 6\nSegmenting 7\nSegmenting 8\nSegmenting 9\nSegmenting GL000191.1\nSegmenting GL000192.1\nSegmenting GL000193.1\nSegmenting GL000194.1\nSegmenting GL000195.1\nSegmenting GL000196.1\nSegmenting GL000197.1\nSegmenting GL000198.1\nSegmenting GL000199.1\nSegmenting GL000200.1\nSegmenting GL000201.1\nSegmenting GL000202.1\nSegmenting GL000203.1\nSegmenting GL000204.1\nSegmenting GL000205.1\nSegmenting GL000206.1\nSegmenting GL000207.1\nSegmenting GL000208.1\nSegmenting GL000209.1\nSegmenting GL000210.1\nSegmenting GL000211.1\nSegmenting GL000212.1\nSegmenting GL000213.1\nSegmenting GL000214.1\nSegmenting GL000215.1\nSegmenting GL000216.1\nSegmenting GL000217.1\nSegmenting GL000218.1\nSegmenting GL000219.1\nSegmenting GL000220.1\nSegmenting GL000221.1\nSegmenting GL000222.1\nSegmenting GL000223.1\nSegmenting GL000224.1\nSegmenting GL000225.1\nSegmenting GL000227.1\nSegmenting GL000228.1\nSegmenting GL000229.1\nSegmenting GL000230.1\nSegmenting GL000231.1\nSegmenting GL000232.1\nSegmenting GL000233.1\nSegmenting GL000234.1\nSegmenting GL000235.1\nSegmenting GL000236.1\nSegmenting GL000237.1\nSegmenting GL000238.1\nSegmenting GL000239.1\nSegmenting GL000240.1\nSegmenting GL000241.1\nSegmenting GL000242.1\nSegmenting GL000246.1\nSegmenting GL000247.1\nSegmenting GL000248.1\nSegmenting GL000249.1\nSegmenting hs37d5\nSegmenting X\nSegmenting Y\n\n caught segfault \naddress (nil), cause 'memory not mapped'\n\nTraceback:\n 1: L2L1VitPath(o, lambda2 = lam2seq, lambda1 = 0, segmentedFit = T)\n 2: MatchPrimBd(my.y, lam1.list, lam2.list, s1, s2)\n 3: CGH.fused.lasso.one.fast(cgh.y[chrom == ch])\n 4: FUN(newX[, i], ...)\n 5: apply(Array, 2, CGH.FusedLasso.One, chromosome = chromosome, FL.norm = FL.norm, FDR = 
FDR)\n 6: cghFLasso(tbl$log2, FDR = 1e-04, chromosome = tbl$chromosome)\nAn irrecoverable exception occurred. R is aborting now ...\n"
I have tried all previous suggestions but without solving this issue (even hard-coding the Rscript path to ensure that the appropriate R version is called).
It seems like you're having a memory overload in the session or some dependency issue! Try not saving the workspace, remove all packages, restart R, and reinstall cnvkit.
You are right, increasing the amount of RAM (from 8 GB to 16 GB) solved this issue. Thank you for your suggestion.
PS: This error appeared only when I used the "flasso" segmentation.
I've switched --vanilla to --no-restore --no-environ so that .Rprofile library paths will be respected. Does it work for you now?
Hi @etal, I am using cnvkit version 0.9.10 and have encountered some problems. Is there any suggestion to resolve this? Thank you in advance! When I used the batch pipeline, it just suddenly exited, and no .cns was written.
```
Time: 1.561 seconds (274679 reads/sec, 68 bins/sec)
Summary: #bins=106, #reads=428833, mean=4045.5954, min=0.0, max=17961.666666666668
Percent reads in regions: 77.704 (of 551883 mapped)
Wrote ./HelagDNA_86.targetcoverage.cnn with 106 regions
Processing reads in HelagDNA_86.recal.bam
Time: 0.194 seconds (0 reads/sec, 28529 bins/sec)
Summary: #bins=5545, #reads=0, mean=0.0000, min=0.0, max=0.0
Percent reads in regions: 0.000 (of 551883 mapped)
Wrote ./HelagDNA_86.antitargetcoverage.cnn with 5545 regions
Processing target: HelagDNA_86
Keeping 81 of 106 bins
Correcting for GC bias...
Correcting for density bias...
Processing antitarget: HelagDNA_86
Keeping 5305 of 5545 bins
Correcting for GC bias...
Correcting for RepeatMasker bias...
WARNING: Most antitarget bins (100.00%, 5305/5305) have low or no coverage; is this amplicon/WGS?
Antitargets are nan x more variable than targets
Wrote ./HelagDNA_86.cnr with 5386 regions
Segmenting ./HelagDNA_86.cnr ...
Segmenting with method 'cbs', significance threshold 0.0001, in 2 processes
Smoothing overshot at 25 / 353 indices: (-0.07310925348512008, 1.1554857326405217) vs. original (0.0, 2.3764156882527363)
Smoothing overshot at 43 / 517 indices: (-0.12022921033509101, 0.010321914976002435) vs. original (-0.277336911924003, 0.0)
Smoothing overshot at 18 / 888 indices: (-0.6331211103153426, 0.03683135852259898) vs. original (-1.9297402155945544, 0.0)
Smoothing overshot at 23 / 197 indices: (-0.07667626652646223, 0.8931217702676555) vs. original (0.0, 2.0601951310149804)
```
When I tried running segmentation alone on the resulting .cnr file, I got this error. FYI: my .cnr has NA values.
```
Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes
Smoothing overshot at 25 / 353 indices: (-0.07310932619195304, 1.155487080636834) vs. original (0.0, 2.37642)
Traceback (most recent call last):
  File "/opt/miniconda/envs/cnvkit9/bin/cnvkit.py", line 8, in <module>
    sys.exit(main())
  File "/opt/miniconda/envs/cnvkit9/lib/python3.9/site-packages/cnvlib/cnvkit.py", line 10, in main
    args.func(args)
  File "/opt/miniconda/envs/cnvkit9/lib/python3.9/site-packages/cnvlib/commands.py", line 994, in _cmd_segment
    results = segmentation.do_segmentation(
  File "/opt/miniconda/envs/cnvkit9/lib/python3.9/site-packages/cnvlib/segmentation/__init__.py", line 79, in do_segmentation
    rets = list(
  File "/opt/miniconda/envs/cnvkit9/lib/python3.9/site-packages/cnvlib/segmentation/__init__.py", line 123, in _ds
    return _do_segmentation(*args)
  File "/opt/miniconda/envs/cnvkit9/lib/python3.9/site-packages/cnvlib/segmentation/__init__.py", line 205, in _do_segmentation
    seg_out = core.call_quiet(
  File "/opt/miniconda/envs/cnvkit9/lib/python3.9/site-packages/cnvlib/core.py", line 32, in call_quiet
    raise RuntimeError(
RuntimeError: Subprocess command failed:
$ Rscript --no-restore --no-environ /tmp/tmphphd9l0y
b'Loading probe coverages into a data frame\nWarning message:\nIn CNA(cbind(tbl$log2), tbl$chromosome, tbl$start, data.type = "logratio", :\n markers with missing chrom and/or maploc removed\n\nSegmenting the probe data\nError in segment(cna, weights = tbl$weight, alpha = 1e-04) : \n length of weights should be the same as the number of probes\nExecution halted\n'
```
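The R warning hints at the cause: DNAcopy's `CNA()` drops markers with missing chrom/position, while the full-length weight vector is still passed to `segment()`, so the lengths no longer match. A possible workaround (a sketch only, assuming the standard .cnr layout: a tab-separated table with `chromosome`, `start`, `log2`, and `weight` columns; file names are examples) is to drop the NA rows before running `cnvkit.py segment`:

```python
import pandas as pd

def drop_na_bins(cnr_path, out_path):
    """Drop .cnr rows with missing values so the weight vector passed to
    DNAcopy::segment() matches the probes kept by CNA().
    Hypothetical workaround, not an official CNVkit command.
    Returns the number of rows removed."""
    cnr = pd.read_csv(cnr_path, sep="\t")
    clean = cnr.dropna(subset=["chromosome", "start", "log2", "weight"])
    clean.to_csv(out_path, sep="\t", index=False)
    return len(cnr) - len(clean)
```

Then point `cnvkit.py segment` at the cleaned file instead of the original .cnr.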
For reasons I'm not going to go into, I'm not using the R that comes with the conda installation of cnvkit.
For my own R installation I keep my R packages in a location indicated in ~/.Rprofile; however, this file is not loaded when cnvkit calls R, because it is invoked with the --vanilla flag:
https://github.com/etal/cnvkit/blob/9495602bd3d2c0fdf8247f8516b39622e5a57713/cnvlib/segmentation/__init__.py#L161
As a result the R package DNAcopy cannot be loaded, and the subprocess call throws an error.
I'm not sure there is a strong reason to use the --vanilla flag; a preferred call would use the flags --no-restore and --no-environ instead.