etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Use of --vanilla to call R leads to issues in custom R installations #491

Open pablo-gar opened 4 years ago

pablo-gar commented 4 years ago

For reasons I'm not going to go into I'm not using R that comes with the conda installation of cnvkit.

In my own R installation, the R packages live in a location that is set in ~/.Rprofile. However, this file is not loaded when cnvkit runs R, because R is called with the --vanilla flag:

https://github.com/etal/cnvkit/blob/9495602bd3d2c0fdf8247f8516b39622e5a57713/cnvlib/segmentation/__init__.py#L161

As a result, the R package DNAcopy cannot be loaded and the subprocess call throws an error.

I'm not sure whether there is a strong reason to use the --vanilla flag. A preferable call would use the flags --no-restore and --no-environ instead.
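To make the difference between the two flag sets concrete, here is a minimal Python sketch. It relies on R's documented behavior that --vanilla combines --no-save, --no-restore, --no-site-file, --no-init-file and --no-environ, and that --no-init-file is the option that skips the user's .Rprofile. The helper names `expand_flags` and `reads_user_rprofile` are made up for illustration and are not part of CNVkit.

```python
# R documents --vanilla as shorthand for these five options.
VANILLA_EXPANSION = [
    "--no-save",
    "--no-restore",
    "--no-site-file",
    "--no-init-file",  # this is the one that skips ~/.Rprofile
    "--no-environ",
]


def expand_flags(flags):
    """Replace --vanilla with the individual options it stands for."""
    expanded = []
    for flag in flags:
        if flag == "--vanilla":
            expanded.extend(VANILLA_EXPANSION)
        else:
            expanded.append(flag)
    return expanded


def reads_user_rprofile(flags):
    """Return True if R would still read the user's .Rprofile with these flags."""
    return "--no-init-file" not in expand_flags(flags)


if __name__ == "__main__":
    for flags in (["--vanilla"], ["--no-restore", "--no-environ"]):
        print(flags, "-> reads .Rprofile:", reads_user_rprofile(flags))
```

With --no-restore --no-environ the .Rprofile is still read, so library paths set there stay visible to the R subprocess.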

tetedange13 commented 4 years ago

Hi,

I'm not 100% sure it is related, but when running cnvkit.py batch -m amplicon on a single BAM with a flat reference, I noticed that no .cns file was written, even though this line should write it: https://github.com/etal/cnvkit/blob/1c8d69d777e590f109ef31db3a2ff34e68a5fdfb/cnvlib/batch.py#L191 (I even added a print("hello") to the source code and confirmed that this line is never reached).
When I ran cnvkit.py segment my_patient.cnr -o my_patient.cns separately from the batch pipeline to get the expected .cns file (I needed it for a later cnvkit.py call), I got this strange Python error. The complete text was:

Traceback (most recent call last):
  File "/home/bioinfo/miniconda3/envs/cnvkit/bin/cnvkit.py", line 13, in <module>
    args.func(args)
  File "/home/bioinfo/miniconda3/envs/cnvkit/lib/python2.7/site-packages/cnvlib/commands.py", line 632, in _cmd_segment
    processes=args.processes)
  File "/home/bioinfo/miniconda3/envs/cnvkit/lib/python2.7/site-packages/cnvlib/segmentation/__init__.py", line 61, in do_segmentation
    for _, ca in cnarr.by_arm())))
  File "/home/bioinfo/miniconda3/envs/cnvkit/lib/python2.7/site-packages/cnvlib/parallel.py", line 26, in map
    return map(func, iterable)
  File "/home/bioinfo/miniconda3/envs/cnvkit/lib/python2.7/site-packages/cnvlib/segmentation/__init__.py", line 86, in _ds
    return _do_segmentation(*args)
  File "/home/bioinfo/miniconda3/envs/cnvkit/lib/python2.7/site-packages/cnvlib/segmentation/__init__.py", line 157, in _do_segmentation
    seg_out = core.call_quiet(rscript_path, '--vanilla', script_fname)
  File "/home/bioinfo/miniconda3/envs/cnvkit/lib/python2.7/site-packages/cnvlib/core.py", line 36, in call_quiet
    % (' '.join(args), err))
RuntimeError: Subprocess command failed:
$ Rscript --vanilla /tmp/tmpCDv57Y

/home/bioinfo/miniconda3/envs/cnvkit/lib/R/bin/exec/R: error while loading shared libraries: libiconv.so.2: cannot open shared object file: No such file or directory

As you can see, I'm running CNVkit (v. 0.9.6) on a CentOS6 VM within a Miniconda3 env (v. 4.8.1). This particular error with Rscript --vanilla made me think it could be related. I then solved it by installing this "libiconv" thing (conda install -n cnvkit -c conda-forge libiconv).

After this fix, I had another surprise: the .cns file was now properly written by the batch pipeline I mentioned! So my guess is that core.call_quiet(), when run through cnvkit.py batch, is silently swallowing an important R error. I'm not even sure whether the segmentation was actually happening via the batch pipeline, since the cnvkit.py segment command itself raised an error...


Anyway, thank you for this great tool and for your help.
Best regards.
Felix.

P.S.: I don't know if it is related, but I had to uninstall matplotlib within the env, upgrade pip and reinstall matplotlib, because cnvkit.py -h was raising a PyQt5 error after I installed CNVkit.

pablo-gar commented 4 years ago

That issue looks to me more like a missing or not-found dependency that R needs in order to start. The likely reason you don't get an R error is that R never starts, so you only get the Python error.

It seems that miniconda and CentOS can lead to that specific problem for R: https://justinbagley.rbind.io/2018/03/26/installing-r-for-user-on-a-centos-linux-supercomputer-account/
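As a rough illustration of why only a Python error shows up, here is a minimal sketch of a wrapper in the spirit of core.call_quiet. It is an assumption about the general pattern, not CNVkit's actual implementation, and `run_quiet` is a made-up name. A child Python process stands in for the Rscript binary that exits before doing any work.

```python
import subprocess
import sys


def run_quiet(*args):
    """Run a command with stdout/stderr captured; raise if it exits non-zero."""
    proc = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode != 0:
        # Whatever the child (or the dynamic loader) wrote to stderr ends up
        # inside the Python exception; nothing is printed by the child itself.
        raise RuntimeError(
            "Subprocess command failed:\n$ %s\n\n%s"
            % (" ".join(args), proc.stderr.decode(errors="replace"))
        )
    return proc.stdout


# Stand-in for an executable that fails at startup, e.g. a missing shared library.
failing_child = "import sys; sys.stderr.write('error while loading shared libraries'); sys.exit(127)"

try:
    run_quiet(sys.executable, "-c", failing_child)
except RuntimeError as exc:
    message = str(exc)
    print(message)
```

The only trace of the startup failure is the captured stderr text embedded in the RuntimeError, which matches the shape of the traceback posted above.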

I'm glad that you got it to work

etal commented 4 years ago

To specify a non-default R installation, try cnvkit.py batch --rscript-path ~/my/r/folder/ where the argument is the R executable indicated by your ~/.Rprofile, wherever that may be.

I'll see about --no-restore and --no-environ -- the idea with --vanilla is to make the user's R execution environment as minimal and predictable as possible, since this subprocess only needs to do one thing.

pablo-gar commented 4 years ago

The problem is not specifying which R installation I want to use. The issue is that, after doing so, R fails to load the packages because I have them installed in a non-default path (which is set in the .Rprofile that never loads due to --vanilla).

etal commented 4 years ago

That makes sense, thanks. I'll switch --vanilla to --no-restore --no-environ and roll a new release.

SouzaBB commented 4 years ago

Just re-opening the post, since I'm having the same issue. I'm using cnvkit 0.9.7 on a CentOS 8 system, installed with pip. The error appears when running batch: it writes the .cnr file for the first sample, then crashes with:

RuntimeError: Subprocess command failed:
$ Rscript --vanilla /tmp/tmp69gqw66v

All tests were successfully performed.

I tried changing --vanilla to --no-restore --no-environ, but that did not work!

Edit: I figured out that the problem was with the R installation done through dnf. I had to remove the current R installation and install it again using yum! I also downgraded to cnvkit 0.9.6.

amelinba commented 4 years ago

Also encountering the same issue ( cnvkit 0.9.7 installed with Conda on CentOS 7.8 ) :

---------------------------------

RuntimeError: Subprocess command failed: $ Rscript --vanilla /scratch/amelinba/slurm-job.15746118/tmp5t3mgy05

b"Segmenting 1\nSegmenting 10\nSegmenting 11\nSegmenting 12\nSegmenting 13\nSegmenting 14\nSegmenting 15\nSegmenting 16\nSegmenting 17\nSegmenting 18\nSegmenting 19\nSegmenting 2\nSegmenting 20\nSegmenting 21\nSegmenting 22\nSegmenting 3\nSegmenting 4\nSegmenting 5\nSegmenting 6\nSegmenting 7\nSegmenting 8\nSegmenting 9\nSegmenting GL000191.1\nSegmenting GL000192.1\nSegmenting GL000193.1\nSegmenting GL000194.1\nSegmenting GL000195.1\nSegmenting GL000196.1\nSegmenting GL000197.1\nSegmenting GL000198.1\nSegmenting GL000199.1\nSegmenting GL000200.1\nSegmenting GL000201.1\nSegmenting GL000202.1\nSegmenting GL000203.1\nSegmenting GL000204.1\nSegmenting GL000205.1\nSegmenting GL000206.1\nSegmenting GL000207.1\nSegmenting GL000208.1\nSegmenting GL000209.1\nSegmenting GL000210.1\nSegmenting GL000211.1\nSegmenting GL000212.1\nSegmenting GL000213.1\nSegmenting GL000214.1\nSegmenting GL000215.1\nSegmenting GL000216.1\nSegmenting GL000217.1\nSegmenting GL000218.1\nSegmenting GL000219.1\nSegmenting GL000220.1\nSegmenting GL000221.1\nSegmenting GL000222.1\nSegmenting GL000223.1\nSegmenting GL000224.1\nSegmenting GL000225.1\nSegmenting GL000227.1\nSegmenting GL000228.1\nSegmenting GL000229.1\nSegmenting GL000230.1\nSegmenting GL000231.1\nSegmenting GL000232.1\nSegmenting GL000233.1\nSegmenting GL000234.1\nSegmenting GL000235.1\nSegmenting GL000236.1\nSegmenting GL000237.1\nSegmenting GL000238.1\nSegmenting GL000239.1\nSegmenting GL000240.1\nSegmenting GL000241.1\nSegmenting GL000242.1\nSegmenting GL000246.1\nSegmenting GL000247.1\nSegmenting GL000248.1\nSegmenting GL000249.1\nSegmenting hs37d5\nSegmenting X\nSegmenting Y\n\n caught segfault \naddress (nil), cause 'memory not mapped'\n\nTraceback:\n 1: L2L1VitPath(o, lambda2 = lam2seq, lambda1 = 0, segmentedFit = T)\n 2: MatchPrimBd(my.y, lam1.list, lam2.list, s1, s2)\n 3: CGH.fused.lasso.one.fast(cgh.y[chrom == ch])\n 4: FUN(newX[, i], ...)\n 5: apply(Array, 2, CGH.FusedLasso.One, chromosome = chromosome, FL.norm = FL.norm, FDR = 
FDR)\n 6: cghFLasso(tbl$log2, FDR = 1e-04, chromosome = tbl$chromosome)\nAn irrecoverable exception occurred. R is aborting now ...\n"

---------------------------

I have tried all previous suggestions but without solving this issue (even hard-coding the Rscript path to ensure that the appropriate R version is called).

SouzaBB commented 4 years ago

It seems like your session is running out of memory, or there is a dependency issue! Try not saving the workspace, removing all packages, restarting R and reinstalling cnvkit.

amelinba commented 4 years ago

You are right, increasing the amount of RAM (from 8 GB to 16 GB) solved this issue. Thank you for your suggestion.

PS: This error appeared only when I used the "flasso" segmentation method.

etal commented 4 years ago

I've switched --vanilla to --no-restore --no-environ so that .Rprofile library paths will be respected. Does it work for you now?

sainsachiko commented 6 months ago

Hi @etal, I am using CNVkit version 0.9.10 and have run into some problems. Do you have any suggestions for resolving this? Thank you in advance! When I used the batch pipeline, it suddenly exited and no .cns file was written.

Time: 1.561 seconds (274679 reads/sec, 68 bins/sec)
Summary: #bins=106, #reads=428833, mean=4045.5954, min=0.0, max=17961.666666666668
Percent reads in regions: 77.704 (of 551883 mapped)
Wrote ./HelagDNA_86.targetcoverage.cnn with 106 regions
Processing reads in HelagDNA_86.recal.bam
Time: 0.194 seconds (0 reads/sec, 28529 bins/sec)
Summary: #bins=5545, #reads=0, mean=0.0000, min=0.0, max=0.0
Percent reads in regions: 0.000 (of 551883 mapped)
Wrote ./HelagDNA_86.antitargetcoverage.cnn with 5545 regions
Processing target: HelagDNA_86
Keeping 81 of 106 bins
Correcting for GC bias...
Correcting for density bias...
Processing antitarget: HelagDNA_86
Keeping 5305 of 5545 bins
Correcting for GC bias...
Correcting for RepeatMasker bias...
WARNING: Most antitarget bins (100.00%, 5305/5305) have low or no coverage; is this amplicon/WGS?
Antitargets are nan x more variable than targets
Wrote ./HelagDNA_86.cnr with 5386 regions
Segmenting ./HelagDNA_86.cnr ...
Segmenting with method 'cbs', significance threshold 0.0001, in 2 processes
Smoothing overshot at 25 / 353 indices: (-0.07310925348512008, 1.1554857326405217) vs. original (0.0, 2.3764156882527363)
Smoothing overshot at 43 / 517 indices: (-0.12022921033509101, 0.010321914976002435) vs. original (-0.277336911924003, 0.0)
Smoothing overshot at 18 / 888 indices: (-0.6331211103153426, 0.03683135852259898) vs. original (-1.9297402155945544, 0.0)
Smoothing overshot at 23 / 197 indices: (-0.07667626652646223, 0.8931217702676555) vs. original (0.0, 2.0601951310149804)

When I tried running segmentation alone on the resulting .cnr file, I got this error. FYI: my .cnr file has NA values.

Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes
Smoothing overshot at 25 / 353 indices: (-0.07310932619195304, 1.155487080636834) vs. original (0.0, 2.37642)
Traceback (most recent call last):
  File "/opt/miniconda/envs/cnvkit9/bin/cnvkit.py", line 8, in <module>
    sys.exit(main())
  File "/opt/miniconda/envs/cnvkit9/lib/python3.9/site-packages/cnvlib/cnvkit.py", line 10, in main
    args.func(args)
  File "/opt/miniconda/envs/cnvkit9/lib/python3.9/site-packages/cnvlib/commands.py", line 994, in _cmd_segment
    results = segmentation.do_segmentation(
  File "/opt/miniconda/envs/cnvkit9/lib/python3.9/site-packages/cnvlib/segmentation/__init__.py", line 79, in do_segmentation
    rets = list(
  File "/opt/miniconda/envs/cnvkit9/lib/python3.9/site-packages/cnvlib/segmentation/__init__.py", line 123, in _ds
    return _do_segmentation(*args)
  File "/opt/miniconda/envs/cnvkit9/lib/python3.9/site-packages/cnvlib/segmentation/__init__.py", line 205, in _do_segmentation
    seg_out = core.call_quiet(
  File "/opt/miniconda/envs/cnvkit9/lib/python3.9/site-packages/cnvlib/core.py", line 32, in call_quiet
    raise RuntimeError(
RuntimeError: Subprocess command failed:
$ Rscript --no-restore --no-environ /tmp/tmphphd9l0y

b'Loading probe coverages into a data frame\nWarning message:\nIn CNA(cbind(tbl$log2), tbl$chromosome, tbl$start, data.type = "logratio",  :\n  markers with missing chrom and/or maploc removed\n\nSegmenting the probe data\nError in segment(cna, weights = tbl$weight, alpha = 1e-04) : \n  length of weights should be the same as the number of probes\nExecution halted\n'
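The R output above says that markers with missing chrom and/or maploc were removed before segment() complained that the weights no longer match the number of probes. One way to check whether the .cnr file contains such rows is sketched below. This is a diagnostic only, not a fix confirmed in this thread. It assumes the .cnr file is tab-separated with the columns chromosome, start, log2 and weight; `count_incomplete_rows` is a made-up helper name and the sample rows are invented for the demonstration.

```python
import csv
import io

# Spellings treated as "missing" in this sketch (an assumption).
MISSING = {"", "na", "nan"}


def count_incomplete_rows(handle, columns=("chromosome", "start", "log2", "weight")):
    """Return (total_rows, rows_with_a_missing_value) for a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    total = incomplete = 0
    for row in reader:
        total += 1
        if any((row.get(col) or "").strip().lower() in MISSING for col in columns):
            incomplete += 1
    return total, incomplete


# Invented example data in the assumed .cnr layout.
header = ["chromosome", "start", "end", "gene", "depth", "log2", "weight"]
rows = [
    ["chr1", "100", "200", "A", "10.0", "0.1", "0.9"],
    ["chr1", "300", "400", "B", "0.0", "NA", "0.5"],   # missing log2
    ["", "500", "600", "C", "5.0", "-0.2", "0.8"],     # missing chromosome
]
sample = "\n".join("\t".join(fields) for fields in [header] + rows) + "\n"

total, incomplete = count_incomplete_rows(io.StringIO(sample))
print("%d of %d rows have a missing value" % (incomplete, total))
```

For a real file, pass an open handle, for example `open("HelagDNA_86.cnr", newline="")`, in place of the `io.StringIO` sample.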