Hoohm / dropSeqPipe

A SingleCell RNASeq pre-processing snakemake workflow
Creative Commons Attribution Share Alike 4.0 International
R package problem, plotting error #65

Closed Hofphi closed 5 years ago

Hofphi commented 5 years ago

Hey,

I keep having trouble with the same error message. There seems to be a problem during the plotting with R and the "reshape2" package. I already installed all packages and the dependencies and checked if I have the current version but I still keep having the same error! Any suggestions?

Activating conda environment: /scratch/hofphi00/dropSeqPipe/.snakemake/conda/439f0232
Activating conda environment: /scratch/hofphi00/dropSeqPipe/.snakemake/conda/439f0232
Activating conda environment: /scratch/hofphi00/dropSeqPipe/.snakemake/conda/118bc3f0
Activating conda environment: /scratch/hofphi00/dropSeqPipe/.snakemake/conda/118bc3f0
Activating conda environment: /scratch/hofphi00/dropSeqPipe/.snakemake/conda/118bc3f0
Activating conda environment: /scratch/hofphi00/dropSeqPipe/.snakemake/conda/118bc3f0
[Tue Nov 27 01:26:19 2018]
Finished job 10.
1 of 7 steps (14%) done
[Tue Nov 27 01:26:20 2018]
Finished job 16.
2 of 7 steps (29%) done
Error: package or namespace load failed for ‘reshape2’ in dyn.load(file, DLLpath = DLLpath, ...): unable to load shared object '/home/hofphi00/R/x86_64-pc-linux-gnu-library/3.4/stringi/libs/stringi.so': libicui18n.so.57: cannot open shared object file: No such file or directory
Error: package or namespace load failed for ‘reshape2’ in dyn.load(file, DLLpath = DLLpath, ...): unable to load shared object '/home/hofphi00/R/x86_64-pc-linux-gnu-library/3.4/stringi/libs/stringi.so': libicui18n.so.57: cannot open shared object file: No such file or directory
Error: package or namespace load failed for ‘reshape2’ in dyn.load(file, DLLpath = DLLpath, ...): unable to load shared object '/home/hofphi00/R/x86_64-pc-linux-gnu-library/3.4/stringi/libs/stringi.so': libicui18n.so.57: cannot open shared object file: No such file or directory
Execution halted
Execution halted
Execution halted
[Tue Nov 27 01:26:30 2018]
[Tue Nov 27 01:26:30 2018]
[Tue Nov 27 01:26:30 2018]
Error in rule plot_yield:
Error in rule plot_BC_drop:
Error in rule plot_rna_metrics:
jobid: 20
jobid: 17
jobid: 14
output: plots/yield.pdf
output: plots/BC_drop.pdf

Hoohm commented 5 years ago

Hello @Hofphi

The main reason why installing packages yourself will not resolve the issue is that conda activates a specific env for each rule.

You can add icu to the plots.yaml

channels:
  - conda-forge
dependencies:
  - r=3.4.1
  - r-ggplot2=2.2.1
  - r-gridextra
  - r-reshape2
  - r-viridis
  - r-stringdist
  - r-dplyr
  - icu=58.2

Let me know if that fixed it

seb-mueller commented 5 years ago

Also, it looks like your local R install interferes with the conda R. Specifically, it's trying to load packages from your local installation /home/hofphi00/R/*. Do you have any entries in .Rprofile or .Renviron setting this? You might want to move them elsewhere temporarily to check if that gives the same error. Also see this issue: https://github.com/conda-forge/r-base-feedstock/issues/37
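To make the point about the local install concrete, here is a minimal Python sketch that pulls the shared-object path and the missing library out of a dyn.load error line and checks whether the path lies inside a given conda environment. The function name diagnose_dynload_error and the regular expression are illustrative assumptions, not part of dropSeqPipe or snakemake; the sample message and environment path are copied from the log above.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern for the "unable to load shared object" text printed by R.
DYNLOAD_RE = re.compile(
    r"unable to load shared object '(?P<so>[^']+)':\s*"
    r"(?P<missing>\S+): cannot open shared object file"
)


def diagnose_dynload_error(message, conda_env_prefix):
    """Return (shared_object, missing_library, inside_env), or None if no match."""
    match = DYNLOAD_RE.search(message)
    if match is None:
        return None
    shared_object = PurePosixPath(match.group("so"))
    prefix = PurePosixPath(conda_env_prefix)
    # The object is "inside" the env if the env prefix is one of its parent dirs.
    inside_env = prefix in shared_object.parents
    return str(shared_object), match.group("missing"), inside_env


if __name__ == "__main__":
    error = (
        "unable to load shared object "
        "'/home/hofphi00/R/x86_64-pc-linux-gnu-library/3.4/stringi/libs/stringi.so': "
        "libicui18n.so.57: cannot open shared object file: No such file or directory"
    )
    env = "/scratch/hofphi00/dropSeqPipe/.snakemake/conda/118bc3f0"
    print(diagnose_dynload_error(error, env))
```

If the last element of the result is False, the failing package was loaded from outside the conda environment, which is the situation described in this comment.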

Hofphi commented 5 years ago

Hey, thanks for your replies. @Hoohm I modified the plots.yaml and added the - icu=58.2 entry, but I still get the same error. I noticed that the error message says libicui18n.so.57: cannot open shared object file: No such file or directory.

I checked in this directory /home/hofphi00/miniconda3/lib and can only find libicui18n.so.58 not libicui18n.so.57

@seb-mueller You are right, conda is definitely using the system R package library and not the conda-installed R library, but I also cannot find the .Rprofile or .Renviron that you are talking about.

I think both issues might be the source of the problem, but I don't see where exactly I can find the .Rprofile or .Renviron files to change the path, and I am also not sure how to downgrade the system file from libicui18n.so.58 to libicui18n.so.57.
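The check of which ICU versions are present in a library directory can be scripted. The following is a small sketch under the assumption that the libraries sit directly in the directory as libicui18n.so.* files; find_library_versions is a hypothetical helper name, and the miniconda3/lib location is taken from the comment above.

```python
from pathlib import Path


def find_library_versions(lib_dir, stem="libicui18n.so"):
    """Return the sorted file names in lib_dir whose name starts with stem."""
    directory = Path(lib_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.name.startswith(stem))


if __name__ == "__main__":
    # Location mentioned in the comment above; adjust to your own installation.
    print(find_library_versions(Path.home() / "miniconda3" / "lib"))
```

Running the same function against the lib directory of the rule's environment under .snakemake/conda/ would show which ICU version that environment actually ships.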

Hoohm commented 5 years ago

Try to add the icu 57 then.

channels:
  - conda-forge
  - floriangeigl
dependencies:
  - r=3.4.1
  - r-ggplot2=2.2.1
  - r-gridextra
  - r-reshape2
  - r-viridis
  - r-stringdist
  - r-dplyr
  - icu=57.1

seb-mueller commented 5 years ago

The root problem still is incompatible versions between the local and conda packages. conda-R somehow has to be told to not load local packages.

In R, help(Startup) may indicate what else, other than ~/.Renviron or ~/.Rprofile, is still making R include this path in .libPaths(), for example R_LIBS_USER or R_LIBS. Could you check whether they are set, e.g. with echo $R_LIBS in the terminal? Also, which system are you using, Linux? To deactivate your local install, you could quickly rename your local path, e.g. mv ~/R ~/Rmoved, and see if it works (don't forget to undo the renaming afterwards).
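The checks suggested here can be gathered in one place. The sketch below is an assumption-laden helper, not something from the thread: r_library_overrides is a hypothetical name, and besides R_LIBS and R_LIBS_USER it also looks at R_LIBS_SITE, R_PROFILE_USER and R_ENVIRON_USER, which R's Startup and libPaths help pages document as affecting startup files or the library search path. It also reports whether a ~/R directory exists, since the path in the error log sits under ~/R.

```python
import os
from pathlib import Path

# Environment variables that can influence R's startup files or library paths.
R_ENV_VARS = ("R_LIBS", "R_LIBS_USER", "R_LIBS_SITE", "R_PROFILE_USER", "R_ENVIRON_USER")
R_STARTUP_FILES = (".Rprofile", ".Renviron")


def r_library_overrides(environ, home):
    """Collect settings that may make R pick up packages from outside conda."""
    home = Path(home)
    set_vars = {name: environ[name] for name in R_ENV_VARS if environ.get(name)}
    startup = [str(home / name) for name in R_STARTUP_FILES if (home / name).is_file()]
    return {
        "env": set_vars,
        "startup_files": startup,
        "user_library_dir_exists": (home / "R").is_dir(),
    }


if __name__ == "__main__":
    print(r_library_overrides(os.environ, Path.home()))
```

An empty "env" and "startup_files" combined with "user_library_dir_exists": True would suggest that the local library is being found through its default location rather than through explicit configuration.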

seb-mueller commented 5 years ago

@Hoohm I find it a bit odd that this can happen at all. Ideally, R should not depend on any system settings or environments. I couldn't find anything in the Anaconda documentation, but I suppose starting R in vanilla mode by default should do the trick: R --vanilla implies --no-site-file, --no-init-file, --no-restore and --no-environ. Do you know how to do this, if only for testing?

Edit: This thread seems to have come across the same issue: https://stackoverflow.com/questions/52364380/snakemake-ignore-rprofile-when-executing-an-r-script

Edit2: It seems this has actually been fixed, and snakemake has been doing this by default since Sep18: https://bitbucket.org/snakemake/snakemake/commits/44bc388

@Hofphi Could you update Snakemake and check again?
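One way to test the vanilla idea without touching snakemake is to ask Rscript directly which library paths it sees with and without --vanilla. This is only a sketch: libpaths_command and r_library_paths are hypothetical helper names, and it assumes that the Rscript found on PATH is the one you want to inspect (for example after activating the rule's conda environment).

```python
import shutil
import subprocess


def libpaths_command(vanilla=True):
    """Build an Rscript call that prints .libPaths(), one entry per line."""
    cmd = ["Rscript"]
    if vanilla:
        cmd.append("--vanilla")
    cmd += ["-e", 'cat(.libPaths(), sep="\\n")']
    return cmd


def r_library_paths(vanilla=True):
    """Run the command and return the library paths, or None if Rscript is missing."""
    if shutil.which("Rscript") is None:
        return None
    result = subprocess.run(
        libpaths_command(vanilla), capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print("default:", r_library_paths(vanilla=False))
    print("vanilla:", r_library_paths(vanilla=True))
```

If a path under the home directory still shows up in the vanilla list, --vanilla alone would not be enough to keep the local packages out.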

Hoohm commented 5 years ago

Was gonna add that vanilla is the default behaviour

Hofphi commented 5 years ago

I updated Snakemake as suggested by seb-mueller!

Now I get the following error:

Fontconfig warning: "/cvmfs/soft.computecanada.ca/nix/var/nix/profiles/16.09/etc/fonts/fonts.conf", line 86: unknown element "blank"
/home/hofphi00/miniconda3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
AttributeError in line 1 of /scratch/hofphi00/dropSeqPipe/Snakefile:
module 'matplotlib' has no attribute 'artist'
  File "/scratch/hofphi00/dropSeqPipe/Snakefile", line 1, in <module>
  File "/home/hofphi00/miniconda3/lib/python3.6/site-packages/pandas/__init__.py", line 42, in <module>
  File "/home/hofphi00/miniconda3/lib/python3.6/site-packages/pandas/core/api.py", line 10, in <module>
  File "/home/hofphi00/miniconda3/lib/python3.6/site-packages/pandas/core/groupby/__init__.py", line 2, in <module>
  File "/home/hofphi00/miniconda3/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 49, in <module>
  File "/home/hofphi00/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 74, in <module>
  File "/home/hofphi00/miniconda3/lib/python3.6/site-packages/pandas/core/series.py", line 81, in <module>
  File "/home/hofphi00/miniconda3/lib/python3.6/site-packages/pandas/plotting/__init__.py", line 15, in <module>
  File "/home/hofphi00/miniconda3/lib/python3.6/site-packages/pandas/plotting/_converter.py", line 8, in <module>
  File "/home/hofphi00/miniconda3/lib/python3.6/site-packages/matplotlib/__init__.py", line 1111, in <module>
  File "/home/hofphi00/miniconda3/lib/python3.6/site-packages/matplotlib/__init__.py", line 891, in __getitem__
  File "/home/hofphi00/miniconda3/lib/python3.6/site-packages/matplotlib/pyplot.py", line 32, in <module>
  File "/home/hofphi00/miniconda3/lib/python3.6/site-packages/matplotlib/colorbar.py", line 28, in <module>

I run the pipeline on a Linux cluster.

Hoohm commented 5 years ago

From the few github issues (1,2,3) I have found this is something quite recent.

Could you try something and report back the potential errors? Run a python shell and try to import both numpy and matplotlib, see what happens there. It should get you the same error.

Depending on how much control you have on the cluster you can try to remove numpy and matplotlib and reinstall them.

Most of the time you don't have root access, so the other possibility is to install them via conda locally. Just be sure that your conda path is in your .bashrc, making the conda python the primary python you use.

conda install -c conda-forge matplotlib=3.0.2
conda install -c conda-forge numpy=1.15.4

Test the imports again afterwards.
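The import test can be run as a short script instead of typing into an interactive shell. This is a minimal sketch; check_import is a hypothetical helper, and it catches any exception rather than only ImportError because the failure in the traceback above is an AttributeError raised during import.

```python
import importlib


def check_import(module_name):
    """Try to import a module; return (ok, version_or_error, file_location)."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # import can fail with more than ImportError
        return False, f"{type(exc).__name__}: {exc}", None
    version = getattr(module, "__version__", "unknown")
    return True, version, getattr(module, "__file__", None)


if __name__ == "__main__":
    for name in ("numpy", "matplotlib", "matplotlib.pyplot"):
        print(name, check_import(name))
```

The reported file location also shows whether the module comes from the conda installation or from somewhere else on the cluster.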

Hoohm commented 5 years ago

If that doesn't work, I guess the best way to deal with it is to go back to older versions of matplotlib, although I'm not sure how the dependencies of snakemake will deal with rolling back versions.

Hofphi commented 5 years ago

I updated the conda environment and also tried to install matplotlib and numpy as suggested by Hoohm. Now I get a CondaHTTPError that did not occur before.

Building DAG of jobs...
Creating conda environment envs/plots.yaml...
Downloading remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /scratch/hofphi00/dropSeqPipe/rules/../envs/plots.yaml:
Solving environment: ...working... failed

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://repo.anaconda.com/pkgs/pro/noarch/repodata.json.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

If your current network has https://www.anaconda.com blocked, please file
a support request with your network engineering team.

SSLError(MaxRetryError('HTTPSConnectionPool(host=\'repo.anaconda.com\', port=443): Max retries exceeded with url: /pkgs/pro/noarch/repodata.json.bz2 (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, \'Unexpected EOF\')",),))',),)

Since I never got this message before on the cluster, I cannot believe that it has something to do with a blocked network for https://www.anaconda.com. I am currently looking for fixes for this issue!

Hofphi commented 5 years ago

~$ conda info

     active environment : None
       user config file : /home/hofphi00/.condarc
 populated config files : 
          conda version : 4.5.11
    conda-build version : not installed
         python version : 3.6.6.final.0
       base environment : /home/hofphi00/miniconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/linux-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/pro/linux-64
                          https://repo.anaconda.com/pkgs/pro/noarch
          package cache : /home/hofphi00/miniconda3/pkgs
                          /home/hofphi00/.conda/pkgs
       envs directories : /home/hofphi00/miniconda3/envs
                          /home/hofphi00/.conda/envs
               platform : linux-64
             user-agent : conda/4.5.11 requests/2.18.4 CPython/3.6.6 Linux/3.10.0-862.14.4.el7.x86_64 centos/7 glibc/2.17
                UID:GID : 3066188:3066188
             netrc file : None
           offline mode : False

Hofphi commented 5 years ago

I set conda config --set ssl_verify no and don't get the network issue. The conda environment is loaded but now the "original" reshape2 package issue is back.

Hoohm commented 5 years ago

Ok. Would you be ok with deleting all the miniconda folder and reinstalling from scratch?

seb-mueller commented 5 years ago

I suppose the vanilla default is not included yet in the most recent snakemake release, which version are you on right now? If that's the case, have you tried moving your R library temporarily as suggested above?

Hofphi commented 5 years ago

Reinstalled miniconda and snakemake from scratch and still get the reshape2 error. I currently have:

~$ snakemake --version
5.3.0

~$ conda info

     active environment : None
       user config file : /home/hofphi00/.condarc
 populated config files : 
          conda version : 4.5.11
    conda-build version : not installed
         python version : 3.6.6.final.0
       base environment : /home/hofphi00/miniconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/linux-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/pro/linux-64
                          https://repo.anaconda.com/pkgs/pro/noarch
          package cache : /home/hofphi00/miniconda3/pkgs
                          /home/hofphi00/.conda/pkgs
       envs directories : /home/hofphi00/miniconda3/envs
                          /home/hofphi00/.conda/envs
               platform : linux-64
             user-agent : conda/4.5.11 requests/2.13.0 CPython/3.6.6 Linux/3.10.0-693.5.2.el7.x86_64 centos/7 glibc/2.17
                UID:GID : 3066188:3066188
             netrc file : None
           offline mode : False

I have not installed R through conda yet so I can not see the software in ~/miniconda3/lib. Before reinstalling miniconda I did have R and even tried to change the Lib-Path in .Renviron without any success. Miniconda is still trying to load the R packages through my "system" R.

seb-mueller commented 5 years ago

Just checked: snakemake 5.3 was released a day before the vanilla commit, so this will be fixed in the next release. R does not need to be installed manually; conda installs it on the fly when you run snakemake. But did you actually try what was suggested above, mv ~/R ~/Rmoved? This moves your local R library out of the expected path, deactivating it temporarily.
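Since the rename has to be undone afterwards, it can be wrapped so that the restore happens automatically. The following is a sketch of that idea as a Python context manager; temporarily_moved is a hypothetical name, and the "moved" suffix simply mirrors the ~/Rmoved example above.

```python
import contextlib
from pathlib import Path


@contextlib.contextmanager
def temporarily_moved(path, suffix="moved"):
    """Rename path to path+suffix inside the with-block, then rename it back."""
    original = Path(path)
    parked = original.with_name(original.name + suffix)
    if parked.exists():
        raise FileExistsError(f"{parked} already exists, refusing to overwrite")
    moved = original.exists()
    if moved:
        original.rename(parked)
    try:
        # Yield the parked location, or None if there was nothing to move.
        yield parked if moved else None
    finally:
        if moved:
            parked.rename(original)
```

It could be used as `with temporarily_moved(Path.home() / "R"):` around whatever call starts the pipeline, so the directory is restored even if the run fails.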

Hofphi commented 5 years ago

@seb-mueller I changed the name of my R library directory mv ~/R ~/Rmoved. It seemed to work with this tweak:

Building DAG of jobs...
Using shell: /cvmfs/soft.computecanada.ca/nix/var/nix/profiles/16.09/bin/bash
Provided cores: 12
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        1       plot_BC_drop
        1       plot_rna_metrics
        1       plot_yield
        4

[Fri Nov 30 01:07:22 2018]
localrule plot_BC_drop:
    input: logs/rat_data_N705_CELL_barcode.txt, logs/rat_data_N705_UMI_barcode.txt, logs/rat_data_N705_reads_left.txt, logs/rat_data_N705_reads_left_trim.txt
    output: plots/BC_drop.pdf
    jobid: 23

[Fri Nov 30 01:07:22 2018]
localrule plot_yield:
    input: logs/rat_data_N705_CELL_barcode.txt, logs/rat_data_N705_UMI_barcode.txt, logs/rat_data_N705_reads_left.txt, data/rat_data_N705/Log.final.out, logs/rat_data_N705_reads_left_trim.txt
    output: plots/yield.pdf
    jobid: 5

[Fri Nov 30 01:07:22 2018]
localrule plot_rna_metrics:
    input: logs/rat_data_N705_rna_metrics.txt
    output: plots/rat_data_N705_rna_metrics.pdf
    jobid: 20
    wildcards: sample=rat_data_N705

Activating conda environment: /scratch/hofphi00/dropSeqPipe/.snakemake/conda/688e4a11
Activating conda environment: /scratch/hofphi00/dropSeqPipe/.snakemake/conda/688e4a11
Activating conda environment: /scratch/hofphi00/dropSeqPipe/.snakemake/conda/688e4a11
Loading required package: viridisLite
Loading required package: viridisLite
Warning: Ignoring unknown parameters: binwidth, bins, pad
Warning: Ignoring unknown parameters: binwidth, bins, pad
pdf
  2
pdf
  2
[Fri Nov 30 01:07:54 2018]
Finished job 5.
1 of 4 steps (25%) done
[Fri Nov 30 01:07:54 2018]
Finished job 23.
2 of 4 steps (50%) done
pdf
  2
[Fri Nov 30 01:07:55 2018]
Finished job 20.
3 of 4 steps (75%) done

[Fri Nov 30 01:07:55 2018]
localrule all:
    input: /home/hofphi00/scratch/dropSeqPipe/ref_genome/ratRnor6_0.refFlat, /home/hofphi00/scratch/dropSeqPipe/ref_genome/ratRnor6_0.reduced.gtf, /home/hofphi00/scratch/dropSeqPipe/ref_genome/ratRnor6_0.dict, /home/hofphi00/scratch/dropSeqPipe/ref_genome/ratRnor6_0.rRNA.intervals, /home/hofphi00/scratch/dropSeqPipe/ref_genome/STAR_INDEX/SA_75/SA, logs/rat_data_N705_R1_fastqc.html, logs/rat_data_N705_R2_fastqc.html, reports/fastqc_reads.html, reports/fastqc_barcodes.html, reports/fastqc_reads_data/multiqc_general_stats.txt, data/rat_data_N705_filtered.fastq.gz, plots/rat_data_N705_polya_trimmed.pdf, plots/rat_data_N705_start_trim.pdf, plots/rat_data_N705_CELL_dropped.pdf, plots/rat_data_N705_UMI_dropped.pdf, plots/BC_drop.pdf, reports/filter.html, data/rat_data_N705_final.bam, logs/rat_data_N705_hist_out_cell.txt, plots/rat_data_N705_knee_plot.pdf, reports/star.html, plots/yield.pdf, logs/rat_data_N705_umi_per_gene.tsv, plots/rat_data_N705_rna_metrics.pdf, summary/umi_expression_matrix.tsv, summary/counts_expression_matrix.tsv
    jobid: 0

[Fri Nov 30 01:07:55 2018]
Finished job 0.
4 of 4 steps (100%) done
Complete log: /scratch/hofphi00/dropSeqPipe/.snakemake/log/2018-11-30T010659.350517.snakemake.log

However, I don't think that would be a nice fix in the long run. I am going to try to change the file ~/miniconda3/lib/python3.6/site-packages/snakemake/script.py

From shell("Rscript {f.name}", bench_record=bench_record) to shell("Rscript --vanilla {f.name}", bench_record=bench_record)

I hope this will work for me as a long term fix! I will let you know.

Appreciate your help so far ;)

seb-mueller commented 5 years ago

Glad it helped. And yes, that would only be a temporary workaround until the next version of snakemake sorts it out permanently.