conda-forge / r-base-feedstock

A conda-smithy repository for r-base.
BSD 3-Clause "New" or "Revised" License

Cannot install Bioconductor package ShortRead #74

Open brendanf opened 5 years ago

brendanf commented 5 years ago

Issue: Installing the Bioconductor package ShortRead from inside R (not from conda) fails with the error:

> setRepositories(g=FALSE)
--- Please select repositories for use in this session ---

1: + CRAN
2:   BioC software
3:   BioC annotation
4:   BioC experiment
5:   CRAN (extras)
6:   Omegahat
7:   R-Forge
8:   rforge.net

Enter one or more numbers separated by spaces, or an empty line to cancel
1: 1 2 3 4
> install.packages("ShortRead")
...lots of dependencies are installed, until finally...
checking for gzeof in -lz... no
configure: error: zlib not found
ERROR: configuration failed for package ‘ShortRead’

This has been previously reported here, here and here, without any truly satisfactory resolution, except for the observation that the difference was introduced with zlib 1.2.11.

I tried this:

$ conda create -n test1 zlib=1.2.11
$ conda create -n test2 zlib=1.2.8
$ diff -r $CONDA_PREFIX/envs/test1 $CONDA_PREFIX/envs/test2

but none of the results seemed to explain any possible difference.

I then tried

$ conda install -n test1 r-base=3.4.1
$ conda install -n test2 r-base=3.4.1
$ diff -r $CONDA_PREFIX/envs/test1 $CONDA_PREFIX/envs/test2

This yielded, unsurprisingly, a LOT of results. Using GUI diff tools and grep guesswork, I finally found this:

diff -r $CONDA_PREFIX/envs/test1/lib/R/bin/libtool $CONDA_PREFIX/envs/test2/lib/R/bin/libtool
3c3
< # Libtool was configured on host c84bcb0f5cfa:
---
> # Libtool was configured on host bf6a7e593731:
168c168
< LTCFLAGS="-I/home/brendan/miniconda3/envs/test1/include"
---
> LTCFLAGS="-I/home/brendan/miniconda3/envs/test2/include"
285c285
< sys_lib_search_path_spec="/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2 /opt/rh/devtoolset-2/root/usr/lib64 /lib64 /usr/lib64 /opt/rh/devtoolset-2/root/usr/lib /lib /usr/lib "
---
> sys_lib_search_path_spec="/home/brendan/miniconda3/envs/test2/lib/gcc/x86_64-unknown-linux-gnu/4.8.5 /home/brendan/miniconda3/envs/test2/lib/gcc /home/brendan/miniconda3/envs/test2/lib /lib /usr/lib "
308c308
< LD="/opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/4.8.2/ld -m elf_x86_64"
---
> LD="/opt/rh/devtoolset-2/root/usr/bin/ld -m elf_x86_64"
11664c11664
< LD="/opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/4.8.2/ld -m elf_x86_64"
---
> LD="/opt/rh/devtoolset-2/root/usr/bin/ld -m elf_x86_64"
11795c11795
< compiler_lib_search_dirs="/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2 /opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../lib64 /lib/../lib64 /usr/lib/../lib64 /opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../.."
---
> compiler_lib_search_dirs="/home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5 /home/brendan/miniconda3/envs/test2/bin/../lib/gcc /home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/../../../../lib /lib/../lib /usr/lib/../lib /home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/../../.."
11799,11800c11799,11800
< predep_objects="/usr/lib/../lib64/crti.o /opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/crtbeginS.o"
< postdep_objects="/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/crtendS.o /usr/lib/../lib64/crtn.o"
---
> predep_objects="/home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/crti.o /home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/crtbeginS.o"
> postdep_objects="/home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/crtendS.o /home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/crtn.o"
11806c11806
< compiler_lib_search_path="-L/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2 -L/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../.."
---
> compiler_lib_search_path="-L/home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5 -L/home/brendan/miniconda3/envs/test2/bin/../lib/gcc -L/home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/../../.."
11813c11813
< LD="/opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/4.8.2/ld -m elf_x86_64"
---
> LD="/opt/rh/devtoolset-2/root/usr/bin/ld -m elf_x86_64"
11965c11965
< LD="/opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/4.8.2/ld -m elf_x86_64"
---
> LD="/opt/rh/devtoolset-2/root/usr/bin/ld -m elf_x86_64"
12099c12099
< compiler_lib_search_dirs="/home/brendan/miniconda3/envs/test1/lib"
---
> compiler_lib_search_dirs="/home/brendan/miniconda3/envs/test2/lib"
12104c12104
< postdep_objects="/usr/lib/../lib64/crti.o /opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/crtbeginS.o /opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/crtendS.o /usr/lib/../lib64/crtn.o"
---
> postdep_objects="/home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/crti.o /home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/crtbeginS.o /home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/crtendS.o /home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/crtn.o"
12106c12106
< postdeps="-l -l -L/home/brendan/miniconda3/envs/test1/lib -L/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2 -L/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../.. -lgfortran -lm -lgcc_s -lquadmath -lm -lgcc_s -lc -lgcc_s"
---
> postdeps="-l -l -L/home/brendan/miniconda3/envs/test2/lib -L/home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5 -L/home/brendan/miniconda3/envs/test2/bin/../lib/gcc -L/home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/home/brendan/miniconda3/envs/test2/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/../../.. -lgfortran -lm -lgcc_s -lquadmath -lm -lgcc_s -lc -lgcc_s"
12110c12110
< compiler_lib_search_path="-L/home/brendan/miniconda3/envs/test1/lib"
---
> compiler_lib_search_path="-L/home/brendan/miniconda3/envs/test2/lib"

I am a bit out of my depth here, and maybe I'm on the wrong track entirely, but I can see that the version of r-base installed with zlib=1.2.11 references directories which are not in my conda installation, and don't exist on my machine.
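
The observation above (libtool referring to directories that do not exist locally) can be checked mechanically. The following is a minimal sketch, not part of the original report: `missing_libtool_paths` is a hypothetical helper name, and the list of variables it inspects is taken from the diff above.

```bash
#!/usr/bin/env bash
# missing_libtool_paths FILE
# Print every absolute path mentioned in the libtool search-path variables
# of FILE that does not exist on this machine (hypothetical helper).
missing_libtool_paths() {
    local libtool_file="$1"
    # Keep only the assignments seen in the diff, drop the variable name and
    # quotes, split on spaces, strip a leading -L/-I, keep absolute paths.
    grep -E '^(sys_lib_search_path_spec|compiler_lib_search_dirs|compiler_lib_search_path|predep_objects|postdep_objects|postdeps|LD)=' "$libtool_file" \
        | sed -e 's/^[A-Za-z_]*=//' -e 's/"//g' \
        | tr ' ' '\n' \
        | sed -e 's/^-[LI]//' \
        | grep '^/' \
        | sort -u \
        | while IFS= read -r path; do
              [ -e "$path" ] || printf '%s\n' "$path"
          done
}
```

With the environment activated (so that `CONDA_PREFIX` points at it, which is an assumption about the setup), `missing_libtool_paths "$CONDA_PREFIX/lib/R/bin/libtool"` would list the build-host paths such as those under `/opt/rh/devtoolset-2`.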

Here are the environments:

Test1 with zlib=1.2.11 (not working)

```
$ conda list
# packages in environment at /home/brendan/miniconda3/envs/test1:
#
# Name                  Version       Build          Channel
_r-mutex                1.0.0         anacondar_1
bzip2                   1.0.6         h14c3975_1002  conda-forge
ca-certificates         2018.11.29    ha4d7672_0     conda-forge
cairo                   1.14.12       h80bd089_1005  conda-forge
curl                    7.61.0        h93b3f91_2     conda-forge
fontconfig              2.13.1        h2176d3f_1000  conda-forge
freetype                2.9.1         h94bbf69_1005  conda-forge
gettext                 0.19.8.1      h9745a5d_1001  conda-forge
glib                    2.56.2        had28632_1001  conda-forge
graphite2               1.3.13        hf484d3e_1000  conda-forge
gsl                     2.2.1         h0c605f7_3
harfbuzz                1.9.0         he243708_1001  conda-forge
icu                     58.2          hf484d3e_1000  conda-forge
jpeg                    9c            h14c3975_1001  conda-forge
krb5                    1.14.6        0              conda-forge
libffi                  3.2.1         hf484d3e_1005  conda-forge
libgcc-ng               7.3.0         hdf63c60_0     conda-forge
libgfortran             3.0.0         1              conda-forge
libiconv                1.15          h14c3975_1004  conda-forge
libpng                  1.6.36        h84994c4_1000  conda-forge
libssh2                 1.8.0         h1ad7b7a_1003  conda-forge
libstdcxx-ng            7.3.0         hdf63c60_0     conda-forge
libtiff                 4.0.10        h648cc4a_1001  conda-forge
libuuid                 2.32.1        h14c3975_1000  conda-forge
libxcb                  1.13          h14c3975_1002  conda-forge
libxml2                 2.9.8         h143f9aa_1005  conda-forge
ncurses                 6.1           hf484d3e_1002  conda-forge
openssl                 1.0.2r        h14c3975_0     conda-forge
pango                   1.40.14       hf0c64fd_1003  conda-forge
pcre                    8.41          hf484d3e_1003  conda-forge
pixman                  0.34.0        h14c3975_1003  conda-forge
pthread-stubs           0.4           h14c3975_1001  conda-forge
r-base                  3.4.1         h4fe35fd_8     conda-forge
readline                7.0           hf8c457e_1001  conda-forge
tk                      8.6.9         h84994c4_1000  conda-forge
xorg-kbproto            1.0.7         h14c3975_1002  conda-forge
xorg-libice             1.0.9         h14c3975_1004  conda-forge
xorg-libsm              1.2.3         h4937e3b_1000  conda-forge
xorg-libx11             1.6.7         h14c3975_1000  conda-forge
xorg-libxau             1.0.9         h14c3975_0     conda-forge
xorg-libxdmcp           1.1.2         h14c3975_1007  conda-forge
xorg-libxext            1.3.3         h14c3975_1004  conda-forge
xorg-libxrender         0.9.10        h14c3975_1002  conda-forge
xorg-libxt              1.1.5         h14c3975_1002  conda-forge
xorg-renderproto        0.11.1        h14c3975_1002  conda-forge
xorg-xextproto          7.3.0         h14c3975_1002  conda-forge
xorg-xproto             7.0.31        h14c3975_1007  conda-forge
xz                      5.2.4         h14c3975_1001  conda-forge
zlib                    1.2.11        h14c3975_1004  conda-forge
```

test2 with zlib=1.2.8 (working)

```
$ conda list
# packages in environment at /home/brendan/miniconda3/envs/test2:
#
# Name                  Version       Build          Channel
_r-mutex                1.0.0         anacondar_1
bzip2                   1.0.6         h14c3975_1002  conda-forge
ca-certificates         2018.11.29    ha4d7672_0     conda-forge
cairo                   1.14.6        4              conda-forge
curl                    7.54.1        0              conda-forge
fontconfig              2.12.1        6              conda-forge
freetype                2.7           1              conda-forge
gettext                 0.19.8.1      h9745a5d_1001  conda-forge
glib                    2.51.4        0              conda-forge
graphite2               1.3.13        hf484d3e_1000  conda-forge
gsl                     2.1           2              conda-forge
harfbuzz                1.4.3         0              conda-forge
icu                     58.2          hf484d3e_1000  conda-forge
jpeg                    9c            h14c3975_1001  conda-forge
krb5                    1.14.6        0              conda-forge
libffi                  3.2.1         hf484d3e_1005  conda-forge
libgcc                  7.2.0         h69d50b8_2     conda-forge
libgcc-ng               7.3.0         hdf63c60_0     conda-forge
libiconv                1.15          h14c3975_1004  conda-forge
libpng                  1.6.28        1              conda-forge
libssh2                 1.8.0         1              conda-forge
libstdcxx-ng            7.3.0         hdf63c60_0     conda-forge
libtiff                 4.0.7         0              conda-forge
libxml2                 2.9.5         0              conda-forge
ncurses                 5.9           10             conda-forge
openssl                 1.0.2r        h14c3975_0     conda-forge
pango                   1.40.4        0              conda-forge
pcre                    8.39          0              conda-forge
pixman                  0.34.0        h14c3975_1003  conda-forge
r-base                  3.4.1         0              conda-forge
readline                6.2           0              conda-forge
tk                      8.5.19        2              conda-forge
xz                      5.2.4         h14c3975_1001  conda-forge
zlib                    1.2.8         3              conda-forge
```

And my conda info:

```
$ conda info
     active environment : test2
    active env location : /home/brendan/miniconda3/envs/test2
            shell level : 3
       user config file : /home/brendan/.condarc
 populated config files : /home/brendan/.condarc
          conda version : 4.6.7
    conda-build version : not installed
         python version : 3.7.1.final.0
       base environment : /home/brendan/miniconda3 (writable)
           channel URLs : https://conda.anaconda.org/bioconda/linux-64
                          https://conda.anaconda.org/bioconda/noarch
                          https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/linux-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/brendan/miniconda3/pkgs
                          /home/brendan/.conda/pkgs
       envs directories : /home/brendan/miniconda3/envs
                          /home/brendan/.conda/envs
               platform : linux-64
             user-agent : conda/4.6.7 requests/2.21.0 CPython/3.7.1 Linux/3.16.0-77-generic ubuntu/16.04.5 glibc/2.23
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False
```

(edit to fix details formatting)

brendanf commented 5 years ago

I just tested with an intermediate build of r-base:

$ conda create -n test4 r-base=3.4.1=2

This version requires zlib 1.2.11, and yet ShortRead installs successfully. This confirms that zlib is not actually the problem. However, the changes I identified in libtool happened after this build.
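
Comparing the libtool files of two builds is easier when the environment prefix itself is factored out, since most lines in the diff above differ only by `test1` versus `test2`. A rough sketch, assuming both environments contain `lib/R/bin/libtool`; `diff_libtool` and the `@PREFIX@` placeholder are hypothetical names introduced here:

```bash
#!/usr/bin/env bash
# diff_libtool ENV_A ENV_B
# Diff lib/R/bin/libtool of two environment prefixes after replacing each
# prefix with a common placeholder, so only real differences remain.
# Assumes the prefixes contain no '|' or regex-special characters beyond '.'.
diff_libtool() {
    local env_a="${1%/}" env_b="${2%/}"
    local rel="lib/R/bin/libtool"
    local tmp_a tmp_b status
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || return 2
    # '|' is used as the sed delimiter because the prefixes contain '/'.
    sed "s|$env_a|@PREFIX@|g" "$env_a/$rel" > "$tmp_a"
    sed "s|$env_b|@PREFIX@|g" "$env_b/$rel" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

For example, `diff_libtool ~/miniconda3/envs/test1 ~/miniconda3/envs/test4` would show only the lines that still differ once the prefix is normalized; like `diff`, it returns 1 when differences are found.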

jdblischak commented 5 years ago

@brendanf I'm not sure what is causing the issue. But to better understand your use case, what is your motivation for installing ShortRead from source instead of from the bioconda channel? The commands below install ShortRead 1.40.0, which is the current release.

conda --version
## conda 4.6.7
conda config --get channels
## --add channels 'defaults'   # lowest priority
## --add channels 'bioconda'
## --add channels 'conda-forge'   # highest priority
conda create -y -n test-shortread bioconductor-shortread wget=1.19.4
conda activate test-shortread
Rscript -e 'packageVersion("ShortRead")'
## [1] ‘1.40.0’

Note that including wget=1.19.4 is a current hack to get around a temporary bug (see https://github.com/bioconda/bioconda-recipes/issues/13846 for details).

If you need to install a bleeding edge version of ShortRead, you could modify the existing bioconda recipe for bioconductor-shortread to point to the devel version of ShortRead (1.41.0), build the recipe with conda build, and upload it to your personal Anaconda Channel.

brendanf commented 5 years ago

@jdblischak Thanks for the reply. I'm using packrat to manage my R packages, because I am using some which can only be installed from GitHub (or where I need a feature that hasn't made it to CRAN/Bioconductor yet). However, packrat can't manage the R distribution itself, or any of the other software I'm using, which is why I'm using Conda. In theory, I should be able to declare ShortRead as an external package in packrat, but I haven't been able to make this work with packages which are dependencies of the packages I need to install from GitHub. For the time being, I'll just stick to R 3.4.1b2, but it would be nice if it were possible to do this using the newest version of R.

brendanf commented 5 years ago

I just downloaded the source of ShortRead and modified configure.ac to remove the check for zlib.

```diff
diff --git a/configure.ac b/configure.ac
index d9111f7..c6bcd2b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1,6 +1,6 @@
 AC_INIT("DESCRIPTION")
-AC_CHECK_LIB([z], [gzeof], , AC_ERROR([zlib not found]))
+#AC_CHECK_LIB([z], [gzeof], , AC_ERROR([zlib not found]))
 AC_CHECK_SIZEOF([unsigned long])
 AC_OUTPUT(src/Makevars)
```

I then installed it in a Conda environment using the latest r-base without a problem.

```bash
$ conda list
# packages in environment at /home/brendan/miniconda3/envs/test1:
#
# Name                  Version       Build          Channel
_r-mutex                1.0.0         anacondar_1
binutils_impl_linux-64  2.31.1        h6176602_1     conda-forge
binutils_linux-64       2.31.1        h6176602_3     conda-forge
bwidget                 1.9.11        1
bzip2                   1.0.6         h14c3975_1002  conda-forge
ca-certificates         2018.11.29    ha4d7672_0     conda-forge
cairo                   1.14.12       h80bd089_1005  conda-forge
curl                    7.64.0        h646f8bb_2     conda-forge
fontconfig              2.13.1        h2176d3f_1000  conda-forge
freetype                2.9.1         h94bbf69_1005  conda-forge
gcc_impl_linux-64       7.3.0         habb00fd_1     conda-forge
gcc_linux-64            7.3.0         h553295d_3     conda-forge
gettext                 0.19.8.1      h9745a5d_1001  conda-forge
gfortran_impl_linux-64  7.3.0         hdf63c60_1     conda-forge
gfortran_linux-64       7.3.0         h553295d_3     conda-forge
glib                    2.56.2        had28632_1001  conda-forge
graphite2               1.3.13        hf484d3e_1000  conda-forge
gxx_impl_linux-64       7.3.0         hdf63c60_1     conda-forge
gxx_linux-64            7.3.0         h553295d_3     conda-forge
harfbuzz                1.9.0         he243708_1001  conda-forge
icu                     58.2          hf484d3e_1000  conda-forge
jpeg                    9c            h14c3975_1001  conda-forge
krb5                    1.16.3        h05b26f9_1001  conda-forge
libcurl                 7.64.0        h541490c_2     conda-forge
libedit                 3.1.20170329  hf8c457e_1001  conda-forge
libffi                  3.2.1         hf484d3e_1005  conda-forge
libgcc-ng               7.3.0         hdf63c60_0     conda-forge
libgfortran-ng          7.3.0         hdf63c60_0
libiconv                1.15          h14c3975_1004  conda-forge
libpng                  1.6.36        h84994c4_1000  conda-forge
libsodium               1.0.16        h14c3975_1001  conda-forge
libssh2                 1.8.0         h90d6eec_1004  conda-forge
libstdcxx-ng            7.3.0         hdf63c60_0     conda-forge
libtiff                 4.0.10        h648cc4a_1001  conda-forge
libuuid                 2.32.1        h14c3975_1000  conda-forge
libxcb                  1.13          h14c3975_1002  conda-forge
libxml2                 2.9.8         h143f9aa_1005  conda-forge
make                    4.2.1         h14c3975_2004  conda-forge
ncurses                 6.1           hf484d3e_1002  conda-forge
openssl                 1.1.1b        h14c3975_0     conda-forge
pango                   1.40.14       hf0c64fd_1003  conda-forge
pcre                    8.41          hf484d3e_1003  conda-forge
pixman                  0.34.0        h14c3975_1003  conda-forge
pthread-stubs           0.4           h14c3975_1001  conda-forge
r-base                  3.5.1         he45234b_1005  conda-forge
readline                7.0           hf8c457e_1001  conda-forge
tk                      8.6.9         h84994c4_1000  conda-forge
tktable                 2.10          h14c3975_0
xorg-kbproto            1.0.7         h14c3975_1002  conda-forge
xorg-libice             1.0.9         h14c3975_1004  conda-forge
xorg-libsm              1.2.3         h4937e3b_1000  conda-forge
xorg-libx11             1.6.7         h14c3975_1000  conda-forge
xorg-libxau             1.0.9         h14c3975_0     conda-forge
xorg-libxdmcp           1.1.2         h14c3975_1007  conda-forge
xorg-libxext            1.3.3         h14c3975_1004  conda-forge
xorg-libxrender         0.9.10        h14c3975_1002  conda-forge
xorg-renderproto        0.11.1        h14c3975_1002  conda-forge
xorg-xextproto          7.3.0         h14c3975_1002  conda-forge
xorg-xproto             7.0.31        h14c3975_1007  conda-forge
xz                      5.2.4         h14c3975_1001  conda-forge
zeromq                  4.3.1         hf484d3e_1000  conda-forge
zlib                    1.2.11        h14c3975_1004  conda-forge
$ R
> install.packages("devtools")
> devtools::install_deps()
> devtools::install()
```

Just for good measure, I also opened a .fastq.gz file using ShortRead::readFastq(). This was successful, so zlib was definitely linked and functional. This means that there is no problem with linking to zlib; the problem is just that configure can't find it. This definitely points to the wrong directories in libtool being the problem.
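
The failing configure step can be approximated by hand to see whether the link test passes once the environment's library directory is given explicitly. This is only a sketch: it imitates what `AC_CHECK_LIB([z], [gzeof])` does (linking a tiny program that references the function), it is not ShortRead's actual configure script, and `check_lib` is a hypothetical helper. The compiler is taken from `CC`, falling back to `cc`, which is an assumption about the setup.

```bash
#!/usr/bin/env bash
# check_lib LIB FUNC [extra compiler/linker flags...]
# Try to link a minimal program that calls FUNC against -lLIB and report
# the result in the same wording as the configure output (hypothetical helper).
check_lib() {
    local lib="$1" func="$2"
    shift 2
    local workdir status
    workdir=$(mktemp -d) || return 2
    # Same shape as an autoconf link test: declare the symbol and call it.
    cat > "$workdir/conftest.c" <<EOF
char $func(void);
int main(void) { return $func(); }
EOF
    if ${CC:-cc} "$@" -o "$workdir/conftest" "$workdir/conftest.c" "-l$lib" >/dev/null 2>&1; then
        echo "checking for $func in -l$lib... yes"
        status=0
    else
        echo "checking for $func in -l$lib... no"
        status=1
    fi
    rm -rf "$workdir"
    return "$status"
}
```

In the activated environment, comparing `check_lib z gzeof` with `check_lib z gzeof -L"$CONDA_PREFIX/lib"` would indicate whether zlib only becomes visible to the linker once the environment's `lib` directory is on the search path.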

jdblischak commented 5 years ago

> I'm using packrat to manage my R packages, because I am using some which can only be installed from GitHub (or where I need a feature that hasn't made it to CRAN/Bioconductor yet).

@brendanf OK. That is going to be tough to manage for some edge cases, as you are already finding out. When I want to use GitHub-only R packages in a conda environment, I create a conda recipe for it (the only requirement is that the GitHub repo has at least one tag/release), build it, and then upload it to my personal Anaconda channel. That looks something like this:

conda install conda-build
conda skeleton cran https://github.com/username/pkgname
conda build --R 3.5.1 r-pkgname
anaconda upload <path-to-tarball>

> For the time being, I'll just stick to R 3.4.1b2, but it would be nice if it were possible to do this using the newest version of R.

Due to the practical constraints of time (both maintainers and CI servers) and space (for tarballs on Anaconda Cloud), conda-forge builds R packages for the first patch of each minor release of R (e.g. R 3.x.1). Each successive patched version of R only makes very minimal changes, so you shouldn't notice any difference in performance.

> Just for good measure, I also opened a .fastq.gz file using ShortRead::readFastq(). This was successful, so zlib was definitely linked and functional. This means that there is no problem with linking to zlib; the problem is just that configure can't find it. This definitely points to the wrong directories in libtool being the problem.

Glad you found a workaround for your particular use case!

khughitt commented 5 years ago

Still an issue with r-base=3.6.0 and packages depending on zlib.

Since many Bioconductor packages are likely to depend on some package interfacing with zlib, this is likely to affect many users. For example, GenomicAlignments is #20 in the Bioconductor package download rankings, and depends on Rhtslib, which fails to install because of zlib.

khughitt commented 5 years ago

On a side-note, do you think there is any possibility of creating a "bridge" between Conda and (CRAN/Bioconductor/Github, etc.)?

I imagine this has already been explored and ruled out, or else this issue wouldn't exist, but I'm curious whether this is something that is not likely to ever happen, or whether it could be accomplished with sufficient resources.

My current approach is to use conda to control the version of R installed, and renv to manage R packages. However, compatibility issues like these limit the effectiveness of that approach.

jdblischak commented 5 years ago

> Still an issue with r-base=3.6.0 and packages depending on zlib.

@khughitt r-base=3.6.0 is not available from the conda-forge channel; that version is only available from the defaults channel. We will package 3.6.1 once it is released.

$ conda search r-base=3.6.0
Loading channels: done
# Name                       Version           Build  Channel             
r-base                         3.6.0      hce969dd_0  pkgs/main  

> Since many Bioconductor packages are likely to depend on some package interfacing with zlib, this is likely to affect many users. For example, GenomicAlignments is #20 in the Bioconductor package download rankings, and depends on Rhtslib, which fails to install because of zlib.

Conda users can install the Bioconductor packages from the bioconda channel:

# Make sure channels are configured correctly
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
# Install GenomicAlignments
conda install bioconductor-genomicalignments

I tested and confirmed that this works.

> any possibility of creating a "bridge" between Conda and (CRAN/Bioconductor/Github, etc.)

What did you have in mind?

> My current approach is to use conda to control the version of R installed, and renv to manage R packages. However, compatibility issues like these limit the effectiveness of that approach.

As you know, managing dependencies is a hard problem. Different package managers each have their own solutions. The conda packages in the conda-forge and bioconda channels should all be interoperable. But if you install R packages (or other software) using a different method, there is no guarantee that this will work.

khughitt commented 5 years ago

@jdblischak That's fair -- I realize my points (and also the original zlib issue, I believe) are not at all conda-forge specific.

In general, I really don't like the idea that the only solution to using conda to manage an R installation is to go "all-in" and use it to manage R packages as well. It seems like it should be possible to use conda to install a specific version of R, and then let R (or devtools/BiocManager/renv/etc.) handle the R packages themselves.

With respect to a bridge, I don't really have anything specific in mind, and I'm afraid I don't know enough about conda internals to offer any meaningful insights.

One approach might be to create one or more pseudo-channels (e.g. 'r', 'bioconductor') to interface with those repositories. When a user searches for some package name with those channels enabled, conda could query those repositories behind the scenes to check for relevant results. When a user then attempts to do a "conda install" for an R package, conda could then use R's install.packages() (or devtools, perhaps) to perform the actual package installation, generating the necessary by-products (e.g. conda-meta/r-<package>.json) of a usual conda install along the way?

I do appreciate the difficulty of managing dependencies, and perhaps this would just be too complex to implement. Regardless, I am grateful for the efforts of all of the conda (and conda-forge and bioconda) devs and packagers, and for the significant improvements to cross-platform package management and reproducibility that these have made possible.

brendanf commented 5 years ago

@khughitt I've decided to go "all-in" with conda, as you put it. In the cases where the package/version I want is not available, I've started packaging it myself on a personal conda channel, as suggested by @jdblischak above.

jdblischak commented 5 years ago

> In general, I really don't like the idea that the only solution to using conda to manage an R installation is to go "all-in" and use it to manage R packages as well. It seems like it should be possible to use conda to install a specific version of R, and then let R (or devtools/BiocManager/renv/etc.) handle the R packages themselves.

@khughitt You don't have to go "all-in" with conda or any other package manager. You can install R with conda, APT, Homebrew, etc., and then install your packages with devtools/BiocManager/renv/etc. This will work much of the time. But if a particular package relies on system libraries (e.g. ShortRead, rJava, etc.), then it might be more difficult to install. This is where a package manager is helpful. It will ensure that you have the necessary system libraries installed and that the new software you are installing will be able to link to them.

> When a user then attempts to do a "conda install" for an R package, conda could then use R's install.packages() (or devtools, perhaps) to perform the actual package installation, generating the necessary by-products (e.g. conda-meta/r-<package>.json) of a usual conda install along the way?

That exact implementation wouldn't work because conda install is only for finding and installing conda packages. More realistic would be to support manually installed R packages in the same way that manually installed Python packages are when installed via pip. Although conda won't help with these packages (i.e. a Python package with compiled code will need to successfully compile after running pip install), it does keep track of the package version (and these can be specified in an environment.yaml file). I know these features have been discussed (https://github.com/conda/conda/issues/7248#issuecomment-491049283), but they aren't currently available.

But even if manually installed R packages were given more support, they would still have the same installation problems.

> I've decided to go "all-in" with conda, as you put it. In the cases where the package/version I want is not available, I've started packaging it myself on a personal conda channel, as suggested by @jdblischak above.

@brendanf That's awesome! And if you manage to create a working conda recipe for an R package released on CRAN (or Bioconductor) that isn't available on conda-forge (or bioconda), please consider submitting it.

khughitt commented 5 years ago

@jdblischak Thanks for taking the time to respond and for clarifying with regard to conda's implementation.

Based on this discussion and my experience working with other methods for capturing the R environment (really just singularity / docker, if you include the version of R itself), it seems like the best approach is probably to just do the same and start building recipes for packages that aren't already on the main channels. It might take a little bit of time, but most of the packages are simple enough that they shouldn't be too difficult to port.

sbamin commented 5 years ago

> Still an issue with r-base=3.6.0 and packages depending on zlib.
>
> Since many Bioconductor packages are likely to depend on some package interfacing with zlib, this is likely to affect many users. For example, GenomicAlignments is #20 in the Bioconductor package download rankings, and depends on Rhtslib, which fails to install because of zlib.

Seconding that: this is still an issue with r-base=3.6.1, where I get a "zlib.h not found" error while installing Rhtslib. I do not see an r-essentials package for R 3.6.1 on conda-forge. No rush, but is there any ETA on that?

For those looking to install zlib-dependent R packages, I was able to install Rhtslib by editing the Makefiles and specifying CPPFLAGS and LDFLAGS, via https://github.com/Bioconductor/Rhtslib/issues/9#issuecomment-507057176

cd ~/Downloads && \
wget https://bioconductor.org/packages/release/bioc/src/contrib/Rhtslib_1.16.1.tar.gz && \
tar xvzf Rhtslib_1.16.1.tar.gz && \
cd Rhtslib/src/htslib-1.7
# Edit the Makefile(s) here to set (adjust the prefix to your own install):
#   CPPFLAGS = -I/home/foo/anaconda3/include
#   LDFLAGS  = -L/home/foo/anaconda3/lib
cd ~/Downloads/Rhtslib && \
R CMD INSTALL .

Done!
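
For packages whose `configure` script reads `CPPFLAGS` and `LDFLAGS` from the environment, the same two settings could in principle be passed at install time instead of editing files. A hedged sketch: `conda_configure_vars` is a hypothetical helper, and whether this is sufficient for any particular package (including Rhtslib, whose Makefiles were edited by hand above) is not established in this thread.

```bash
#!/usr/bin/env bash
# conda_configure_vars [PREFIX]
# Print CPPFLAGS/LDFLAGS assignments pointing at PREFIX (default: the active
# environment in $CONDA_PREFIX). Assumes PREFIX contains no spaces.
conda_configure_vars() {
    local prefix="${1:-$CONDA_PREFIX}"
    printf 'CPPFLAGS=-I%s/include LDFLAGS=-L%s/lib' "$prefix" "$prefix"
}

# Possible use with R's --configure-vars option (the tarball path is a placeholder):
#   R CMD INSTALL --configure-vars="$(conda_configure_vars)" path/to/package.tar.gz
```

From inside R, the `configure.vars` argument of `install.packages()` plays the same role as `--configure-vars`.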


conda info
     active environment : base
    active env location : /home/foo/anaconda3
            shell level : 1
          conda version : 4.7.5
    conda-build version : 3.17.8
         python version : 3.7.3.final.0
       virtual packages :
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/bioconda/linux-64
                          https://conda.anaconda.org/bioconda/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
               platform : linux-64
             user-agent : conda/4.7.5 requests/2.21.0 CPython/3.7.3 Linux/2.6.32-696.18.7.el6.x86_64 centos/6.5 glibc/2.12
sessionInfo()
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

Matrix products: default
BLAS/LAPACK: /home/foo/anaconda3/lib/libopenblasp-r0.3.6.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.6.1

dpryan79 commented 5 years ago

We (bioconda) are likely to start building the bioconductor packages for R 3.6 early this week.

brendanf commented 5 years ago

> But if a particular package relies on system libraries (e.g. ShortRead, rJava, etc.), then it might be more difficult to install. This is where a package manager is helpful. It will ensure that you have the necessary system libraries installed and that the new software you are installing will be able to link to them.

I'm curious why there is no interest in ensuring that R installed from conda is properly configured to install compiled packages, including finding system libraries in their standard (according to conda) locations. It's one of the core functions of the software, and it would be nice if the software packaged by conda-forge was functional.

In the original post for this issue, I pointed out that variables like sys_lib_search_path_spec, compiler_lib_search_dirs, predep_objects, and postdep_objects in lib/R/bin/libtool from conda-forge point to directories that are not in the conda environment and do not exist on my system; presumably they are from the system that built the package. This is a regression compared to previous versions, where they pointed to the correct directories inside the conda environment and installing ShortRead from within R worked. Why isn't that a bug that should be fixed? Is this something that was done intentionally to streamline the build process for R packages on conda-forge, or is it something that just slipped in? Did it come from the official r/r-base channel? As I said before, I am at the limits of my own competence here, so I can't contribute much more to solving this particular problem. Maybe I am completely off-track.

Of course I understand that, if I try to install R packages directly, I would have to be sure to install the proper system libraries also; ideally from conda if they exist as a package. But, once I've installed the system libraries, I'd like the proper paths to be set to USE them when linking compiled software. Again, that is their function.

As I said in a previous comment, this issue has led me to start managing all my R packages through conda, building private conda packages as necessary. This works, but it is much slower and more painful than installing them from within R using, e.g., devtools. It is especially a pain when I find a bug in an R package I am using, submit a patch, and then have to build a private conda package in order to get the fix into my project, since the version number has not been bumped and there is no new 'official' package. R has functionality, through devtools, to quickly update a package from GitHub. It also has integrated functionality, through packrat (https://github.com/conda-forge/r-packrat-feedstock), to cache the source code for all packages in use so that they can be reproducibly installed on another machine. Both tools are packaged on conda-forge. It would be nice to be able to use that functionality, even when one of the packages needs to link to a completely standard system library which is also required by R itself.

As @khughitt said above, I do appreciate all the work that goes into maintaining conda-forge, and I am grateful for that -- after all, when I realized that I had to choose between conda and packrat, I chose conda.

khughitt commented 5 years ago

@brendanf Off-topic, but you may also wish to check out renv -- it's meant to be a replacement for Packrat developed by some of the RStudio folks, and has been pretty nice to work with so far.

mingwandroid commented 5 years ago

I'm curious why there is no interest in ensuring that R installed from conda is properly configured to install compiled packages, including finding system libraries in their standard (according to conda) locations. It's one of the core functions of the software, and it would be nice if the software packaged by conda-forge was functional.

Because it's not really possible? I could go into the technical details in excruciating detail if you wish. It involves intimate knowledge of how the completely different OSes load shared libraries and more importantly, how they decided not to.

brendanf commented 5 years ago

@mingwandroid Ok, then it is at least clear to me that something along the lines of "something that was done intentionally to streamline the build process for R packages" is the case. Thanks for the reply.

@khughitt Thanks, I'll check it out.

mingwandroid commented 5 years ago

For my own clarification, could you be explicit about what you mean by system libraries? Do you mean exactly the shared library files in /usr/lib{64}? The thing is, everyone has their own definition of that round here, which makes communication difficult (and the R people have their own formalisation of it too).

The only ways I can think of to make this work (and I'd like you to believe me!) would cause more trouble than they'd solve. It would involve redirecting /usr to your conda env inside our compilers.

brendanf commented 5 years ago

@mingwandroid Sorry for the confusion. When I said "system libraries" in my previous post, I was using a very R-centric definition: "any shared library not packaged inside an R package". I understand that it's not reasonable for conda to manage any shared libraries that are not installed via conda; I just want to be able to compile against the libraries which are present in the conda environment. In this particular case, after installing r-base and its dependencies (which include gcc, make, zlib, etc.) from conda, then I would like for R to be able to find zlib in the conda environment and link to it when compiling a package. Right now, that isn't the case, as this line in configure.ac from ShortRead fails:

AC_CHECK_LIB([z], [gzeof], , AC_ERROR([zlib not found]))
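
For anyone who wants to reproduce that check outside of configure, here is a minimal sketch of what an AC_CHECK_LIB-style test does: it tries to link a tiny program that references the function against the library. The function name check_lib is hypothetical, and the sketch uses $CC (falling back to cc) rather than whatever compiler and flags R passes to the package's configure script, so the result may differ from what R sees.

```shell
# Hypothetical helper: succeed if a program referencing FUNC links with -lLIB.
# Usage: check_lib LIB FUNC [extra compiler/linker flags...]
check_lib() {
    lib=$1
    func=$2
    shift 2
    workdir=$(mktemp -d) || return 2
    # Same trick configure uses: declare the symbol with a dummy prototype,
    # since only the link step matters here.
    printf 'char %s(void);\nint main(void) { return %s(); }\n' \
        "$func" "$func" > "$workdir/conftest.c"
    # $CC is left unquoted so that values like "gcc -m64" still work.
    if ${CC:-cc} "$workdir/conftest.c" -o "$workdir/conftest" "$@" "-l$lib" \
        >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -rf "$workdir"
    return "$status"
}
```

Running check_lib z gzeof "-L$CONDA_PREFIX/lib" && echo yes || echo no inside the activated environment shows whether the link succeeds once the environment's lib directory is passed explicitly. Comparing that with a run without the -L flag gives a rough idea of whether the search path is what is missing.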

jdblischak commented 5 years ago

I just want to be able to compile against the libraries which are present in the conda environment. In this particular case, after installing r-base and its dependencies (which include gcc, make, zlib, etc.) from conda, then I would like for R to be able to find zlib in the conda environment and link to it when compiling a package.

@brendanf I agree this is a reasonable use case. IIRC this used to be possible, as you also recall. I am pretty sure the change occurred during the migration to conda-build 3 and the new compilers from Anaconda. That migration solved a lot of problems, but apparently this was a negative side effect.

And I also trust @mingwandroid when he says this isn't easily possible.

And for future testing, I tested the current behavior below:

docker run -it --rm condaforge/linux-anvil
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda install -y bioconductor-shortread r-biocmanager
conda --version
## conda 4.7.8
Rscript -e 'BiocManager::install("ShortRead")'
Bioconductor version 3.8 (BiocManager 1.30.4), R 3.5.1 (2018-07-02)
Installing package(s) 'BiocVersion', 'ShortRead'
trying URL 'https://bioconductor.org/packages/3.8/bioc/src/contrib/BiocVersion_3.8.0.tar.gz'
Content type 'application/x-gzip' length 994 bytes
==================================================
downloaded 994 bytes

trying URL 'https://bioconductor.org/packages/3.8/bioc/src/contrib/ShortRead_1.40.0.tar.gz'
Content type 'application/x-gzip' length 5183477 bytes (4.9 MB)
==================================================
downloaded 4.9 MB

* installing *source* package ‘BiocVersion’ ...
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (BiocVersion)
* installing *source* package ‘ShortRead’ ...
checking for gcc... /opt/conda/bin/x86_64-conda_cos6-linux-gnu-cc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether /opt/conda/bin/x86_64-conda_cos6-linux-gnu-cc accepts -g... yes
checking for /opt/conda/bin/x86_64-conda_cos6-linux-gnu-cc option to accept ISO C89... none needed
checking for gzeof in -lz... no
configure: error: zlib not found
ERROR: configuration failed for package ‘ShortRead’
* removing ‘/opt/conda/lib/R/library/ShortRead’
* restoring previous ‘/opt/conda/lib/R/library/ShortRead’

The downloaded source packages are in
    ‘/tmp/RtmpOyfDcf/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Update old packages: 'GenomeInfoDb', 'Rsamtools'
Warning message:
In install.packages(pkgs = doing, lib = lib, repos = repos, ...) :
  installation of package ‘ShortRead’ had non-zero exit status

shengyongniu commented 5 years ago

Still an issue with r-base=3.6.0 and packages depending on zlib. Since many Bioconductor packages are likely to depend on some package interfacing with zlib, this is likely to affect many users. For example, GenomicAlignments is #20 in the Bioconductor package download rankings, and depends on Rhtslib, which fails to install because of zlib.

Seconded. This is still an issue with r-base=3.6.1: I get a "zlib.h not found" error while installing Rhtslib. I also do not see an r-essentials package for R 3.6.1 on conda-forge. No rush, but is there any ETA on that?

For those looking to install zlib-dependent R packages, I was able to install Rhtslib by editing its Makefiles and specifying CPPFLAGS and LDFLAGS, via Bioconductor/Rhtslib#9 (comment).

cd ~/Downloads && \
wget https://bioconductor.org/packages/release/bioc/src/contrib/Rhtslib_1.16.1.tar.gz && \
tar xvzf Rhtslib_1.16.1.tar.gz && \
cd Rhtslib/src/htslib-1.7
  • Replace the CPPFLAGS and LDFLAGS lines with these in two files, Makefile and Makefile.Rhtslib:
CPPFLAGS = -I/home/foo/anaconda3/include
LDFLAGS  = -L/home/foo/anaconda3/lib
  • Now install Rhtslib.
cd ~/Downloads/Rhtslib && \
R CMD INSTALL .

Done!

conda info
     active environment : base
    active env location : /home/foo/anaconda3
            shell level : 1
          conda version : 4.7.5
    conda-build version : 3.17.8
         python version : 3.7.3.final.0
       virtual packages :
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/bioconda/linux-64
                          https://conda.anaconda.org/bioconda/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
               platform : linux-64
             user-agent : conda/4.7.5 requests/2.21.0 CPython/3.7.3 Linux/2.6.32-696.18.7.el6.x86_64 centos/6.5 glibc/2.12
sessionInfo()
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

Matrix products: default
BLAS/LAPACK: /home/foo/anaconda3/lib/libopenblasp-r0.3.6.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.6.1

You save my life!