Open brendanf opened 5 years ago
I just tested with an intermediate build of r-base:
$ conda create -n test4 r-base=3.4.1=2
This version requires zlib 1.2.11, and yet ShortRead installs successfully. This confirms that zlib is not actually the problem. However, the changes I identified in libtool happened after this build.
@brendanf I'm not sure what is causing the issue. But to better understand your use case, what is your motivation for installing ShortRead from source instead of from the bioconda channel? The commands below install ShortRead 1.40.0, which is the current release.
conda --version
## conda 4.6.7
conda config --get channels
## --add channels 'defaults' # lowest priority
## --add channels 'bioconda'
## --add channels 'conda-forge' # highest priority
conda create -y -n test-shortread bioconductor-shortread wget=1.19.4
conda activate test-shortread
Rscript -e 'packageVersion("ShortRead")'
## [1] ‘1.40.0’
Note that including wget=1.19.4 is a current hack to get around a temporary bug (see https://github.com/bioconda/bioconda-recipes/issues/13846 for details).
If you need to install a bleeding edge version of ShortRead, you could modify the existing bioconda recipe for bioconductor-shortread to point to the devel version of ShortRead (1.41.0), build the recipe with conda build, and upload it to your personal Anaconda Channel.
@jdblischak Thanks for the reply. I'm using packrat to manage my R packages, because I am using some which can only be installed from GitHub (or where I need a feature that hasn't made it to CRAN/Bioconductor yet). However, packrat can't manage the R distribution itself, or any of the other software I'm using, which is why I'm using Conda. In theory, I should be able to declare ShortRead as an external package in packrat, but I haven't been able to make this work with packages which are dependencies of the packages I need to install from GitHub. For the time being, I'll just stick to R 3.4.1b2, but it would be nice if it were possible to do this using the newest version of R.
I just downloaded the source of ShortRead and modified configure.ac to remove the check for zlib. I then installed it in a Conda environment using the latest r-base without a problem.
Just for good measure, I also opened a .fastq.gz file using ShortRead::readFastq(). This was successful, so zlib was definitely linked and functional. This means that there is no problem with linking to zlib; the problem is just that configure can't find it. This definitely points to the wrong directories in libtool being the problem.
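If the only failure is that configure cannot locate zlib, one thing that might be worth trying before editing configure.ac is to hand the environment's include and lib directories to configure directly. R CMD INSTALL has a --configure-vars option for this. The sketch below only builds and prints the command line; build_install_cmd is a made-up helper name, /opt/conda is a placeholder fallback, and whether this is enough for ShortRead on an affected r-base build is an assumption that has not been verified in this thread.

```shell
#!/bin/sh
# Sketch: print (do not run) an R CMD INSTALL command that passes the conda
# environment's include/ and lib/ directories to the package's configure script.
# build_install_cmd is a hypothetical helper; /opt/conda is a placeholder default.
build_install_cmd() {
    pkg_dir=$1
    prefix=${2:-${CONDA_PREFIX:-/opt/conda}}
    printf "R CMD INSTALL --configure-vars='CPPFLAGS=-I%s/include LDFLAGS=-L%s/lib' %s\n" \
        "$prefix" "$prefix" "$pkg_dir"
}

# Example: show the command for an unpacked ShortRead source directory.
build_install_cmd ShortRead
```

Printing the command first makes it easy to check the paths before running anything. From inside R, install.packages() accepts the same settings through its configure.vars argument.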
I'm using packrat to manage my R packages, because I am using some which can only be installed from GitHub (or where I need a feature that hasn't made it to CRAN/Bioconductor yet).
@brendanf OK. That is going to be tough to manage for some edge cases, as you are already finding out. When I want to use GitHub-only R packages in a conda environment, I create a conda recipe for it (the only requirement is that the GitHub repo has at least one tag/release), build it, and then upload it to my personal Anaconda channel. That looks something like this:
conda install conda-build
conda skeleton cran https://github.com/username/pkgname
conda build --R 3.5.1 r-pkgname
anaconda upload <path-to-tarball>
For the time being, I'll just stick to R 3.4.1b2, but it would be nice if it were possible to do this using the newest version of R.
Due to the practical constraints of time (both maintainers and CI servers) and space (for tarballs on Anaconda Cloud), conda-forge builds R packages for the first patch of each minor release of R (e.g. R 3.x.1). Each successive patched version of R only makes very minimal changes, so you shouldn't notice any difference in performance.
Just for good measure, I also opened a .fastq.gz file using ShortRead::readFastq(). This was successful, so zlib was definitely linked and functional. This means that there is no problem with linking to zlib; the problem is just that configure can't find it. This definitely points to the wrong directories in libtool being the problem.
Glad you found a workaround for your particular use case!
Still an issue with r-base=3.6.0 and packages depending on zlib.
Since many Bioconductor packages are likely to depend on some package interfacing with zlib, this is likely to affect many users. For example, GenomicAlignments is #20 in the Bioconductor package download rankings, and depends on Rhtslib, which fails to install because of zlib.
On a side-note, do you think there is any possibility of creating a "bridge" between Conda and (CRAN/Bioconductor/Github, etc.)?
I imagine this has already been explored and ruled out or else issue wouldn't exist, but I'm curious as to whether this is something that is not likely to ever happen, or if it could be accomplished with sufficient resources?..
My current approach is to attempt to use conda to control the version of R installed, and renv to manage R packages; however, compatibility issues like these limit the effectiveness of that approach.
Still an issue with r-base=3.6.0 and packages depending on zlib.
@khughitt r-base=3.6.0 is not available from the conda-forge channel; that version is only available from the defaults channel. We will package 3.6.1 once it is released.
$ conda search r-base=3.6.0
Loading channels: done
# Name Version Build Channel
r-base 3.6.0 hce969dd_0 pkgs/main
Since many Bioconductor packages are likely to depend on some package interfacing with zlib, this is likely to affect many users. For example, GenomicAlignments is #20 in the Bioconductor package download rankings, and depends on Rhtslib, which fails to install because of zlib.
Conda users can install the Bioconductor packages from the bioconda channel:
# Make sure channels are configured correctly
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
# Install GenomicAlignments
conda install bioconductor-genomicalignments
I tested and confirmed that this works.
any possibility of creating a "bridge" between Conda and (CRAN/Bioconductor/Github, etc.)
What did you have in mind?
My current approach is to attempt to use conda to control the version of R installed, and renv to manage R packages; however, compatibility issues like these limit the effectiveness of that approach.
As you know, managing dependencies is a hard problem. Different package managers each have their own solutions. The conda packages in the conda-forge and bioconda channels should all be interoperable. But if you install R packages (or other software) using a different method, there is no guarantee that this will work.
@jdblischak That's fair -- I realize my points (and also the original zlib issue, I believe) are not at all conda-forge specific.
In general, I really don't like the idea that the only solution to using conda to manage an R installation is to go "all-in" and use it to manage R packages as well. It seems like it should be possible to use conda to install a specific version of R, and then let R (or devtools/BiocManager/renv/etc.) handle the R packages themselves.
With respect to a bridge, I don't really have anything specific in mind, and I'm afraid I don't know enough about conda internals to offer any meaningful insights..
One approach might be to create one or more pseudo-channels (e.g. 'r', 'bioconductor') to interface with those repositories. When a user searches for some package name with those channels enabled, conda could query those repositories behind the scenes to check for relevant results. When a user then attempts to do a "conda install" for an R package, conda could then use R's install.packages() (or devtools, perhaps) to perform the actual package installation, generating the necessary by-products (e.g. conda-meta/r-<package>.json) of a usual conda install along the way?
I do appreciate the difficulty of managing dependencies, and perhaps this would just be too complex to implement. Regardless, I am grateful for the efforts of all of the conda (and conda-forge and bioconda) devs and packagers, and for the significant improvements to cross-platform package management and reproducibility that these have made possible.
@khughitt I've decided to go "all-in" with conda, as you put it. In the cases where the package/version I want is not available, I've started packaging it myself on a personal conda channel, as suggested by @jdblischak above.
In general, I really don't like the idea that the only solution to using conda to manage an R installation is to go "all-in" and use it to manage R packages as well. It seems like it should be possible to use conda to install a specific version of R, and then let R (or devtools/BiocManager/renv/etc.) handle the R packages themselves.
@khughitt You don't have to go "all-in" with conda or any other package manager. You can install R with conda, APT, Homebrew, etc., and then install your packages with devtools/BiocManager/renv/etc.. This will work much of the time. But if a particular package relies on system libraries (e.g. ShortRead, rJava, etc.), then it might be more difficult to install. This is where a package manager is helpful. It will ensure that you have the necessary system libraries installed and that the new software you are installing will be able to link to them.
When a user then attempts to do a "conda install" for an R package, conda could then use R's install.packages() (or devtools, perhaps) to perform the actual package installation, generating the necessary by-products (e.g. conda-meta/r-<package>.json) of a usual conda install along the way?
That exact implementation wouldn't work because conda install is only for finding and installing conda packages. More realistic would be to support manually installed R packages in the same way that Python packages installed via pip are supported. Although conda won't help with these packages (i.e. a Python package with compiled code will still need to compile successfully after running pip install), it does keep track of the package version (and these can be specified in an environment.yaml file). I know these features have been discussed (https://github.com/conda/conda/issues/7248#issuecomment-491049283), but they aren't currently available.
But even if manually installed R packages were given more support, they would still have the same installation problems.
I've decided to go "all-in" with conda, as you put it. In the cases where the package/version I want is not available, I've started packaging it myself on a personal conda channel, as suggested by @jdblischak above.
@brendanf That's awesome! And if you manage to create a working conda recipe for an R package released on CRAN (or Bioconductor) that isn't available on conda-forge (or bioconda), please consider submitting it.
@jdblischak Thanks for taking the time to respond and for clarifying with regard to conda's implementation.
Based on this discussion and my experience working with other methods for capturing the R environment (really just singularity / docker, if you include the version of R itself), it seems like the best approach is probably to just do the same and start building recipes for packages that aren't already on the main channels. It might take a little bit of time, but most of the packages are simple enough that they shouldn't be too difficult to port.
Still an issue with r-base=3.6.0 and packages depending on zlib.
Since many Bioconductor packages are likely to depend on some package interfacing with zlib, this is likely to affect many users. For example, GenomicAlignments is #20 in the Bioconductor package download rankings, and depends on Rhtslib, which fails to install because of zlib.
Second that and still an issue with r-base=3.6.1 and getting zlib.h not found error while installing Rhtslib. I do not see r-essential package for R 3.6.1 in conda-forge. No rush but any ETA on that?
For those looking to install zlib-dependent R packages: I was able to install Rhtslib by editing the Makefiles and specifying CPPFLAGS and LDFLAGS (via https://github.com/Bioconductor/Rhtslib/issues/9#issuecomment-507057176).
cd ~/Downloads && \
wget https://bioconductor.org/packages/release/bioc/src/contrib/Rhtslib_1.16.1.tar.gz && \
tar xvzf Rhtslib_1.16.1.tar.gz && \
cd Rhtslib/src/htslib-1.7
- Replace the CPPFLAGS and LDFLAGS lines with these in two files, Makefile and Makefile.Rhtslib:
CPPFLAGS = -I/home/foo/anaconda3/include
LDFLAGS = -L/home/foo/anaconda3/lib
- Now install Rhtslib.
cd ~/Downloads/Rhtslib && \
R CMD INSTALL .
Done!
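The manual Makefile edit above could also be scripted. This is only a sketch: set_conda_flags is a made-up helper name, the prefix falls back to $CONDA_PREFIX and then to ~/anaconda3 (both assumptions about your setup), and it assumes each Makefile has plain "CPPFLAGS = ..." and "LDFLAGS = ..." assignment lines as in the steps above.

```shell
#!/bin/sh
# Sketch: rewrite the CPPFLAGS and LDFLAGS assignment lines of a Makefile so
# they point at a conda environment. set_conda_flags is a hypothetical helper.
set_conda_flags() {
    makefile=$1
    prefix=${2:-${CONDA_PREFIX:-$HOME/anaconda3}}
    tmp=$(mktemp) || return 1
    # Replace whole assignment lines; every other line is copied unchanged.
    if sed -e "s|^CPPFLAGS[[:space:]]*=.*|CPPFLAGS = -I${prefix}/include|" \
           -e "s|^LDFLAGS[[:space:]]*=.*|LDFLAGS = -L${prefix}/lib|" \
           "$makefile" > "$tmp"; then
        # Write back in place so the original file permissions are kept.
        cat "$tmp" > "$makefile"
    fi
    rm -f "$tmp"
}
```

From Rhtslib/src/htslib-1.7 this would be called once per file, e.g. set_conda_flags Makefile and set_conda_flags Makefile.Rhtslib, before running R CMD INSTALL.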
conda info
active environment : base
active env location : /home/foo/anaconda3
shell level : 1
conda version : 4.7.5
conda-build version : 3.17.8
python version : 3.7.3.final.0
virtual packages :
channel URLs : https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
https://conda.anaconda.org/bioconda/linux-64
https://conda.anaconda.org/bioconda/noarch
https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
platform : linux-64
user-agent : conda/4.7.5 requests/2.21.0 CPython/3.7.3 Linux/2.6.32-696.18.7.el6.x86_64 centos/6.5 glibc/2.12
sessionInfo()
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)
Matrix products: default
BLAS/LAPACK: /home/foo/anaconda3/lib/libopenblasp-r0.3.6.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.6.1
We (bioconda) are likely to start building the bioconductor packages for R 3.6 early this week.
But if a particular package relies on system libraries (e.g. ShortRead, rJava, etc.), then it might be more difficult to install. This is where a package manager is helpful. It will ensure that you have the necessary system libraries installed and that the new software you are installing will be able to link to them.
I'm curious why there is no interest in ensuring that R installed from conda is properly configured to install compiled packages, including finding system libraries in their standard (according to conda) locations. It's one of the core functions of the software, and it would be nice if the software packaged by conda-forge was functional.
In the original post for this issue, I pointed out that variables like sys_lib_search_path_spec, compiler_lib_search_dirs, predep_objects, and postdep_objects in lib/R/bin/libtool/ from conda-forge point to directories that are not in the conda environment and do not exist on my system; presumably they are from the system that built the package. This is a regression compared to previous versions, where they pointed to the correct directories inside the conda environment, and installing ShortRead from within R worked. Why isn't that a bug that should be fixed? Is this something that was done intentionally to streamline the build process for R packages on conda-forge, or is it something that just slipped in? Did it come from the official r/r-base channel? As I said before, I am at the limits of my own competence here, so I can't contribute much more to solving this particular problem. Maybe I am completely off-track.
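For anyone who wants to check their own installation for the same symptom, here is a rough sketch that lists directories named in the two search-path variables which do not exist on the local machine. missing_libtool_dirs is a made-up helper name, and the script assumes the variables appear as simple VAR="dir dir ..." lines in the libtool file.

```shell
#!/bin/sh
# Sketch: report directories listed in libtool search-path variables that do
# not exist locally. missing_libtool_dirs is a hypothetical helper name.
missing_libtool_dirs() {
    libtool_file=$1
    for var in sys_lib_search_path_spec compiler_lib_search_dirs; do
        # Take the text between the double quotes on each assignment line
        # (a variable can be assigned more than once, e.g. per compiler tag).
        value=$(sed -n "s/^${var}=\"\(.*\)\"\$/\1/p" "$libtool_file")
        for dir in $value; do
            [ -d "$dir" ] || printf '%s: %s\n' "$var" "$dir"
        done
    done
}
```

It would be run against the libtool file inside the environment, e.g. missing_libtool_dirs "$CONDA_PREFIX/lib/R/bin/libtool" (adjust the path if your layout differs). predep_objects and postdep_objects are left out because they hold object files and flags rather than plain directories.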
Of course I understand that, if I try to install R packages directly, I would have to be sure to install the proper system libraries also; ideally from conda if they exist as a package. But, once I've installed the system libraries, I'd like the proper paths to be set to USE them when linking compiled software. Again, that is their function.
As I said in a previous comment, this issue has led me to start managing all my R packages through conda, building private conda packages as necessary. This works, but it is much slower and more painful than installing them from within R using, e.g., devtools. It is especially a pain when I find a bug in an R package I am using, submit a patch, and then have to build a private conda package in order to get the fix into my project, since the version number has not been bumped and there is no new 'official' package. R has functionality, through devtools, to quickly update a package from GitHub. It also has integrated functionality, through packrat (https://github.com/conda-forge/r-packrat-feedstock), to cache the source code for all packages in use so that they can be reproducibly installed on another machine. These are packaged on conda-forge. It would be nice to be able to use that functionality -- even when one of the packages needs to link to a completely standard system library which is also required by R itself.
As @khughitt said above, I do appreciate all the work that goes into maintaining conda-forge, and I am grateful for that -- after all, when I realized that I had to choose between conda and packrat, I chose conda.
@brendanf Off-topic, but you may also wish to check out renv -- it's meant to be a replacement for Packrat developed by some of the RStudio folks, and has been pretty nice to work with so far.
I'm curious why there is no interest in ensuring that R installed from conda is properly configured to install compiled packages, including finding system libraries in their standard (according to conda) locations. It's one of the core functions of the software, and it would be nice if the software packaged by conda-forge was functional.
Because it's not really possible? I could go into the technical details in excruciating detail if you wish. It involves intimate knowledge of how the completely different OSes load shared libraries and more importantly, how they decided not to.
@mingwandroid Ok, then it is at least clear to me that something along the lines of "something that was done intentionally to streamline the build process for R packages" is the case. Thanks for the reply.
@khughitt Thanks, I'll check it out.
For my own clarification, could you be explicit about what you mean by system libraries? Do you mean exactly the shared library files in /usr/lib{64}? Thing is, everyone has their own definition of that round here and it makes communication difficult (and the R people have their own formalisation of that too).
The only ways I can think of to make this work (I'd like to, believe me!) would cause more trouble than they'd solve. It'd involve redirecting /usr to your conda env inside our compilers.
@mingwandroid Sorry for the confusion. When I said "system libraries" in my previous post, I was using a very R-centric definition: "any shared library not packaged inside an R package". I understand that it's not reasonable for conda to manage any shared libraries that are not installed via conda; I just want to be able to compile against the libraries which are present in the conda environment. In this particular case, after installing r-base and its dependencies (which include gcc, make, zlib, etc.) from conda, then I would like for R to be able to find zlib in the conda environment and link to it when compiling a package. Right now, that isn't the case, as this line in configure.ac from ShortRead fails:
AC_CHECK_LIB([z], [gzeof], , AC_ERROR([zlib not found]))
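For reference, that macro boils down to linking a tiny stub program against -lz. The sketch below imitates the check outside of configure so it can be repeated with different flags. check_gzeof is a made-up helper name, it uses $CC if set and cc otherwise, and it is only an approximation of what autoconf generates, not the real test.

```shell
#!/bin/sh
# Sketch: imitate AC_CHECK_LIB([z], [gzeof]) by linking a stub that references
# gzeof against -lz. check_gzeof is a hypothetical helper, not part of autoconf.
check_gzeof() {
    cc=${CC:-cc}
    workdir=$(mktemp -d) || return 1
    # Same style of stub autoconf writes: declare the symbol and call it.
    printf 'char gzeof ();\nint main (void) { return gzeof (); }\n' > "$workdir/conftest.c"
    # Flags are left unquoted on purpose so they split into separate arguments.
    if $cc ${CPPFLAGS:-} ${CFLAGS:-} ${LDFLAGS:-} -o "$workdir/conftest" \
           "$workdir/conftest.c" -lz >/dev/null 2>&1; then
        result=yes
    else
        result=no
    fi
    rm -rf "$workdir"
    echo "checking for gzeof in -lz... $result"
    [ "$result" = yes ]
}
```

Running it once as check_gzeof and once as LDFLAGS="-L$CONDA_PREFIX/lib" check_gzeof should show whether the failure is purely a search-path problem, which is what I suspect.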
I just want to be able to compile against the libraries which are present in the conda environment. In this particular case, after installing r-base and its dependencies (which include gcc, make, zlib, etc.) from conda, then I would like for R to be able to find zlib in the conda environment and link to it when compiling a package.
@brendanf I agree this is a reasonable use case. IIRC this used to be possible, as you also recall. I am pretty sure the change occurred during the Migration to conda-build 3 and new compilers from Anaconda. This solved a lot of problems, but apparently this was a negative side effect.
And I also trust @mingwandroid when he says this isn't easily possible.
And for future testing, I tested the current behavior below:
docker run -it --rm condaforge/linux-anvil
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda install -y bioconductor-shortread r-biocmanager
conda --version
## conda 4.7.8
Rscript -e 'BiocManager::install("ShortRead")'
Bioconductor version 3.8 (BiocManager 1.30.4), R 3.5.1 (2018-07-02)
Installing package(s) 'BiocVersion', 'ShortRead'
trying URL 'https://bioconductor.org/packages/3.8/bioc/src/contrib/BiocVersion_3.8.0.tar.gz'
Content type 'application/x-gzip' length 994 bytes
==================================================
downloaded 994 bytes
trying URL 'https://bioconductor.org/packages/3.8/bioc/src/contrib/ShortRead_1.40.0.tar.gz'
Content type 'application/x-gzip' length 5183477 bytes (4.9 MB)
==================================================
downloaded 4.9 MB
* installing *source* package ‘BiocVersion’ ...
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (BiocVersion)
* installing *source* package ‘ShortRead’ ...
checking for gcc... /opt/conda/bin/x86_64-conda_cos6-linux-gnu-cc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether /opt/conda/bin/x86_64-conda_cos6-linux-gnu-cc accepts -g... yes
checking for /opt/conda/bin/x86_64-conda_cos6-linux-gnu-cc option to accept ISO C89... none needed
checking for gzeof in -lz... no
configure: error: zlib not found
ERROR: configuration failed for package ‘ShortRead’
* removing ‘/opt/conda/lib/R/library/ShortRead’
* restoring previous ‘/opt/conda/lib/R/library/ShortRead’
The downloaded source packages are in
‘/tmp/RtmpOyfDcf/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Update old packages: 'GenomeInfoDb', 'Rsamtools'
Warning message:
In install.packages(pkgs = doing, lib = lib, repos = repos, ...) :
installation of package ‘ShortRead’ had non-zero exit status
For those looking to install zlib dependent R packages, I was able to install Rhtslib by editing Makefiles and specifying CPPFLAGS and LDFLAGS. via Bioconductor/Rhtslib#9 (comment)
You save my life!
Issue: Installing the Bioconductor package ShortRead from inside R (not from conda) fails with the error:
This has been previously reported here, here and here, without any truly satisfactory resolution, except for the observation that the difference was introduced with zlib 1.2.11.
I tried this:
but none of the results seemed to explain any possible difference.
I then tried
This yielded, unsurprisingly, a LOT of results. Using GUI diff tools and grep guesswork, I finally found this:
I am a bit out of my depth here, and maybe I'm on the wrong track entirely, but I can see that the version of r-base installed with zlib=1.2.11 references directories which are not in my conda installation, and don't exist on my machine.
Here are the environments:
Test1 with zlib=1.2.11 (not working)
test2 with zlib=1.2.8 (working)
And my conda info:
(edit to fix details formatting)