bioconda / bioconda-recipes

Conda recipes for the bioconda channel.
https://bioconda.github.io
MIT License
1.61k stars 3.2k forks source link

openssl 1.1.1 -> openssl 3 change in conda-forge causing broken installations? #39356

Open lkeegan opened 1 year ago

lkeegan commented 1 year ago

I think this is causing some strange issues for me with dependency resolution:

If I do mamba install "python=3.9" pbiotools, the packages to be installed include

  + openssl                    3.0.8  h0b41bf4_0           conda-forge/linux-64     Cached
  + pysam                      0.9.1  py39h838ef2a_12      bioconda/linux-64        Cached
  + python                    3.9.16  h2782a2a_0_cpython   conda-forge/linux-64     Cached

Apparently this old pysam package is considered compatible with openssl 3 when dependencies are resolved (I think because it depends on htslib 1.3.1 1 which doesn't include openssl as a dependency), but this installation is broken:

Python 3.9.16 | packaged by conda-forge | (main, Feb  1 2023, 21:39:03) 
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pysam
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lkeegan/mambaforge/envs/ssl_versions/lib/python3.9/site-packages/pysam/__init__.py", line 5, in <module>
    from pysam.libchtslib import *
ImportError: libcrypto.so.1.1: cannot open shared object file: No such file or directory

If I also pin openssl, mamba install "python=3.9" pbiotools "openssl=1.1.1", then I get a working install with a much more recent version of pysam:

  + openssl                   1.1.1t  h0b41bf4_0           conda-forge/linux-64     Cached
  + python                    3.9.15  h47a2c10_0_cpython   conda-forge/linux-64     Cached
  + pysam                     0.20.0  py39h9abd093_0       bioconda/linux-64        Cached

Would it make sense to temporarily add openssl ==1.1.1 as a dependency to the pbiotools package as a workaround for this issue?

And are there any plans to follow conda-forge and migrate to using openssl 3 on bioconda?

Thanks!

jmarshall commented 1 year ago

At present, bioconda_utils pins conda-forge-pinning=2022.08.25.15.20.42.

Conda-forge moved to openssl 3 in conda-forge/conda-forge-pinning-feedstock#3892, merged on January 25th. So to move to rebuilding OpenSSL-using bioconda packages similarly, we will need to update bioconda_utils to pin a version of conda-forge-pinning past 2023.01.25.21.47.14 or so, and probably build updates on the bulk branch.

lkeegan commented 1 year ago

Thanks for the explanation, that makes sense.

In the meantime I guess the simplest fix for the broken installation issue would just be to require pysam >0.9.1 in pbiotools to avoid this particular combination (with this additional requirement I tried a bunch of installs with openssl3 and didn't find any more similar issues)

jdblischak commented 1 year ago

Is there anything non-core maintainers can assist with migrating to openssl 3? This openssl discrepancy is also causing headaches for recipes I co-maintain that depend on htslib

johanneskoester commented 1 year ago

I think the main source of the problems are old packages that have incomplete pinnings of their dependencies (in particular openssl dependencies). The proper way to fix them is to patch their metadata. This works via so-called repodata patching. The good thing: you can help with that.

Since our internal hackathon this tuesday, the process is now nicely described here: http://bioconda.github.io/developer/repodata_patching.html Hence, you just need to follow that guideline and introduce a small piece of fixing code for each old package that contains an unconstrained openssl dependency.

jdblischak commented 1 year ago

Since our internal hackathon this tuesday, the process is now nicely described here: http://bioconda.github.io/developer/repodata_patching.html

This is amazing documentation! I understand the repodata patching process much better now

Hence, you just need to follow that guideline and introduce a small piece of fixing code for each old package that contains an unconstrained openssl dependency.

However, I must admit that I am having trouble seeing how this will fix the openssl problem. bioconda recipes that depend on openssl have only been built against openssl 1. conda-forge has migrated to only building for openssl 3. Thus newer builds of conda-forge binaries are built against openssl 3 (eg Python 3.10.9 from https://github.com/samtools/htslib/issues/1578). Thus if you try to install Python 3.10.9 from conda-forge and htslib from bioconda on a linux machine, it will install a super old htslib that didn't yet depend on openssl. The workaround is to install the most recent Python available from conda-forge that is still built against openssl 1. Here's a demo:

# installs super-old htslib that doesn't depend on openssl
mamba create --dry-run -n test-openssl \
  --override-channels -c conda-forge -c bioconda \
  htslib python=3.10.9
##   + htslib                    1.10  h244ad75_0          bioconda/linux-64           2MB
##   + openssl                  3.0.8  h0b41bf4_0          conda-forge/linux-64     Cached
##   + python                  3.10.9  he550d4f_0_cpython  conda-forge/linux-64     Cached

# workaround: install slightly older Python built against openssl 1
mamba create --dry-run -n test-openssl \
  --override-channels -c conda-forge -c bioconda \
  htslib python=3.10.8
##   + htslib                    1.16  h6bc39ce_0          bioconda/linux-64           2MB
##   + openssl                 1.1.1t  h0b41bf4_0          conda-forge/linux-64        2MB
##   + python                  3.10.8  h257c98d_0_cpython  conda-forge/linux-64       23MB

The repodata for the most recent version of htslib for linux-64 is below:

    "htslib-1.16-h6bc39ce_0.tar.bz2": {
      "build": "h6bc39ce_0",
      "build_number": 0,
      "depends": [
        "bzip2 >=1.0.8,<2.0a0",
        "libcurl >=7.83.1,<8.0a0",
        "libdeflate >=1.13,<1.14.0a0",
        "libgcc-ng >=12",
        "libzlib >=1.2.12,<1.3.0a0",
        "openssl >=1.1.1q,<1.1.2a",
        "xz >=5.2.6,<5.3.0a0",
        "zlib >=1.2.12,<1.3.0a0"
      ],
      "license": "MIT",
      "md5": "503d5675802731748f348164d978c6de",
      "name": "htslib",
      "sha256": "a2ac96fa30ff9153eb42deb46d10c32022d314eef94a5f64e65ba9928708f53a",
      "size": 2411073,
      "subdir": "linux-64",
      "timestamp": 1663255789705,
      "version": "1.16"
    },

The openssl depedency is constrained to "openssl >=1.1.1q,<1.1.2a", because it was built against openssl 1.1.1. We could patch this dependency to the more relaxed "openssl >=1.1.1q,<4.0a0",. This would enable the conda solver to install recent versions of htslib in the same conda env as openssl 3 and the most recently updated conda-forge binaries. But this seems dangerous. I don't know the details of the differences between openssl 1 and 3, and it might be that htslib only access symbols that remained constant between openssl 1 and 3, but at minimum this should be tested first.

A related problem is that conda is set by default to aggressively update openssl for security reasons. Thus as the original post in this thread demonstrates, conda can find a better solution by being explicitly told to pin openssl to 1.1.1.

$ conda config --show aggressive_update_packages
aggressive_update_packages:
  - ca-certificates
  - certifi
  - openssl

For long-term compatibility between conda-forge and bioconda, I don't see any solution other than to start building bioconda recipes against a more recent version of conda-forge-pinning. In the short term, individual conda-forge recipes can be built for both openssl 1 and 3 by manually specifying this in recipe/conda_build_config.yaml. That way they can still be co-installed with bioconda binaries built against openssl 1.1.1.

jdblischak commented 1 year ago

By the way, for anyone currently struggling to install htslib, we have built htslib 1.16 against both openssl 1 and 3 and multiple versions of libdeflate. You can install it from the tiledb channel with mamba install -c tiledb htslib

https://anaconda.org/tiledb/htslib/files https://github.com/TileDB-Inc/htslib-feedstock https://github.com/TileDB-Inc/htslib-feedstock/blob/main/recipe/conda_build_config.yaml

dpryan79 commented 1 year ago

Correct, the repodata patching will only prevent old packages from incorrectly being used in the environment solution, there's nothing that can make existing packages compatible aside from recompiling them. We'll need a migration for that, which I guess we need to do sooner rather than later.

jmarshall commented 1 year ago

The openssl depedency is constrained to "openssl >=1.1.1q,<1.1.2a", because it was built against openssl 1.1.1. We could patch this dependency to the more relaxed "openssl >=1.1.1q,<4.0a0",. This would enable the conda solver to install recent versions of htslib in the same conda env as openssl 3 and the most recently updated conda-forge binaries. But this seems dangerous.

This would not work at all, because the soversions are different: openssl 1.1.x provides lib/libcrypto.so.1.1 while 3.0.x provides lib/libcrypto.so.3. So the parts of the htslib package that are linked against libcrypto will fail to execute at all, because the shared library they are looking for is not present.

(This category of problems occurs largely because conda does not track soversion dependencies.)

However because bioconda's htslib package has been built with plugins enabled, if you were to lie about the constraints as suggested then htslib and programs that use it would in fact likely work fine. Only S3 file access uses libcrypto (i.e., only _libexec/htslib/hfiles3.so is linked against libcrypto, and it is dynamically loaded by htslib), so only attempts to access S3 files would fail.

So that's perhaps not so bad. However the real answer is to migrate to building against openssl 3 instead of / (temporarily) alongside openssl 1.1, which is indeed now something that is overdue (as conda-forge made this migration over a month ago).

The even better answer for htslib may be to split the plugin executables off into a separate htslib-plugins subpackage that the main htslib package would not depend on. This would mean that htslib/samtools/bcftools no longer depended on openssl at all, reducing their maintenance burden and immediately sidestepping all the historical openssl mismatch user issues. However it would also mean that people wanting to access files over HTTPS, S3, GCS, etc directly would need to be taught that they also need to install htslib-plugins — which may make this approach a non-starter.

daler commented 1 year ago

I came to this thread late. I have some questions.

I didn't recognize that openssl=3 was an issue but did know that htslib=1.17 needed updating. So I used the recent htslib pinning change in builds on the bulk branch. The result is that we now have htslib=1.17 but it is using openssl=1.1.

There were 159 total packages that had htslib-induced updates from bioconda-utils update-pinning. Some of those are not building on bulk and need to be fixed.

mamba repoquery whoneeds htslib=1.17 -c conda-forge -c bioconda says there are currently ~75 recently-built packages that depend on htslib=1.17 which was built with openssl=1.1.

Note that with these new builds, the original reason @lkeegan opened this issue for appears to be solved.

Question 1: Are these htslib=1.17+openssl=1.1 packages a problem?

Question 2: What is the best course of action moving forward?

I think it would be the following:

  1. Fix the remaining packages failing in bulk that were recently bumped due to depending on htslib. Then at least we will have a complete "set" of packages that build against openssl=1.1 + htslib=1.17.
  2. (finally) use the updated conda-forge-pinning. This will be a lot of rebuilt packages. But over the next week at least I should have bandwidth to babysit the bulk builds. @jdblischak @jmarshall it would be great to get help fixing broken packages during this time (and anyone else you can gather...)
  3. Develop a plan for more frequent pinning updates. This has been under discussion for a while; fundamentally it is a resources question (human resources and compute resources). No good answer here.

Question 3: How to select which conda-forge-pinning to use? Naively I would just grab the latest available. But I know @mbargull had concerns regarding pinning the cxx compiler, and I don't know how to figure out if jumping over 6 months of pinning updates is otherwise problematic.

johanneskoester commented 1 year ago

@daler:

Answer 1: no, those should be fine. The problem are old packages that miss a proper version constraint on openssl. Answer 2: I agree. In addition we need repodata-patching for all old packages which have an unconstrained htslib dependency. I can give that a try. Answer 3: I would also use the latest.

jmarshall commented 1 year ago

There are two categories of problem:

Hence:

Answer 1: The existence of htslib=1.17+openssl=1.1 packages is not a problem. But the non-existence of htslib=1.17+openssl=3 packages is a problem, has been a problem since late January, and will become a bigger problem as time goes on.

johanneskoester commented 1 year ago

@jmarshall I completely agree. To me, the broken envs that are generated now are the bigger problem, that's why I want to solve it first. But yes, afterwards, we should definitely try to get to the update.

johanneskoester commented 1 year ago

I hope to soon implement a better solution for such pinning updates that moves away from the bulk branch, and uses ordinary pull requests instead. This way, we can hopefully democratize the update process, such that it is not completely dependent on the core team.

johanneskoester commented 1 year ago

So, here is the PR which fixes all old unpinned packages: https://github.com/bioconda/bioconda-recipes/pull/39812

lkeegan commented 1 year ago

@johanneskoester many thanks! The htslib package in the original issue here didn't have openssl listed in its dependencies at all, so I made an attempt to add these missing openssl dependencies here: #39816

johanneskoester commented 1 year ago

Thanks, looks good to me, but I would like to wait for a review by @jmarshall.

jmarshall commented 1 year ago

It is now six weeks later, and there is still no bioconda build of htslib-1.17 against openssl-3.

This would be fixed by updating the conda-forge-pinning that I pointed out in February in https://github.com/bioconda/bioconda-recipes/issues/39356#issuecomment-1430662243.

Worse still, this outdated pinning is now causing conda create -ntmp samtools to chose an outdated version of samtools from six years ago, which is a source of user confusion and soon of support burden. See bioconda/bioconda-utils#871.

(The first report from a confused user is here.)

johanneskoester commented 1 year ago

@jmarshall I hear you. Not happy with the situation as well.

  1. The problem you report points out that there are still some missing openssl deps in old samtools packages (and in principle would not be fixed by an update of the openssl pinning to 3, at most it would just be hidden). I hope to have fixed this with the following PR: https://github.com/bioconda/bioconda-recipes/pull/40629
  2. We should probably indeed start the update of the pinnings now. I'll check with the other core members.
jmarshall commented 1 year ago

Re your (1), the problem I report (that conda create -ntmp samtools has reverted to choosing an ancient samtools instead of the latest) has nothing to do with OpenSSL dependencies.

johanneskoester commented 1 year ago

Mhm, actually I was expecting that it would change something. But you are right, it doesn't. This is what my conda wants to do

The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge 
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu 
  bzip2              conda-forge/linux-64::bzip2-1.0.8-h7f98852_4 
  c-ares             conda-forge/linux-64::c-ares-1.18.1-h7f98852_0 
  ca-certificates    conda-forge/linux-64::ca-certificates-2022.12.7-ha878542_0 
  curl               conda-forge/linux-64::curl-7.88.1-hdc1c0ab_1 
  keyutils           conda-forge/linux-64::keyutils-1.6.1-h166bdaf_0 
  krb5               conda-forge/linux-64::krb5-1.20.1-h81ceb04_0 
  libcurl            conda-forge/linux-64::libcurl-7.88.1-hdc1c0ab_1 
  libedit            conda-forge/linux-64::libedit-3.1.20191231-he28a2e2_2 
  libev              conda-forge/linux-64::libev-4.33-h516909a_1 
  libgcc-ng          conda-forge/linux-64::libgcc-ng-12.2.0-h65d4601_19 
  libgomp            conda-forge/linux-64::libgomp-12.2.0-h65d4601_19 
  libnghttp2         conda-forge/linux-64::libnghttp2-1.52.0-h61bc06f_0 
  libssh2            conda-forge/linux-64::libssh2-1.10.0-hf14f497_3 
  libstdcxx-ng       conda-forge/linux-64::libstdcxx-ng-12.2.0-h46fd767_19 
  libzlib            conda-forge/linux-64::libzlib-1.2.13-h166bdaf_4 
  ncurses            conda-forge/linux-64::ncurses-6.3-h27087fc_1 
  openssl            conda-forge/linux-64::openssl-3.1.0-hd590300_2 
  samtools           bioconda/linux-64::samtools-1.6-h3f2fef4_8 
  xz                 conda-forge/linux-64::xz-5.2.6-h166bdaf_0 
  zlib               conda-forge/linux-64::zlib-1.2.13-h166bdaf_4 
  zstd               conda-forge/linux-64::zstd-1.5.2-h3eb15da_6

Which is weird, because I added "openssl >=1.1.0,<=1.1.1" as a dependency to samtools-1.6-h3f2fef4_8.

One should note that mamba works fine here (even before). It did not optimize for latest openssl but instead chose a newer samtools. So, while I agree that updating the pinning is a good idea, what we really have here is a bug in conda I would say.

jmarshall commented 1 year ago

what we really have here is a bug in conda I would say

Yes, it is a limitation of the _currentrepodata.json optimisation. I described how users can avoid it (https://github.com/bioconda/bioconda-recipes/issues/24199#issuecomment-758549310) and analysed the issue and what bioconda can do to mitigate it (https://github.com/bioconda/bioconda-recipes/issues/24199#issuecomment-758553486) two years ago, but @bioconda/core appears to have never considered the suggestions or even read the analysis.

Option (1) has got harder since bioconda started pinning these packages — we can't just rebuild htslib, staden_io_lib et al, but need to bump pinning and deploy infrastructure first. Hence it may be time to consider doing (2).

In the meantime, please update the pinning as described in bioconda/bioconda-utils#871. Today would be a good time to do it.

johanneskoester commented 1 year ago

Ok, thanks for pointing me to your old comments. Indeed, the repodata_hack sounds like a good solution for now. Ideally, I would like to generate that automatically based on the packages in bioconda. Basically, one has to scan through all the latest packages in the Bioconda channel, check on which conda-forge packages they depend, and insert those as artificial dependencies into the repodata_hack package in conda-forge. That sounds doable.

johanneskoester commented 1 year ago

At the same time, updating the pinnings is important as well (also to not overwhelm conda-forge's current_repodata with old packages, rendering it less efficient).

johanneskoester commented 1 year ago

In order to start updating the pinnings, I have to coordinate with the other core team members though, because (at least at the moment) this requires using the bulk branch (which is currently not in sync with master and I don't want to overwrite @daler's work there. Once that is sorted out, we can start. In the future, I would like to try to go back to ordinary PRs for such updates, but that requires some infrastructure changes first.

jmarshall commented 1 year ago

Another alternative for fixing the urgent “conda install samtools chooses ancient samtools” problem is to improve the artificially narrow libdeflate=1.13 dependencies that these packages have: in fact any libdeflate >=1.13 will work fine. PR #40675 proposes doing this by patching repodata to rewrite libdeflate dependencies.

It would still be important to update the pinning and rebuild — to get builds against openssl 3, as the OP requested — but applying that repodata patching would make the pinning update slightly less urgent.

johanneskoester commented 1 year ago

Thanks a lot. Merged your PR. Let me say that I highly appreciate all your help here!

johanneskoester commented 1 year ago

I am now in the process of cleanup up the bulk branch, so that we can hopefully update the pinning next week.

johanneskoester commented 1 year ago

@jmarshall I've now started the pinning update.