Open lkeegan opened 1 year ago
At present, bioconda_utils pins conda-forge-pinning=2022.08.25.15.20.42
.
Conda-forge moved to openssl 3 in conda-forge/conda-forge-pinning-feedstock#3892, merged on January 25th. So to move to rebuilding OpenSSL-using bioconda packages similarly, we will need to update bioconda_utils to pin a version of conda-forge-pinning past 2023.01.25.21.47.14 or so, and probably build updates on the bulk branch.
Thanks for the explanation, that makes sense.
In the meantime I guess the simplest fix for the broken installation issue would just be to require pysam >0.9.1 in pbiotools to avoid this particular combination (with this additional requirement I tried a bunch of installs with openssl3 and didn't find any more similar issues)
Is there anything non-core maintainers can assist with migrating to openssl 3? This openssl discrepancy is also causing headaches for recipes I co-maintain that depend on htslib
I think the main source of the problems are old packages that have incomplete pinnings of their dependencies (in particular openssl dependencies). The proper way to fix them is to patch their metadata. This works via so-called repodata patching. The good thing: you can help with that.
Since our internal hackathon this tuesday, the process is now nicely described here: http://bioconda.github.io/developer/repodata_patching.html Hence, you just need to follow that guideline and introduce a small piece of fixing code for each old package that contains an unconstrained openssl dependency.
Since our internal hackathon this tuesday, the process is now nicely described here: http://bioconda.github.io/developer/repodata_patching.html
This is amazing documentation! I understand the repodata patching process much better now
Hence, you just need to follow that guideline and introduce a small piece of fixing code for each old package that contains an unconstrained openssl dependency.
However, I must admit that I am having trouble seeing how this will fix the openssl problem. bioconda recipes that depend on openssl have only been built against openssl 1. conda-forge has migrated to only building for openssl 3. Thus newer builds of conda-forge binaries are built against openssl 3 (eg Python 3.10.9 from https://github.com/samtools/htslib/issues/1578). Thus if you try to install Python 3.10.9 from conda-forge and htslib from bioconda on a linux machine, it will install a super old htslib that didn't yet depend on openssl. The workaround is to install the most recent Python available from conda-forge that is still built against openssl 1. Here's a demo:
# installs super-old htslib that doesn't depend on openssl
mamba create --dry-run -n test-openssl \
--override-channels -c conda-forge -c bioconda \
htslib python=3.10.9
## + htslib 1.10 h244ad75_0 bioconda/linux-64 2MB
## + openssl 3.0.8 h0b41bf4_0 conda-forge/linux-64 Cached
## + python 3.10.9 he550d4f_0_cpython conda-forge/linux-64 Cached
# workaround: install slightly older Python built against openssl 1
mamba create --dry-run -n test-openssl \
--override-channels -c conda-forge -c bioconda \
htslib python=3.10.8
## + htslib 1.16 h6bc39ce_0 bioconda/linux-64 2MB
## + openssl 1.1.1t h0b41bf4_0 conda-forge/linux-64 2MB
## + python 3.10.8 h257c98d_0_cpython conda-forge/linux-64 23MB
The repodata for the most recent version of htslib for linux-64 is below:
"htslib-1.16-h6bc39ce_0.tar.bz2": {
"build": "h6bc39ce_0",
"build_number": 0,
"depends": [
"bzip2 >=1.0.8,<2.0a0",
"libcurl >=7.83.1,<8.0a0",
"libdeflate >=1.13,<1.14.0a0",
"libgcc-ng >=12",
"libzlib >=1.2.12,<1.3.0a0",
"openssl >=1.1.1q,<1.1.2a",
"xz >=5.2.6,<5.3.0a0",
"zlib >=1.2.12,<1.3.0a0"
],
"license": "MIT",
"md5": "503d5675802731748f348164d978c6de",
"name": "htslib",
"sha256": "a2ac96fa30ff9153eb42deb46d10c32022d314eef94a5f64e65ba9928708f53a",
"size": 2411073,
"subdir": "linux-64",
"timestamp": 1663255789705,
"version": "1.16"
},
The openssl depedency is constrained to "openssl >=1.1.1q,<1.1.2a",
because it was built against openssl 1.1.1. We could patch this dependency to the more relaxed "openssl >=1.1.1q,<4.0a0",
. This would enable the conda solver to install recent versions of htslib in the same conda env as openssl 3 and the most recently updated conda-forge binaries. But this seems dangerous. I don't know the details of the differences between openssl 1 and 3, and it might be that htslib only access symbols that remained constant between openssl 1 and 3, but at minimum this should be tested first.
A related problem is that conda is set by default to aggressively update openssl for security reasons. Thus as the original post in this thread demonstrates, conda can find a better solution by being explicitly told to pin openssl to 1.1.1.
$ conda config --show aggressive_update_packages
aggressive_update_packages:
- ca-certificates
- certifi
- openssl
For long-term compatibility between conda-forge and bioconda, I don't see any solution other than to start building bioconda recipes against a more recent version of conda-forge-pinning. In the short term, individual conda-forge recipes can be built for both openssl 1 and 3 by manually specifying this in recipe/conda_build_config.yaml
. That way they can still be co-installed with bioconda binaries built against openssl 1.1.1.
By the way, for anyone currently struggling to install htslib, we have built htslib 1.16 against both openssl 1 and 3 and multiple versions of libdeflate. You can install it from the tiledb channel with mamba install -c tiledb htslib
https://anaconda.org/tiledb/htslib/files https://github.com/TileDB-Inc/htslib-feedstock https://github.com/TileDB-Inc/htslib-feedstock/blob/main/recipe/conda_build_config.yaml
Correct, the repodata patching will only prevent old packages from incorrectly being used in the environment solution, there's nothing that can make existing packages compatible aside from recompiling them. We'll need a migration for that, which I guess we need to do sooner rather than later.
The openssl depedency is constrained to "openssl >=1.1.1q,<1.1.2a", because it was built against openssl 1.1.1. We could patch this dependency to the more relaxed "openssl >=1.1.1q,<4.0a0",. This would enable the conda solver to install recent versions of htslib in the same conda env as openssl 3 and the most recently updated conda-forge binaries. But this seems dangerous.
This would not work at all, because the soversions are different: openssl 1.1.x provides lib/libcrypto.so.1.1 while 3.0.x provides lib/libcrypto.so.3. So the parts of the htslib package that are linked against libcrypto will fail to execute at all, because the shared library they are looking for is not present.
(This category of problems occurs largely because conda does not track soversion dependencies.)
However because bioconda's htslib package has been built with plugins enabled, if you were to lie about the constraints as suggested then htslib and programs that use it would in fact likely work fine. Only S3 file access uses libcrypto (i.e., only _libexec/htslib/hfiles3.so is linked against libcrypto, and it is dynamically loaded by htslib), so only attempts to access S3 files would fail.
So that's perhaps not so bad. However the real answer is to migrate to building against openssl 3 instead of / (temporarily) alongside openssl 1.1, which is indeed now something that is overdue (as conda-forge made this migration over a month ago).
The even better answer for htslib may be to split the plugin executables off into a separate htslib-plugins
subpackage that the main htslib
package would not depend on. This would mean that htslib/samtools/bcftools no longer depended on openssl at all, reducing their maintenance burden and immediately sidestepping all the historical openssl mismatch user issues. However it would also mean that people wanting to access files over HTTPS, S3, GCS, etc directly would need to be taught that they also need to install htslib-plugins
— which may make this approach a non-starter.
I came to this thread late. I have some questions.
I didn't recognize that openssl=3 was an issue but did know that htslib=1.17 needed updating. So I used the recent htslib pinning change in builds on the bulk
branch. The result is that we now have htslib=1.17 but it is using openssl=1.1.
There were 159 total packages that had htslib-induced updates from bioconda-utils update-pinning
. Some of those are not building on bulk
and need to be fixed.
mamba repoquery whoneeds htslib=1.17 -c conda-forge -c bioconda
says there are currently ~75 recently-built packages that depend on htslib=1.17 which was built with openssl=1.1
.
Note that with these new builds, the original reason @lkeegan opened this issue for appears to be solved.
Question 1: Are these htslib=1.17+openssl=1.1 packages a problem?
Question 2: What is the best course of action moving forward?
I think it would be the following:
bulk
that were recently bumped due to depending on htslib.
Then at least we will have a complete "set" of packages that build against openssl=1.1 + htslib=1.17. Question 3: How to select which conda-forge-pinning
to use? Naively I would just grab the latest available. But I know @mbargull had concerns regarding pinning the cxx compiler, and I don't know how to figure out if jumping over 6 months of pinning updates is otherwise problematic.
@daler:
Answer 1: no, those should be fine. The problem are old packages that miss a proper version constraint on openssl. Answer 2: I agree. In addition we need repodata-patching for all old packages which have an unconstrained htslib dependency. I can give that a try. Answer 3: I would also use the latest.
There are two categories of problem:
Hence:
Answer 1: The existence of htslib=1.17+openssl=1.1 packages is not a problem. But the non-existence of htslib=1.17+openssl=3 packages is a problem, has been a problem since late January, and will become a bigger problem as time goes on.
@jmarshall I completely agree. To me, the broken envs that are generated now are the bigger problem, that's why I want to solve it first. But yes, afterwards, we should definitely try to get to the update.
I hope to soon implement a better solution for such pinning updates that moves away from the bulk branch, and uses ordinary pull requests instead. This way, we can hopefully democratize the update process, such that it is not completely dependent on the core team.
So, here is the PR which fixes all old unpinned packages: https://github.com/bioconda/bioconda-recipes/pull/39812
@johanneskoester many thanks! The htslib package in the original issue here didn't have openssl listed in its dependencies at all, so I made an attempt to add these missing openssl dependencies here: #39816
Thanks, looks good to me, but I would like to wait for a review by @jmarshall.
It is now six weeks later, and there is still no bioconda build of htslib-1.17 against openssl-3.
This would be fixed by updating the conda-forge-pinning that I pointed out in February in https://github.com/bioconda/bioconda-recipes/issues/39356#issuecomment-1430662243.
Worse still, this outdated pinning is now causing conda create -ntmp samtools
to chose an outdated version of samtools from six years ago, which is a source of user confusion and soon of support burden. See bioconda/bioconda-utils#871.
(The first report from a confused user is here.)
@jmarshall I hear you. Not happy with the situation as well.
Re your (1), the problem I report (that conda create -ntmp samtools
has reverted to choosing an ancient samtools instead of the latest) has nothing to do with OpenSSL dependencies.
Mhm, actually I was expecting that it would change something. But you are right, it doesn't. This is what my conda wants to do
The following NEW packages will be INSTALLED:
_libgcc_mutex conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
_openmp_mutex conda-forge/linux-64::_openmp_mutex-4.5-2_gnu
bzip2 conda-forge/linux-64::bzip2-1.0.8-h7f98852_4
c-ares conda-forge/linux-64::c-ares-1.18.1-h7f98852_0
ca-certificates conda-forge/linux-64::ca-certificates-2022.12.7-ha878542_0
curl conda-forge/linux-64::curl-7.88.1-hdc1c0ab_1
keyutils conda-forge/linux-64::keyutils-1.6.1-h166bdaf_0
krb5 conda-forge/linux-64::krb5-1.20.1-h81ceb04_0
libcurl conda-forge/linux-64::libcurl-7.88.1-hdc1c0ab_1
libedit conda-forge/linux-64::libedit-3.1.20191231-he28a2e2_2
libev conda-forge/linux-64::libev-4.33-h516909a_1
libgcc-ng conda-forge/linux-64::libgcc-ng-12.2.0-h65d4601_19
libgomp conda-forge/linux-64::libgomp-12.2.0-h65d4601_19
libnghttp2 conda-forge/linux-64::libnghttp2-1.52.0-h61bc06f_0
libssh2 conda-forge/linux-64::libssh2-1.10.0-hf14f497_3
libstdcxx-ng conda-forge/linux-64::libstdcxx-ng-12.2.0-h46fd767_19
libzlib conda-forge/linux-64::libzlib-1.2.13-h166bdaf_4
ncurses conda-forge/linux-64::ncurses-6.3-h27087fc_1
openssl conda-forge/linux-64::openssl-3.1.0-hd590300_2
samtools bioconda/linux-64::samtools-1.6-h3f2fef4_8
xz conda-forge/linux-64::xz-5.2.6-h166bdaf_0
zlib conda-forge/linux-64::zlib-1.2.13-h166bdaf_4
zstd conda-forge/linux-64::zstd-1.5.2-h3eb15da_6
Which is weird, because I added "openssl >=1.1.0,<=1.1.1"
as a dependency to samtools-1.6-h3f2fef4_8
.
One should note that mamba works fine here (even before). It did not optimize for latest openssl but instead chose a newer samtools. So, while I agree that updating the pinning is a good idea, what we really have here is a bug in conda I would say.
what we really have here is a bug in conda I would say
Yes, it is a limitation of the _currentrepodata.json optimisation. I described how users can avoid it (https://github.com/bioconda/bioconda-recipes/issues/24199#issuecomment-758549310) and analysed the issue and what bioconda can do to mitigate it (https://github.com/bioconda/bioconda-recipes/issues/24199#issuecomment-758553486) two years ago, but @bioconda/core appears to have never considered the suggestions or even read the analysis.
Option (1) has got harder since bioconda started pinning these packages — we can't just rebuild htslib, staden_io_lib et al, but need to bump pinning and deploy infrastructure first. Hence it may be time to consider doing (2).
In the meantime, please update the pinning as described in bioconda/bioconda-utils#871. Today would be a good time to do it.
Ok, thanks for pointing me to your old comments. Indeed, the repodata_hack sounds like a good solution for now. Ideally, I would like to generate that automatically based on the packages in bioconda. Basically, one has to scan through all the latest packages in the Bioconda channel, check on which conda-forge packages they depend, and insert those as artificial dependencies into the repodata_hack package in conda-forge. That sounds doable.
At the same time, updating the pinnings is important as well (also to not overwhelm conda-forge's current_repodata with old packages, rendering it less efficient).
In order to start updating the pinnings, I have to coordinate with the other core team members though, because (at least at the moment) this requires using the bulk branch (which is currently not in sync with master and I don't want to overwrite @daler's work there. Once that is sorted out, we can start. In the future, I would like to try to go back to ordinary PRs for such updates, but that requires some infrastructure changes first.
Another alternative for fixing the urgent “conda install samtools
chooses ancient samtools” problem is to improve the artificially narrow libdeflate=1.13 dependencies that these packages have: in fact any libdeflate >=1.13 will work fine. PR #40675 proposes doing this by patching repodata to rewrite libdeflate dependencies.
It would still be important to update the pinning and rebuild — to get builds against openssl 3, as the OP requested — but applying that repodata patching would make the pinning update slightly less urgent.
Thanks a lot. Merged your PR. Let me say that I highly appreciate all your help here!
I am now in the process of cleanup up the bulk branch, so that we can hopefully update the pinning next week.
@jmarshall I've now started the pinning update.
I think this is causing some strange issues for me with dependency resolution:
If I do
mamba install "python=3.9" pbiotools
, the packages to be installed includeApparently this old pysam package is considered compatible with openssl 3 when dependencies are resolved (I think because it depends on
htslib 1.3.1 1
which doesn't include openssl as a dependency), but this installation is broken:If I also pin openssl,
mamba install "python=3.9" pbiotools "openssl=1.1.1"
, then I get a working install with a much more recent version of pysam:Would it make sense to temporarily add
openssl ==1.1.1
as a dependency to the pbiotools package as a workaround for this issue?And are there any plans to follow conda-forge and migrate to using openssl 3 on bioconda?
Thanks!