bioconda / bioconda-recipes

Conda recipes for the bioconda channel.
https://bioconda.github.io
MIT License

Newest version of package not installed unless specified #24199

Open mikecormier opened 4 years ago

mikecormier commented 4 years ago

When installing a package such as ggd, an older version is installed unless the newest version is requested explicitly. For example, conda install -c bioconda ggd installs version 0.1.2, which is several releases behind the latest version. This problem has been seen with multiple packages from bioconda.

Additionally, when a package is updated or removed, other packages are downgraded to older versions even though the package being updated or removed has no dependency relationship with them. For example, updating or removing vep downgrades ggd to version 0.1.2, even though vep does not depend on ggd or on any of ggd's dependencies.

This seems to be related to the version-priority problem seen when installing a specific package like ggd. How do we set version priority for a package so that it doesn't revert to an older version unless told to do so?

dpellow commented 4 years ago

@mikecormier - have you found a solution to this problem?

jmarshall commented 4 years ago

This is a significant general problem impacting Conda's usefulness, also reported as conda/conda#9905 and having received some discussion there.

There are many other similar reports here, e.g., #24621, #24320, #24264, #24182, #24033, and #22824.

jmarshall commented 3 years ago

@bgruening @dpryan79 @bioconda/core: This bug is causing confusion for users every day — most recently as reported in samtools/htslib#1192 — and has been ongoing for months. Whether this is properly a conda problem (cf conda/conda#9905) or something that could in principle be fixed with updates to the bioconda repo metadata files, please pin this issue and add an explanation of the workarounds (check the version you're offered and explicitly specify the latest version if necessary; mamba apparently) and ideally an overview of the roadmap towards fixing this (if there is one).

dpryan79 commented 3 years ago

@jmarshall As you're aware, we have no control over this issue. I'm happy to pin this, but be aware that it will not help since regular users almost never read such things. As you indicated, the only solutions are (1) to specify the exact version you want or (2) to use mamba, which lacks this annoying quirk.

jmarshall commented 3 years ago

Conda create/install starts by downloading a trimmed-down current_repodata.json metadata file from each channel, and by default only downloads the full repodata.json files if it needs to. This issue occurs when the newest package version cannot be satisfied via current_repodata.json but an older version can, so conda installs the older version instead of retrying with the full repodata.json. Hence it can be worked around in any of the following ways:

  1. Ask for the particular newest version (to prevent the older version from succeeding): conda create/install package==version

  2. Skip the current_repodata.json attempt entirely: CONDA_REPODATA_FNS=repodata.json conda create/install package

  3. Use mamba instead of conda
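
For scripted installs, the first two workarounds can be sketched as a small Python helper. This is only an illustration: the function name and its parameters are hypothetical, and it merely assembles the command line and environment described above without calling conda itself.

```python
import os


def conda_install_invocation(package, version=None, skip_current_repodata=False):
    """Build (argv, env) for `conda install`. Hypothetical helper; runs nothing."""
    # Workaround 1: pin the exact version so an older build cannot satisfy the request.
    spec = f"{package}=={version}" if version else package
    argv = ["conda", "install", spec]

    env = dict(os.environ)
    if skip_current_repodata:
        # Workaround 2: make conda consult only the full repodata.json.
        env["CONDA_REPODATA_FNS"] = "repodata.json"
    return argv, env
```

The returned pair can be handed to subprocess.run(argv, env=env, check=True); the version string to pin is whatever the latest release happens to be at the time.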

jmarshall commented 3 years ago

But we're here to fix this problem, not just provide awkward workarounds that users would need to use :smile:

As described in https://github.com/conda/conda/issues/9905#issuecomment-622281137 and its followups, the problem here is that these bioconda recipes depend on superseded conda-forge package versions that are no longer in conda-forge's current_repodata.json cache. When there are older versions of these bioconda recipes that have reduced requirements (or otherwise differ) and hence do not depend on packages omitted from conda-forge's current_repodata.json, those older versions are selected for installation (leading to user disappointment). Installation does not fall back to retrieving conda-forge's complete repodata.json and selecting the desired up-to-date bioconda packages along with their now-visible conda-forge dependencies.

For concreteness, consider htslib and libdeflate. The htslib-1.11 package was first built on 2020-09-23 against the then-current libdeflate-1.6. At that time, libdeflate-1.6 was the latest version and was in conda-forge's current_repodata.json, and conda create -n tmp htslib delivered htslib-1.11 as expected. However, on 2020-11-12 conda-forge released libdeflate-1.7 and libdeflate-1.6 fell out of their current_repodata.json. From then onwards, conda create -n tmp htslib has installed an older htslib that happens to depend on an older libdeflate from the bioconda channel.
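
The htslib/libdeflate scenario can be reproduced with a toy model. This is not conda's solver: it is a minimal sketch that assumes each build needs one exact dependency version, and the older htslib entry (1.9 needing libdeflate 1.0) is a hypothetical stand-in, since the thread does not say which older versions were actually selected.

```python
def version_key(version):
    # Simplified ordering for purely numeric dotted versions such as "1.11".
    return tuple(int(part) for part in version.split("."))


def pick_newest(builds, visible):
    """Newest version whose dependencies are all present in `visible`, else None."""
    for version in sorted(builds, key=version_key, reverse=True):
        if all(dep in visible for dep in builds[version]):
            return version
    return None


def two_stage_solve(builds, current, full):
    """Try the trimmed index first; consult the full index only if nothing fits."""
    choice = pick_newest(builds, current)
    if choice is not None:
        return choice  # an older build succeeded, so the full index is never read
    return pick_newest(builds, full)


# htslib builds and the exact libdeflate each one needs (older entry is hypothetical).
htslib_builds = {
    "1.11": [("libdeflate", "1.6")],
    "1.9": [("libdeflate", "1.0")],
}

# State after 2020-11-12: libdeflate 1.6 is listed only in the full index.
current_index = {("libdeflate", "1.7"), ("libdeflate", "1.0")}
full_index = current_index | {("libdeflate", "1.6")}

print(two_stage_solve(htslib_builds, current_index, full_index))  # prints 1.9
print(pick_newest(htslib_builds, full_index))                     # prints 1.11
```

The first call mirrors the default behaviour, where the trimmed index yields a working but outdated answer; the second mirrors what happens when only the full index is consulted.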

(If the dependent package was also in conda-forge, then the construction of current_repodata.json would ensure that the particular packages it requires were also listed in current_repodata.json. But this does not automatically work across separate channels.)

In lieu of improving the current_repodata.json cache mechanism, bioconda can avoid this problem by ensuring (via manually-enforced policy) that its current builds of these packages depend only on conda-forge packages that actually are listed in conda-forge's current_repodata.json. There are several possible ways to ensure this (again, for concreteness consider htslib and libdeflate):

  1. Whenever conda-forge release a new version of libdeflate, rebuild the current htslib against that version. This is what PR #26085 does, and may or may not be the right thing to do depending on bioconda's pinning policy for libdeflate.

    There are currently 16 bioconda packages requiring libdeflate and I think the current latest builds of all of them depend on libdeflate >=1.6,<1.7.0a0. In particular, wiggletools 1.2.7 and 1.2.8 have been built in the last week and depend on libdeflate 1.6 despite 1.7 being the current conda-forge version at the time they were built. Conversely PR #26085 appears to have happily built against 1.7. So I'm not sure why this PR seems to have got libdeflate-1.7 successfully, and perhaps the final build would get 1.6 similarly to the wiggletools ones… [Edited to add: Wiggletools build-depends on htslib, hence while building it the pinned htslib build is pulled in which at present pulls in libdeflate-1.6. Hence why building wiggletools gets 1.6 but building htslib itself gets the current conda-forge libdeflate, 1.7.]

  2. Ask conda-forge to add a package whose purpose is to list package versions that bioconda wants to have available in conda-forge's current_repodata.json, e.g. libdeflate-1.6. This would be similar to the existing _current_repodata_hack which (I assume) is not intended to be installed by anyone but merely exists to ensure other packages are listed in current_repodata.json. A draft of such a package is at jmarshall/staged-recipes/…/_current_repodata_bioconda_hack.

  3. [2023 addition] Now that repodata patching is reasonably practical, it provides another solution to this problem. As long as new libdeflate releases remain compatible, existing packages can be patched to allow the use of newer libdeflates. See https://github.com/bioconda/bioconda-recipes/issues/24199#issuecomment-1527422484 below for details.
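
Whichever option is chosen, the policy itself (every dependency of a current bioconda build must be satisfiable from conda-forge's current_repodata.json) could be checked mechanically. The sketch below is an assumption-laden illustration, not an existing bioconda tool: it takes an already-parsed current_repodata.json dict, and its version comparison is a deliberately simplified stand-in for conda's real version and match-spec logic, supporting only the >=, <=, ==, > and < operators.

```python
import re


def parse_version(text):
    """Simplified version key: dotted integers; a letter suffix (e.g. '0a0')
    marks a pre-release that sorts just below the plain version."""
    numbers, prerelease = [], 0
    for piece in text.split("."):
        match = re.fullmatch(r"(\d+)([a-z]\w*)?", piece)
        if match is None:
            raise ValueError(f"unsupported version: {text!r}")
        numbers.append(int(match.group(1)))
        if match.group(2):
            prerelease = -1
    numbers += [0] * (4 - len(numbers))  # pad so that 1.7 compares equal to 1.7.0
    return tuple(numbers) + (prerelease,)


OPERATORS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def satisfies(version, constraints):
    """True if `version` meets every comma-separated constraint."""
    for constraint in constraints.split(","):
        match = re.fullmatch(r"(>=|<=|==|>|<)(.+)", constraint.strip())
        if match is None:
            raise ValueError(f"unsupported constraint: {constraint!r}")
        op, bound = match.groups()
        if not OPERATORS[op](parse_version(version), parse_version(bound)):
            return False
    return True


def unsatisfied_in_current(depends, current_repodata):
    """Return the dependency specs with no matching record in the trimmed index."""
    records = list(current_repodata.get("packages", {}).values())
    records += list(current_repodata.get("packages.conda", {}).values())
    missing = []
    for spec in depends:
        name, _, constraints = spec.partition(" ")
        constraints = constraints.split(" ")[0]  # ignore any build-string field
        candidates = [r["version"] for r in records if r["name"] == name]
        if not any(not constraints or satisfies(v, constraints) for v in candidates):
            missing.append(spec)
    return missing


# Made-up excerpt of conda-forge's trimmed index after libdeflate 1.7 was released.
current = {"packages": {"libdeflate-1.7-0.tar.bz2": {"name": "libdeflate", "version": "1.7"}}}
print(unsatisfied_in_current(["libdeflate >=1.6,<1.7.0a0"], current))
```

A non-empty result flags a build that would be invisible to a solve using only the trimmed index, which is the situation options 1 to 3 are meant to prevent.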

As noted in https://github.com/bioconda/bioconda-recipes/issues/17212#issuecomment-755285842, which of these two approaches is the appropriate one comes down to @bioconda/core's policy around libdeflate pinning (and in general pinning of other conda-forge dependencies). Is libdeflate currently pinned to 1.6? When would that be updated to 1.7?

[Edited to add: Libdeflate was pinned in the distant past, but this was removed in bioconda/bioconda-utils#610. That PR and commit don't explain why the pinning was removed — but I suspect it's because, of 16 bioconda packages that list libdeflate as a dependency, only a couple (htslib, staden_io_lib, possibly one or two others) actually ought to depend directly on libdeflate. So the answer is libdeflate is not pinned; and it's not pinned because it really doesn't need to be.]

I have a slight preference for (2) as it is more flexible and less timing-critical at the moments when packages such as libdeflate are updated. What are @bioconda/core's thoughts?

jkbonfield commented 3 years ago

Maybe I'm missing something obvious, but looking at the Makefile it seems libdeflate 1.6 and 1.7 both have a library .so version of 0. Hence the package ABI hasn't changed and there should be no reason at all for a package to claim it depends precisely on 1.6. That's just a recipe for disaster.

Can conda not pin on library .so versions instead of package release numbers? If not, then maybe just start with the assumption that the .so version doesn't change, and pin if, and only if, an ABI-breaking change is discovered later on.

jmarshall commented 3 years ago

@jkbonfield: Yes, conda is way too conservative here, but not having soversion-based dependency tracking infrastructure is a separate problem. (Adding such infrastructure would greatly reduce the times during which this issue appears, but it's a lot of work and a major change that would require major buy-in from the conda mothership — while this proposal is a simple policy that bioconda could implement today.)

jmarshall commented 3 years ago

For the htslib/libdeflate case: it turns out that libdeflate is not pinned at all in bioconda. Hence the correct approach to fix it (at least, until libdeflate-1.8 comes out) really is simply to bump htslib so that there exists an htslib package built against conda-forge's current libdeflate-1.7. PR #26237 does that and has been merged, so conda create -n myenv htslib/samtools/bcftools now all do the right thing using just current_repodata.json. :tada:

(It remains to work through the dozen other overlinked bioconda packages that spuriously depend on both htslib and libdeflate. By bumping them to remove their explicit libdeflate requirement — in reality, they only depend on it via htslib — they will no longer be stuck on libdeflate-1.6.)
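
Candidates for that clean-up could be listed from bioconda's repodata. The following is a minimal sketch assuming an already-parsed repodata.json dict; the function name is hypothetical, and the sample records are made up for illustration. It only reports packages that list both dependencies, so deciding whether each one genuinely needs libdeflate directly still requires inspecting the package.

```python
def lists_both(repodata, via="htslib", lib="libdeflate"):
    """Names of packages whose records list both `via` and `lib` in `depends`."""
    records = list(repodata.get("packages", {}).values())
    records += list(repodata.get("packages.conda", {}).values())
    found = set()
    for record in records:
        # The dependency name is the first whitespace-separated field of each spec.
        dep_names = {dep.split(" ")[0] for dep in record.get("depends", [])}
        if via in dep_names and lib in dep_names:
            found.add(record["name"])
    return sorted(found)


# Made-up records for illustration only.
sample = {
    "packages": {
        "htslib-1.11-0.tar.bz2": {
            "name": "htslib",
            "depends": ["libdeflate >=1.6,<1.7.0a0"],
        },
        "wiggletools-1.2.8-0.tar.bz2": {
            "name": "wiggletools",
            "depends": ["htslib >=1.11,<1.12.0a0", "libdeflate >=1.6,<1.7.0a0"],
        },
    }
}
print(lists_both(sample))  # prints ['wiggletools']
```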

When in the future conda-forge releases libdeflate-1.8 and libdeflate-1.7 falls out of their current_repodata.json, conda create -n myenv htslib etc will start choosing old versions again, until we bump htslib correspondingly. There are bioconda people involved in maintaining libdeflate-feedstock so this shouldn't be a huge imposition, but nonetheless there will always be a period of failure between libdeflate updates and htslib package bumps, unless we take steps to ensure the older libdeflate remains available in the union of channels' current_repodata.json files, e.g. by either of

  1. Realising that bioconda may have shot itself in the foot by migrating libdeflate :smile: and re-adding an equivalent libdeflate-1.7 (currently) package to the bioconda channel. I'm sure having packages available in multiple channels is not ideal, but does it cause problems?

  2. [Same as (2) above] Ask conda-forge to add a package whose purpose is to list package versions that bioconda wants to have available in conda-forge's current_repodata.json, e.g. libdeflate-1.6 and/or libdeflate-1.7. This would be similar to the existing _current_repodata_hack which (I assume) is not intended to be installed by anyone but merely exists to ensure other packages are listed in current_repodata.json. A draft of such a package is at jmarshall/staged-recipes/…/_current_repodata_bioconda_hack.


For the other packages for which older versions are still being installed, it remains to identify the particular critical conda-forge package that is causing the problem and bump the affected package (if there are no pinning considerations) and/or resolve the problem appropriately. I might investigate for blast.

jmarshall commented 1 year ago

These days repodata patching provides another alternative for dealing with this problem, in cases like libdeflate where there is in fact forward compatibility and the library's soversion has not changed.

PR #40675 applies this to htslib, staden_io_lib, and fastp, which are the main bioconda packages directly using libdeflate.
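
The idea behind such a patch can be sketched as a transformation over parsed repodata. This does not reproduce PR #40675 or bioconda's actual patching tooling: the function name, the made-up record, and the relaxed upper bound (<2) are all assumptions for illustration, and the returned mapping only loosely resembles a per-file patch.

```python
def loosen_dependency(repodata, package_names, dep_name, new_spec):
    """Return {filename: {"depends": [...]}} for records whose pin on `dep_name` changes."""
    patches = {}
    for section in ("packages", "packages.conda"):
        for filename, record in repodata.get(section, {}).items():
            if record["name"] not in package_names:
                continue
            depends = record.get("depends", [])
            # Swap only the spec for `dep_name`; leave every other dependency untouched.
            patched = [
                f"{dep_name} {new_spec}" if dep.split(" ")[0] == dep_name else dep
                for dep in depends
            ]
            if patched != depends:
                patches[filename] = {"depends": patched}
    return patches


# Made-up record; the relaxed bound is hypothetical.
repodata = {
    "packages": {
        "htslib-1.11-0.tar.bz2": {
            "name": "htslib",
            "depends": ["libdeflate >=1.6,<1.7.0a0", "zlib"],
        }
    }
}
print(loosen_dependency(repodata, {"htslib"}, "libdeflate", ">=1.6,<2"))
```

Such a relaxation is only safe while newer libdeflate releases stay compatible, which is the condition stated above.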