anaconda / anaconda-project

Tool for encapsulating, running, and reproducing data science projects
https://anaconda-project.readthedocs.io/en/latest/

[ENH] bootstrap re-locking #375

Open AlbertDeFusco opened 2 years ago

AlbertDeFusco commented 2 years ago

There are cases where a package "disappears" from a conda repository, causing a previously locked project to fail to install, usually with a message that a package could not be found.

This happens with conda-forge packages when a broken package needs to be replaced. In some cases the package and version are still available, but the build string has changed, which breaks the lock file.

It would be great if anaconda-project update could accept a flag to re-lock using package versions but allow the build string to float.
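A minimal sketch of what "letting the build string float" could mean for a single lock entry, assuming the `name=version=build` spec form used in the lock file. The helper name `float_build` is hypothetical and is not part of anaconda-project or conda.

```python
import re

# Matches 'name=version' or 'name=version=build'; also tolerates the
# 'name==version=build' form that conda prints in error messages.
_SPEC = re.compile(
    r"^(?P<name>[^=]+)={1,2}(?P<version>[^=]+)(?:=(?P<build>[^=]+))?$"
)


def float_build(spec):
    """Return the spec with its build string dropped (hypothetical helper)."""
    match = _SPEC.match(spec.strip())
    if match is None:
        # Unversioned or unrecognized spec: leave it untouched.
        return spec
    return f"{match['name']}={match['version']}"


locked = ["libtiff=4.3.0=hf544144_0", "zstd=1.5.2=ha95c52a_0"]
relaxed = [float_build(spec) for spec in locked]
print(relaxed)
```

Re-solving with the relaxed specs would keep every version pinned while letting the solver pick whichever build is currently published.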

Here's an example

## anaconda-project.yml

name: libtiff

packages:
  - libtiff=4.3.0

channels:
  - conda-forge

And the lockfile contains libtiff=4.3.0=hf544144_0

## anaconda-project-lock.yml

# This is an Anaconda project lock file.
# The lock file locks down exact versions of all your dependencies.
#
# In most cases, this file is automatically maintained by the `anaconda-project` command or GUI tools.
# It's best to keep this file in revision control (such as git or svn).
# The file is in YAML format, please see http://www.yaml.org/start.html for more.
#

#
# Set to false to ignore locked versions.
#
locking_enabled: true

#
# A key goes in here for each env spec.
#
env_specs:
  default:
    locked: true
    env_spec_hash: 80ac30d7d76c632911967c7eb7409ecfd80cbe79
    platforms:
    - linux-64
    - osx-64
    - win-64
    packages:
      linux-64:
      - _libgcc_mutex=0.1=conda_forge
      - _openmp_mutex=4.5=1_gnu
      - jbig=2.1=h7f98852_2003
      - jpeg=9e=h7f98852_0
      - lerc=3.0=h9c3ff4c_0
      - libdeflate=1.10=h7f98852_0
      - libgcc-ng=11.2.0=h1d223b6_12
      - libgomp=11.2.0=h1d223b6_12
      - libstdcxx-ng=11.2.0=he4da1e4_12
      - libtiff=4.3.0=hf544144_0
      - libwebp-base=1.2.2=h7f98852_1
      - libzlib=1.2.11=h36c2ea0_1013
      - lz4-c=1.9.3=h9c3ff4c_1
      - xz=5.2.5=h516909a_1
      - zstd=1.5.2=ha95c52a_0
      osx-64:
      - jbig=2.1=h0d85af4_2003
      - jpeg=9e=h0d85af4_0
      - lerc=3.0=he49afe7_0
      - libcxx=12.0.1=habf9029_1
      - libdeflate=1.10=h0d85af4_0
      - libtiff=4.3.0=h17f2ce3_3
      - libwebp-base=1.2.2=h0d85af4_1
      - libzlib=1.2.11=h9173be1_1013
      - lz4-c=1.9.3=he49afe7_1
      - xz=5.2.5=haf1e3a3_1
      - zstd=1.5.2=h582d3a0_0
      win-64:
      - jbig=2.1=h8d14728_2003
      - jpeg=9e=h8ffe710_0
      - lerc=3.0=h0e60522_0
      - libdeflate=1.10=h8ffe710_0
      - libtiff=4.3.0=hc4061b1_3
      - libzlib=1.2.11=h8ffe710_1013
      - lz4-c=1.9.3=h8ffe710_1
      - ucrt=10.0.20348.0=h57928b3_0
      - vc=14.2=hb210afc_6
      - vs2015_runtime=14.29.30037=h902a5da_6
      - xz=5.2.5=h62dcd97_1
      - zstd=1.5.2=h6255e5f_0

Preparing the environment causes the following error

PackagesNotFoundError: The following packages are not available from current channels:

  - libtiff==4.3.0=h1167814_0

Current channels:

  - https://conda.anaconda.org/conda-forge/osx-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

This build has been declared broken on conda-forge, but osx-64/libtiff-4.3.0-h17f2ce3_3.tar.bz2 is available, with the same version but a different build string.

jbednar commented 2 years ago

Thanks, Albert. This is definitely a problem for my team, and one that causes us many headaches, because we treat locking as a deeply sacred process. Once locked, we really want to be able to recreate precisely what we locked to, warts and all. E.g. if this was a scientific project that we archived, we want to reproduce the previous results, even if they were buggy! Or if it's a server deployment that got re-deployed without changes, we want precisely the same running process as before, again even if there were issues with it before. Or if it's a regularly scheduled command, we want it to run today just as it did yesterday, whether a package is considered "broken" or not; any other behavior is immediately a breakage, for something I didn't consider broken before! The locked environment is the only one we've actually tested and evaluated, and even if some updates and bugfixes later became available, we explicitly don't want them when building a locked environment. If we did, we would be making a fresh lock, not trying to build the existing one.

That perspective has some implications:

  1. If build strings are allowed to float, then we should be careful only to update those that have to float. E.g. if a half dozen projects have newer builds available but only one of those was broken, only that one should float, not the others. If the locked package is truly unavailable, then we have to make do, but that only applies to a package truly unavailable, not one with a later build.

  2. In the indicated case, I wouldn't even want to get the fresh build of libtiff, I would want the one on the broken or removed channel. Some maintainer's opinion that libtiff is broken is not of relevance to my locked environment; I'm the one who has built this environment, I'm the one who tested it, and if I am happy with it, why should it change in any way depending on the decision of some separate conda-forge or upstream maintainer? Those people should be free to make any decision they want about what's best for new solves, including labeling the package in any way, and it shouldn't affect my existing locked solve, the same as it doesn't affect anyone's already installed environment.

  3. The current experience for the user is actually much worse than having to decide between letting the build string float or enabling a broken or removed channel. All the user sees is either that this project they downloaded or revisited no longer builds, or that some automated build in a CI process has failed. They don't necessarily know whether they did something wrong, whether the original project was broken, whether the internet went down, or any number of other possible explanations. Thus begins either a very bad day of head-scratching or clueless stumbling around wondering why things are not working. In no case I'm aware of would I wish to experience that pain and frustration rather than simply build the locked environment using the original package that got moved to broken. At a minimum, if building the locked environment is going to fail, it should do so gracefully, explaining precisely what package is missing, where it went, and how to relax the locking to build it. That's not at all what happens now.

  4. The only case I can think of where I wouldn't want the original locked build to succeed is if PyPI had been hacked and outright malware had been snuck into a package. But in that case the package presumably won't simply be marked broken, it will be moved offline altogether? Anyway, even if this is a possibility, it's a very rare one; many of our "locked" environments have broken over the years, none for that reason.

  5. Here we're only discussing packages moved or hidden by the repository maintainers, but locked packages can also fail to be accessible for other reasons, such as accessing via a different conda repository in dev and prod environments, or when security advisories cause a package to be yanked from a local mirror. The user experience is similar, in that things break for reasons that require a full-blown investigation, while the fact of things being moved or deleted should somehow be recorded and used to generate useful error messages. It is less clear how to handle these other cases, but progress on 1-4 will probably help with them too.

  6. It's reasonable to point out that if the person archiving this environment used a Docker image or conda-pack to capture not just the named packages but the actual contents of the packages, then these issues won't arise. That's certainly true, and I do recommend that people do that for better reproducibility. But both a Docker image and conda-pack are many orders of magnitude larger and more complex than storing a lock file, which means that they are not always feasible and even when feasible are not version-controllable. Two lock files are easily diffable to see what might have changed, while two container images or conda-pack archives are not, and so lockfiles have important affordances that other approaches do not.
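Point 1 above could be sketched roughly as follows: keep every locked entry that is still published, and float only the entries that have truly gone missing. This assumes the set of currently available specs is already known; here `available` is a hand-written stand-in for a real channel query, `relock_minimal` is a hypothetical name, and the newer `jpeg` build is invented purely for illustration.

```python
def relock_minimal(locked, available):
    """Float the build string only for locked specs that are unavailable.

    Returns (new_specs, floated), where `floated` lists the original specs
    that had to be relaxed. Hypothetical helper, not an anaconda-project API.
    """
    new_specs, floated = [], []
    for spec in locked:
        if spec in available:
            # Still published: keep the exact locked build, even if a
            # newer build of the same version exists.
            new_specs.append(spec)
        else:
            name, version, _build = spec.split("=")
            new_specs.append(f"{name}={version}")
            floated.append(spec)
    return new_specs, floated


locked = [
    "jpeg=9e=h0d85af4_0",
    "libtiff=4.3.0=h1167814_0",
    "xz=5.2.5=haf1e3a3_1",
]
available = {
    "jpeg=9e=h0d85af4_0",
    "jpeg=9e=h0d85af4_1",        # invented newer build, for illustration only
    "libtiff=4.3.0=h17f2ce3_3",  # replacement build from the example above
    "xz=5.2.5=haf1e3a3_1",
}

new_specs, floated = relock_minimal(locked, available)
print(new_specs)
print(floated)
```

Only `libtiff` is relaxed here; `jpeg` stays on its locked build even though a later build is listed.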

For these reasons, I strongly argue that conda and anaconda-project should do their best to find the indicated locked package if it is available in any way, and should automatically install that if so rather than failing or requiring an explicit user interaction. If people disagree and consider the current behavior of failing to be appropriate, then at least let's get more guidance in there for users, but I hope that's not the best we can do.
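As a rough illustration of that behavior, the sketch below looks for the exact locked spec across an ordered list of channels and labels, and falls back to an explanatory message if it is found nowhere. Everything in it is an assumption made for the example: `channel_index` is an in-memory stand-in for channel metadata, its contents are illustrative rather than a record of what conda-forge actually serves, and `locate` and `explain_missing` are hypothetical names.

```python
def locate(spec, channel_index, search_order):
    """Return the first channel in search_order that serves spec, else None."""
    for channel in search_order:
        if spec in channel_index.get(channel, set()):
            return channel
    return None


def explain_missing(spec, search_order):
    """Build an error message that says what is missing and what can be done."""
    name, version, build = spec.split("=")
    searched = ", ".join(search_order)
    return (
        f"Locked package {spec} was not found in: {searched}. "
        f"To build anyway, relax the lock to {name}={version} (any build) "
        f"or add a channel that still serves build {build}."
    )


# Illustrative stand-in for channel metadata (assumed contents).
channel_index = {
    "conda-forge": {"libtiff=4.3.0=h17f2ce3_3"},
    "conda-forge/label/broken": {"libtiff=4.3.0=h1167814_0"},
}
search_order = ["conda-forge", "conda-forge/label/broken"]

spec = "libtiff=4.3.0=h1167814_0"
found_in = locate(spec, channel_index, search_order)
print(found_in or explain_missing(spec, search_order))
```

Searching the primary channel first keeps normal installs unchanged, and the extra label is consulted only when the exact locked build cannot be found otherwise.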

jlstevens commented 2 years ago

One observation I want to make is that what really matters for reproducibility is not where the package is to be found (e.g. defaults, conda-forge, conda-forge/label/broken) but what the exact contents of the package are. Conceptually, what you want to match against is the full set of content hashes (e.g. SHAs) in the lock: then it wouldn't matter that a package is moved to conda-forge/label/broken, as its contents (as reflected by the content hash) remain unchanged.

I'm not proposing that we actually do it this way; I am just describing the ideal behavior. A spec like libtiff==4.3.0=h1167814_0 is an attempt to capture an exact package, but it 1) fails to capture the exact contents of the package and 2) fails when the package 'moves' to a different label.
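To make the content-hash idea concrete, here is a small sketch using SHA-256 from Python's standard library. It assumes the lock would store one hash per package archive, which is not how the lock file shown above works today; the byte strings are stand-ins for real archives, and the function names are hypothetical.

```python
import hashlib


def sha256_hex(payload: bytes) -> str:
    """Hex-encoded SHA-256 digest of a package archive's bytes."""
    return hashlib.sha256(payload).hexdigest()


def satisfies_lock(payload: bytes, locked_sha256: str) -> bool:
    """True if the archive contents match the hash recorded in the lock."""
    return sha256_hex(payload) == locked_sha256


# Stand-in bytes; a real check would read the downloaded archive from disk.
original = b"stand-in bytes for the locked libtiff archive"
locked_sha256 = sha256_hex(original)

# The same archive served from a different label has identical contents...
from_broken_label = bytes(original)
# ...whereas a rebuilt package with the same version does not.
rebuilt = b"stand-in bytes for a rebuilt libtiff archive"

print(satisfies_lock(from_broken_label, locked_sha256))
print(satisfies_lock(rebuilt, locked_sha256))
```

Under this scheme the channel or label a file comes from is irrelevant; only the bytes decide whether the lock is satisfied.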