Open AlbertDeFusco opened 2 years ago
Thanks, Albert. This is definitely a problem for my team, and one that causes us many headaches, because we treat locking as a deeply sacred process. Once locked, we really want to be able to recreate precisely what we locked to, warts and all. E.g. if this was a scientific project that we archived, we want to reproduce the previous results, even if they were buggy! Or if it's a server deployment that got re-deployed without changes, we want precisely the same running process as before, again even if there were issues with it before. Or if it's a regularly scheduled command, we want it to run today just as it did yesterday, whether a package is considered "broken" or not; any other behavior is immediately a breakage, for something I didn't consider broken before! The locked environment is the only one we've actually tested and evaluated, and even if some updates and bugfixes later became available, we explicitly don't want them when building a locked environment. If we did, we would be making a fresh lock, not trying to build the existing one.
That perspective has some implications:
1. If build strings are allowed to float, then we should be careful to update only those that have to float. E.g. if a half dozen projects have newer builds available but only one of those was broken, only that one should float, not the others. If the locked package is truly unavailable, then we have to make do, but that applies only to a package that is truly unavailable, not to one that merely has a later build.
2. In the indicated case, I wouldn't even want to get the fresh build of libtiff; I would want the one on the broken or removed channel. Some maintainer's opinion that libtiff is broken is not of relevance to my locked environment; I'm the one who has built this environment, I'm the one who tested it, and if I am happy with it, why should it change in any way depending on the decision of some separate conda-forge or upstream maintainer? Those people should be free to make any decision they want about what's best for new solves, including labeling the package in any way, and it shouldn't affect my existing locked solve, the same as it doesn't affect anyone's already installed environment.
3. The current experience for the user is actually much worse than having to decide between letting the build string float or enabling a broken or removed channel. All the user sees is either that this project they downloaded or revisited no longer builds, or that some automated build in a CI process has failed. They don't necessarily know whether they did something wrong, whether the original project was broken, whether the internet went down, or any number of other possible explanations. Thus begins either a very bad day of head-scratching or clueless stumbling around wondering why things are not working. In no case I'm aware of would I wish to experience that pain and frustration rather than simply build the locked environment using the original package that got moved to broken. At a minimum, if building the locked environment is going to fail, it should do so gracefully, explaining precisely what package is missing, where it went, and how to relax the locking to build it. That's not at all what happens now.
4. The only case I can think of where I wouldn't want the original locked build to succeed is if PyPI had been hacked and complete malware had been snuck into a package. But in that case the package presumably won't simply be marked broken; it will be moved offline altogether? Anyway, even if this is a possibility, it's a very rare one; many of our "locked" environments have broken over the years, none for that reason.
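To make the first three points concrete, here is a minimal sketch of the decision logic, not a description of how conda or anaconda-project behave today. The function names (plan_locked_install, explain) and the index dictionary are hypothetical, and the index is hand-written stand-in data rather than anything fetched from a real channel. The idea is to look for the exact locked build on every known channel or label first, float the build string only for a package whose exact build is gone everywhere, and report what happened to each package.

```python
# Sketch only: `index` is hand-written stand-in data, not the output of any conda API.
def plan_locked_install(locked, index):
    """Classify each locked "name=version=build" spec.

    index maps a channel (or channel/label) name to the set of specs it carries;
    channels are tried in the order they appear in the dict.
    """
    plan = {}
    for spec in locked:
        name, version, _build = spec.split("=")
        # 1) Prefer the exact locked build, wherever it now lives.
        holder = next((ch for ch, specs in index.items() if spec in specs), None)
        if holder is not None:
            plan[spec] = ("exact", holder)
            continue
        # 2) Only when the exact build is gone everywhere, float this one package.
        prefix = f"{name}={version}="
        others = sorted(s for specs in index.values() for s in specs if s.startswith(prefix))
        # Picking the last sorted candidate is a placeholder, not real build ordering.
        plan[spec] = ("float", others[-1]) if others else ("missing", None)
    return plan


def explain(plan):
    """Turn the plan into the kind of message a failing build could print."""
    lines = []
    for spec, (status, detail) in plan.items():
        if status == "exact":
            lines.append(f"{spec}: found on {detail}")
        elif status == "float":
            lines.append(f"{spec}: locked build not found; same version available as {detail}")
        else:
            lines.append(f"{spec}: not found on any configured channel")
    return "\n".join(lines)


index = {
    "conda-forge": {"libtiff=4.3.0=h17f2ce3_3"},
    "conda-forge/label/broken": {"libtiff=4.3.0=hf544144_0"},
}
print(explain(plan_locked_install(["libtiff=4.3.0=hf544144_0"], index)))
```

With the label present in the index, the locked libtiff build is reported as found there and nothing floats; drop the label from the index and only libtiff floats, while every other locked package is left alone.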
Here we're only discussing packages moved or hidden by the repository maintainers, but locked packages can also fail to be accessible for other reasons, such as using different conda repositories in dev and prod environments, or security advisories causing a package to be yanked from a local mirror. The user experience is similar, in that things break for reasons that require a full-blown investigation, while the fact of things being moved or deleted should somehow be recorded and used to generate useful error messages. It is less clear how to handle these other cases, but progress on points 1-4 will probably help them too.
It's reasonable to point out that if the person archiving this environment used a Docker image or conda-pack to capture not just the named packages but the actual contents of the packages, then these issues won't arise. That's certainly true, and I do recommend that people do that for better reproducibility. But both a Docker image and conda-pack are many orders of magnitude larger and more complex than storing a lock file, which means that they are not always feasible and even when feasible are not version-controllable. Two lock files are easily diffable to see what might have changed, while two container images or conda-pack archives are not, and so lockfiles have important affordances that other approaches do not.
For these reasons, I strongly argue that conda and anaconda-project should do their best to find the indicated locked package if it is available in any way, and should automatically install that if so rather than failing or requiring an explicit user interaction. If people disagree and consider the current behavior of failing to be appropriate, then at least let's get more guidance in there for users, but I hope that's not the best we can do.
One observation I want to make is that what really matters for reproducibility is not where the package is to be found (e.g. defaults, conda-forge, conda-forge/label/broken) but what the exact contents of the package are. Conceptually, what you want to match against is the full set of content hashes (e.g. SHAs) in the lock: then it wouldn't matter that a package is moved to conda-forge/label/broken, as its contents (as reflected by the content hash) remain unchanged.
I'm not proposing that we actually do it this way; I am just describing the ideal behavior: a spec like libtiff==4.3.0=h1167814_0 is an attempt to capture an exact package, but it 1) fails to capture the exact contents of this package and 2) fails when this package 'moves' to a different label.
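As a rough illustration of that ideal behavior (and, again, not a proposal for how to implement it), the sketch below matches a locked package by content hash rather than by location. The helper names sha256_of and find_by_hash, the candidates dictionary, and the byte strings standing in for package archives are all made up for the example.

```python
import hashlib


def sha256_of(payload: bytes) -> str:
    """Hex SHA-256 digest of a package archive's bytes."""
    return hashlib.sha256(payload).hexdigest()


def find_by_hash(expected_sha256, candidates):
    """Return the first location whose archive matches the locked hash, else None.

    candidates maps a location (channel or label name) to the archive bytes
    found there; in this sketch the bytes are placeholders, not real packages.
    """
    for location, payload in candidates.items():
        if sha256_of(payload) == expected_sha256:
            return location
    return None


# Placeholder bytes standing in for the locked archive and a rebuilt one.
original = b"bytes of the originally locked libtiff archive"
rebuilt = b"bytes of a later rebuild with a different build string"

locked_hash = sha256_of(original)  # what the lock file would record
candidates = {
    "conda-forge": rebuilt,
    "conda-forge/label/broken": original,
}
print(find_by_hash(locked_hash, candidates))
```

Because the match is on contents, the label the archive currently sits under plays no part in the lookup; only a change to the bytes themselves would make the locked package unfindable.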
There are cases where a package "disappears" from a Conda repository causing a previously locked project to fail to install, usually with a message that a package could not be found.
This happens with conda-forge packages when a broken package needs to be replaced. In some cases the package and version are still available but the build string has changed causing the lock file to break.
It would be great if anaconda-project update could accept a flag to re-lock using package versions but allow the build string to float.
Here's an example
And the lockfile contains
libtiff=4.3.0=hf544144_0
Preparing the environment causes the following error
This build has been declared broken on conda-forge, but osx-64/libtiff-4.3.0-h17f2ce3_3.tar.bz2 is available, with the same version but a different build string.
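The "same version, different build string" check can be sketched as follows. This assumes the usual name-version-build layout of conda package filenames; the function names parse_package_filename and same_version_different_build are hypothetical and are not part of conda or anaconda-project.

```python
def parse_package_filename(path):
    """Split 'subdir/name-version-build.ext' into (subdir, name, version, build)."""
    subdir, _, filename = path.rpartition("/")
    for ext in (".tar.bz2", ".conda"):
        if filename.endswith(ext):
            filename = filename[: -len(ext)]
            break
    # Package names may contain hyphens, so split from the right.
    name, version, build = filename.rsplit("-", 2)
    return subdir, name, version, build


def same_version_different_build(lock_spec, path):
    """True if path holds the locked name and version under another build string."""
    lock_name, lock_version, lock_build = lock_spec.split("=")
    _, name, version, build = parse_package_filename(path)
    return (name, version) == (lock_name, lock_version) and build != lock_build


print(parse_package_filename("osx-64/libtiff-4.3.0-h17f2ce3_3.tar.bz2"))
print(same_version_different_build(
    "libtiff=4.3.0=hf544144_0",
    "osx-64/libtiff-4.3.0-h17f2ce3_3.tar.bz2",
))
```

A check like this is what a float-the-build-string flag would need in order to recognize h17f2ce3_3 as an acceptable stand-in for the locked hf544144_0 build.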