Closed scottyhq closed 1 year ago
Not removing obsolete dependencies is indeed problematic and probably not desirable behavior. Even upgrading an existing dependency can leave behind former transitive dependencies that may eventually become incompatible with the rest of the current (and updated) dependencies.
Just checking to see if this issue has been resolved in `conda-lock>=2`? Or do we still need the `rm conda-lock.yml && conda-lock lock --mamba -f environment.yml` workaround?
As far as I know this is unfortunately not yet resolved.
Today I stumbled on a nasty consequence of this behavior. In my case, I had a dependency in the `pip` subsection of my `environment.yml`, and that dependency recently became available on conda-forge. So naturally, I moved it out of the `pip` section and into the main `dependencies` list and ran `conda-lock`.
The result was that the same dependency was listed twice in `conda-lock.yml`, once for `conda` and once for `pip`. Of course, the `pip` version overwrote the `conda` version in the last stage of the install. This resulted in a broken environment with incompatible dependencies, even though our freshly-locked `environment.yml` defined a compatible environment spec.
I created a repository that walks through a simplified version of what I experienced, without the dependency conflicts: https://github.com/mfisher87/sscce-conda-lock-pip-ghosts
The real-world scenario I encountered was that we upgraded an old `environment.yml` file containing `sphinx` and `autodoc-pydantic` dependencies, and moved `autodoc-pydantic` out of the `pip` section at the same time. `autodoc-pydantic==1.5.1`, incompatible with recent versions of `sphinx`, was silently overwriting the "correct" version, even though the environment file our lock file was based on contained only the spec `autodoc-pydantic==1.9.0`. We fixed it by deleting `conda-lock.yml` and re-locking, but until I figured out what was going on I was very confused :laughing:
Given that this behavior can lead to environments that don't reproduce as expected, could this issue be pinned to help confused folks like myself find it?
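To illustrate, the merged lockfile ended up with entries roughly like the following. This is a hypothetical, heavily abbreviated `conda-lock.yml` sketch; real entries carry additional fields such as `url`, `hash`, and `dependencies`:

```yaml
package:
  - name: autodoc-pydantic
    version: 1.9.0        # fresh entry from the new conda dependency
    manager: conda
    platform: linux-64
  - name: autodoc-pydantic
    version: 1.5.1        # stale entry left over from the old pip section
    manager: pip
    platform: linux-64
```

Since `pip` packages are installed after the `conda` ones, the stale `1.5.1` entry wins.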
Thinking about digging into this as an exercise this weekend. From an outside perspective, without having any design context, I'd expect conda-lock to replace the previous lockfile wholesale every time a solve is completed successfully. Are there reasons not to do that, or would there be major challenges in implementing such behavior?
Thanks all for being welcoming to questions! :heart:
We took a stab at this. I feel like multiple sources (which I don't fully understand the use case for, and haven't found docs for yet) may be a confounding factor for our solution. Our naive understanding is that we need to retain support for per-platform updates, so we can't simply blow away the config file and drop in a new one. Draft PR coming soon :)
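To make our naive mental model concrete, here's a toy sketch of the behavior we'd expect: keep platforms that weren't re-solved, and replace each re-solved platform wholesale instead of merging package lists. (The `relock` helper and the lockfile shape are simplified assumptions for illustration, not conda-lock's actual internals.)

```python
def relock(old_lock: dict, new_solves: dict) -> dict:
    """Keep platforms that weren't re-solved; replace each freshly
    solved platform wholesale instead of merging package lists."""
    merged = dict(old_lock)    # start from the previous lockfile
    merged.update(new_solves)  # overwrite each freshly solved platform
    return merged


# Example: re-locking only linux-64 leaves osx-64 untouched,
# and the stale linux-64 entries are dropped rather than merged.
old = {
    "linux-64": [{"name": "pystac", "version": "1.0.0"}],
    "osx-64": [{"name": "numpy", "version": "1.24.0"}],
}
new = {"linux-64": [{"name": "numpy", "version": "1.26.0"}]}
result = relock(old, new)
```

This also shows why per-platform updates mean we can't simply blow away the whole file: the `osx-64` entries must survive a `linux-64`-only relock.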
Sorry for my lack of responsiveness this past weekend.
FWIW I don't understand or like the current behavior, and consequently I always delete the `conda-lock.yml` before relocking, which I find awkward.
I'd like to see things done differently, and would be supportive of doing a major release to change this behavior.
@mariusvniekerk, are there any good reasons for keeping the current behavior which I should know about? Also @bollwyvl?
I hadn't noticed these issues, mostly as I don't use the `.yml` format, and even then I still delete my `.lock` (or, according to `constructor`, now `.txt`) files before re-solving. So I can't really comment on how the existing strategy works, but just some thoughts:
In some other lockfile-based systems I've used, once a named entry appears in the lock, it will never get updated unless something new comes along that can be met by the existing entry, and all unused entries will be removed at the end. If the `.yml` format allows for partial locking, I could see how this could get very complicated. This is part of the reason I just use the `@EXPLICIT` files, as when I reach for `conda-lock` I almost always have multiple environments to loop/matrix over something anyway.
In conda(-forge)-land, generally when I relock, I want the freshest packages that I haven't pinned, and with mutable upstream state like `repodata-patches`, keeping old, unexamined, transient entries might be actively harmful if a key dependency has changed (see `pydantic<2`, `urllib<3` all over in the last few months).
> Sorry for my lack of responsiveness this past weekend.
Never a problem in my mind! Hope you got away from work for a bit :)
> FWIW I don't understand or like the current behavior, and consequently I always delete the `conda-lock.yml` before relocking, which I find awkward.
>
> I'd like to see things done differently, and would be supportive of doing a major release to change this behavior.
Yes, once I discovered this behavior, I realized I should be doing the same... but I don't want to :laughing: Maybe a `--merge`/`--no-merge` flag or something could be useful to ensure that absolutely none of the existing lock content is considered. And maybe it should default to `--no-merge` if we're looking at a major release. But that may require some more refactoring to enable `--update` behavior to continue to work as expected.
> once a named entry appears in the lock, it will never get updated unless something new comes along that ~~can~~ can not be met by the existing entry

I'm not sure I understand this as written; should "can" actually be "can not"? I.e., once an entry, e.g. `foo=1.2.3`, shows up in the lock file, it will never get updated until another entry is added that is incompatible with it, e.g. `bar=2.3.4`, which requires `foo>1`?
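If I've understood those semantics, a toy sketch might look like this (hypothetical helper; versions are modeled as tuples and constraints reduced to a single minimum bound, which is a big simplification of real solver behavior):

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]


def lock_version(existing: Optional[Version],
                 minimum: Version,
                 latest: Version) -> Version:
    """Keep the previously locked version unless the combined specs
    can no longer be met by it; only then resolve something newer."""
    if existing is not None and existing >= minimum:
        return existing  # entry is never updated...
    return latest        # ...until some new spec makes it invalid


# foo=1.2.3 stays locked until e.g. bar=2.3.4 arrives requiring foo>1:
assert lock_version((1, 2, 3), (0,), (2, 0, 0)) == (1, 2, 3)
assert lock_version((1, 2, 3), (2,), (2, 0, 0)) == (2, 0, 0)
```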
I'll add a little of my experience and expectations here. First, I agree with the growing sentiment that extraneous packages could be removed from the lock, and I've also gotten into the habit of `rm conda-lock.yml && conda-lock lock`. To me this sounds much like `conda env update --prune`, which was recently fixed for the classic solver.
There is one wrinkle that might be worth considering that a user brought to my attention: relock an env spec with minimal amount of changes. I wonder if there might be a way to have both a minimal update along with pruning of orphaned and un-requested packages.
> `--prune`
I've seen some lockfile tools that offer a `--strategy=latest` argument... it seems like having a countable number of these wrapping existing conda/mamba arguments could be very useful.

A super useful example, which may or may not be supported by the solvers, would be `--strategy=oldest`, for building the oldest compatible package set meeting the given specs. This would allow using a single (set of) `environment.yml` file(s) to generate a (set of) lockfile(s) that accurately reflect both the latest and oldest, both of which take into account things like `repodata-patches`. I recently had to do something similar to bisect an upstream for a repodata patch, but ended up falling back to the upstream package manager (`pip`) for the nitty-gritty.
> both a minimal update along with pruning of orphaned

As for combining multiple strategy solves in a single `.yml`: again, I have no skin in the game, but it seems extra-super complicated.
@maresb can we mark this as resolved now that #485 is merged? Or should we wait for the next release?
cc @weiji14 :wave: :)
If you could write a brief release note, let's just do a release now.
Sure, will do my best! This is a tough one to boil down :)
> Resolved an issue causing dependencies to persist in the lock file even if they were removed from the specification file (see #196).
Or a more low-level explanation; please cut up or edit however you like:

> conda-lock no longer merges a pre-existing lockfile with freshly-generated locked dependencies. This was causing dependencies to persist in the lock file even if they were removed from the specification file (see #196). conda-lock will now always replace the old locked dependencies for a given platform with freshly-generated locked dependencies. In cases where the user requests a lock for a subset of platforms, those platforms not requested for lock will be persisted.
That last sentence is I think a good candidate for removal or revision.
@mfisher87, would it be accurate to say:

> In most cases, the new behavior of locking with `conda-lock` is equivalent to `rm conda-lock.yml && conda-lock`. (The exception is when locking a platform is skipped due to an unchanged content hash.)
There's another exception: locking a subset of platforms. For example, if you've previously locked with an `environment.yml` that specifies 3 platforms, then you re-lock with `-p linux-64`, the other two platforms will be persisted from the original lockfile, and `linux-64` will be overwritten completely with the new lock results.
> In most cases, the new behavior of locking with `conda-lock` is equivalent to `rm conda-lock.yml && conda-lock`. (The exception is when locking a platform is skipped due to explicit request or an unchanged content hash.)
Okay, new release is out. I went for a minor version since I think the only real "breakage" here would be more aggressive updating.
For the release notes, I wanted to avoid complication, and avoid discussion of the exceptional cases. (In case I wrote something wrong/misleading I can still edit it.)
Thanks everyone for all the great feedback and discussion!
There are some unaddressed points about implementation of a minimal update strategy. Let's open a fresh issue for that.
Would you mind updating the release notes to credit co-author @ArieKnoester for the PR? Thanks, Ben!
@mfisher87 oops, they're autogenerated and I missed that. Thanks, fixed!
I'm surprised that removing a dependency does not remove it and its sub-dependencies from an existing `conda-lock.yml` (conda-lock 1.0.5).

Steps to reproduce: run `conda-lock lock --mamba -f environment.yml` with the following `environment.yml`:
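(The original file contents didn't survive this copy; a minimal hypothetical `environment.yml` along these lines would reproduce it:)

```yaml
name: repro
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pystac
```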
Then comment out or delete the `- pystac` dependency and re-run `conda-lock lock --mamba -f environment.yml`: `pystac` remains in `conda-lock.yml`.

A workaround is to wipe `conda-lock.yml` and start fresh, but it's nice to keep it around for version control.