conda / conda-lock

Lightweight lockfile for conda environments
https://conda.github.io/conda-lock/

Removing dependencies from an existing lock file #196

Closed · scottyhq closed this 1 year ago

scottyhq commented 2 years ago

I'm surprised that removing a dependency does not remove it and its sub-dependencies from an existing conda-lock.yml (conda-lock 1.0.5):

Steps to reproduce:

  1. Run conda-lock lock --mamba -f environment.yml with the following environment.yml:

    channels:
    - conda-forge
    dependencies:
    - python=3.9
    - pystac
    platforms:
    - linux-64
  2. Comment out or delete the - pystac entry and re-run conda-lock lock --mamba -f environment.yml; pystac nevertheless remains in conda-lock.yml:

    name: pystac
    url: https://conda.anaconda.org/conda-forge/noarch/pystac-1.4.0-pyhd8ed1ab_0.tar.bz2

A workaround is to wipe conda-lock.yml and start fresh, but it would be nice to keep it around for version control.
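In shell form (assuming the default conda-lock.yml filename), the workaround is:

    # regenerate the lockfile from scratch so removed packages are pruned
    rm conda-lock.yml
    conda-lock lock --mamba -f environment.yml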

riccardoporreca commented 2 years ago

Not removing obsolete dependencies is indeed problematic and probably not desirable behavior. Even an upgrade of an existing dependency can leave behind former transitive dependencies, which may eventually become incompatible with the rest of the current (updated) dependencies.

weiji14 commented 1 year ago

Just checking to see if this issue has been resolved in conda-lock>=2? Or do we still need to do the rm conda-lock.yml && conda-lock lock --mamba -f environment.yml workaround?

maresb commented 1 year ago

As far as I know, this is unfortunately not yet resolved.

mfisher87 commented 1 year ago

Today I stumbled on a nasty consequence of this behavior. In my case, I had a dependency in the pip subsection of my environment.yml, and that dependency recently became available on conda-forge. So naturally, I moved it out of the pip section and into the main dependencies list and ran conda-lock.

The result was that the same dependency was listed twice in conda-lock.yml, once for conda and once for pip. Of course, the pip version overwrote the conda version in the last stage of the install. This resulted in a broken environment with incompatible dependencies, even though our freshly-locked environment.yml defined a compatible environment spec.

I created a repository that walks through a simplified version of what I experienced, without the dependency conflicts: https://github.com/mfisher87/sscce-conda-lock-pip-ghosts

The real-world scenario was that we upgraded an old environment.yml containing sphinx and autodoc-pydantic dependencies, moving autodoc-pydantic out of the pip section at the same time. autodoc-pydantic==1.5.1, which is incompatible with recent versions of sphinx, was silently overwriting the "correct" version, even though the environment file our lock file was based on contained only the spec autodoc-pydantic==1.9.0. We fixed it by deleting conda-lock.yml and re-locking, but until I figured out what was going on I was very confused :laughing:
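To make the failure mode concrete, here is an illustrative (not verbatim) excerpt of a conda-lock.yml in this state, with fields abridged:

    # illustrative excerpt: the same package locked under two managers;
    # pip installs after conda, so the stale pip entry silently wins
    - name: autodoc-pydantic
      manager: conda
      platform: linux-64
      version: 1.9.0
    - name: autodoc-pydantic
      manager: pip
      platform: linux-64
      version: 1.5.1  # stale entry persisted from the old pip section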

mfisher87 commented 1 year ago

Given this behavior can lead to envs that don't reproduce as expected, could this issue be pinned to help confused folks like myself find it?

mfisher87 commented 1 year ago

Thinking about digging into this as an exercise this weekend. From an outside perspective, without any design context, I'd expect conda-lock to replace the previous lockfile wholesale every time a solve completes successfully. Are there reasons not to do that, or would there be major challenges in implementing such behavior?

Thanks all for being welcoming to questions! :heart:

mfisher87 commented 1 year ago

We took a stab at this. I feel like multiple sources (which I don't fully understand the use case for, and haven't found docs for yet) may be a confounding factor for our solution. Our naive understanding is that we need to retain support for per-platform updates, so we can't simply blow away the lock file and drop in a new one. Draft PR coming soon :)

maresb commented 1 year ago

Sorry for my lack of responsiveness this past weekend.

FWIW I don't understand or like the current behavior, and consequently I always delete the conda-lock.yml before relocking, which I find awkward.

I'd like to see things done differently, and would be supportive of doing a major release to change this behavior.

@mariusvniekerk, are there any good reasons for keeping the current behavior which I should know about? Also @bollwyvl?

bollwyvl commented 1 year ago

I hadn't noticed these issues, mostly because I don't use the .yml format, and even then I still delete my .lock (or, following constructor's convention these days, .txt) files before re-solving. So I can't really comment on how the existing strategy works, but here are some thoughts:

In some other lockfile-based systems I've used, once a named entry appears in the lock, it will never get updated unless something new comes along that can be met by the existing entry, and all unused entries are removed at the end. If the .yml format allows for partial locking, I can see how this could get very complicated. This is part of the reason I just use the @EXPLICIT files: when I reach for conda-lock, I almost always have multiple environments to loop/matrix over anyway.

In conda(-forge)-land, when I relock I generally want the freshest packages that I haven't pinned, and with mutable upstream state like repodata-patches, keeping old, unexamined, transient entries might be actively harmful if a key dependency has changed (see pydantic<2 and urllib3<2 all over the place in the last few months).

mfisher87 commented 1 year ago

> Sorry for my lack of responsiveness this past weekend.

Never a problem in my mind! Hope you got away from work for a bit :)

> FWIW I don't understand or like the current behavior, and consequently I always delete the conda-lock.yml before relocking, which I find awkward.
>
> I'd like to see things done differently, and would be supportive of doing a major release to change this behavior.

Yes, once I discovered this behavior, I realized I should be doing the same... but I don't want to :laughing: Maybe a --merge/--no-merge flag or something could be useful to ensure that absolutely none of the existing lock content is considered. And maybe it should default to --no-merge if we're looking at a major release. But that may require some more refactoring to keep --update behavior working as expected.
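In purely hypothetical syntax (neither flag exists today), the idea would be something like:

    # hypothetical flags, sketched for discussion only
    conda-lock lock --no-merge -f environment.yml  # discard all existing lock content
    conda-lock lock --merge -f environment.yml     # current behavior: merge into the existing lock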

> once a named entry appears in the lock, it will never get updated unless something new comes along that ~can~ can not be met by the existing entry

I'm not sure I understand this as written; should "can" actually be "can not"? I.e., once an entry such as foo=1.2.3 shows up in the lock file, it will never get updated until another entry is added that is incompatible with it, e.g. bar=2.3.4, which requires foo>1?

AlbertDeFusco commented 1 year ago

I'll add a little of my experience and expectations here. First, I agree with the growing sentiment that extraneous packages should be removed from the lock, and I've also gotten into the habit of rm conda-lock.yml && conda-lock lock. To me this sounds much like conda env update --prune, which was recently fixed for the classic solver.
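That is, the conda-level analogue of the desired pruning is roughly:

    # remove installed packages that are no longer in the spec
    conda env update --file environment.yml --prune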

There is one wrinkle that might be worth considering, which a user brought to my attention: relocking an env spec with a minimal amount of changes. I wonder if there might be a way to have both a minimal update and pruning of orphaned and un-requested packages.

bollwyvl commented 1 year ago

> --prune

I've seen some lockfile tools that offer a --strategy=latest argument... it seems like having a small number of these wrapping existing conda/mamba arguments could be very useful.

A super useful example, which may or may not be supported by the solvers, would be --strategy=oldest, for building the oldest compatible package set meeting the given specs. This would allow using a single (set of) environment.yml file(s) to generate a (set of) lockfile(s) accurately reflecting both the latest and the oldest versions, both of which take into account things like repodata-patches. I recently had to do something similar to bisect an upstream for a repodata patch, but ended up falling back to the upstream package manager (pip) for the nitty-gritty.
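Hypothetically, that could look like:

    # hypothetical strategy flags, sketched for discussion only
    conda-lock lock --strategy=latest -f environment.yml  # freshest packages meeting the specs
    conda-lock lock --strategy=oldest -f environment.yml  # oldest packages meeting the specs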

> both a minimal update along with pruning of orphaned

As for combining multiple strategy solves in a single .yml: again, I have no skin in the game, but it seems extra-super complicated.

mfisher87 commented 1 year ago

@maresb can we mark this as resolved now that #485 is merged? Or should we wait for the next release?

cc @weiji14 :wave: :)

maresb commented 1 year ago

If you could write a brief release note, let's just do a release now.

mfisher87 commented 1 year ago

Sure, will do my best! This is a tough one to boil down :)

> Resolved an issue causing dependencies to persist in the lock file even if they were removed from the specification file (see #196).

Or a more low-level explanation, please cut up or edit however you like:

> conda-lock no longer merges a pre-existing lockfile with freshly-generated locked dependencies. This was causing dependencies to persist in the lock file even if they were removed from the specification file (see #196). conda-lock will now always replace the old locked dependencies for a given platform with freshly-generated locked dependencies. In cases where the user requests a lock for a subset of platforms, those platforms not requested for lock will be persisted.

That last sentence is, I think, a good candidate for removal or revision.

maresb commented 1 year ago

@mfisher87, would it be accurate to say:

> In most cases, the new behavior of locking with conda-lock is equivalent to rm conda-lock.yml && conda-lock. (The exception is when locking a platform is skipped due to an unchanged content hash.)

mfisher87 commented 1 year ago

There's another exception: locking a subset of platforms. For example, if you've previously locked with an environment.yml that specifies 3 platforms and then re-lock with -p linux-64, the other two platforms will be persisted from the original lockfile, and linux-64 will be completely overwritten with the new lock results.
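Concretely (assuming a spec that lists linux-64, osx-64, and win-64 under platforms:):

    # only linux-64 is re-solved; osx-64 and win-64 entries persist from the old lock
    conda-lock lock --mamba -f environment.yml -p linux-64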

Revised:

> In most cases, the new behavior of locking with conda-lock is equivalent to rm conda-lock.yml && conda-lock. (The exception is when locking a platform is skipped due to explicit request or an unchanged content hash.)

maresb commented 1 year ago

Okay, new release is out. I went for a minor version since I think the only real "breakage" here would be more aggressive updating.

For the release notes, I wanted to avoid complication, and avoid discussion of the exceptional cases. (In case I wrote something wrong/misleading I can still edit it.)

Thanks everyone for all the great feedback and discussion!

There are some unaddressed points about implementation of a minimal update strategy. Let's open a fresh issue for that.

mfisher87 commented 1 year ago

Would you mind updating the release notes to credit co-author @ArieKnoester for the PR? Thanks, Ben!

maresb commented 1 year ago

@mfisher87 oops, they're autogenerated and I missed that. Thanks, fixed!