conda / conda-lock

Lightweight lockfile for conda environments
https://conda.github.io/conda-lock/
Other
456 stars 101 forks source link

conda-lock doesn't seem to be updating content hashes when a lock file already exists #617

Closed jeffseif closed 3 months ago

jeffseif commented 3 months ago

Checklist

What happened?

The issue

I use conda-lock with the --check-input-hash flag so that my system can skip locking when the dependencies haven't changed. My understanding is that this behavior relies on the content_hash stored within the lock file -- when that matches the hash derived from the environment.yaml locking can be skipped. And when I change the dependencies within my environment.yaml, new hashes will be derived, re-locking will occur, and the content hashes will be written to the lock file. This has been working great with v2.0.0!

However, upon upgrading to v2.5.3 this behavior no longer seems to work. Specifically, new hashes don't seem to be written into the lock file when a lock file already exists. The result is that the hashes stored within the lock file will no longer match the hash of the environment.yaml and conda-lock will re-lock dependencies no matter what.

Steps to reproduce

First I create a simple lock file:

cat environment.yaml ;
channels:
  - conda-forge

dependencies:
  - requests

platforms:
  - linux-64
conda-lock lock --check-input-hash --file=environment.yaml --lockfile=new-conda-lock.yml ;
Locking dependencies for ['linux-64']...
INFO:conda_lock.conda_solver:linux-64 using specs ['requests']
 - Install lock using: conda-lock install --name YOURENV new-conda-lock.yml

When the environment.yaml's hash matches that within the lock file, re-locking is skipped as desired:

grep -A1 content_hash new-conda-lock.yml ;
  content_hash:
    linux-64: 1870375197ff7a472d3df4e8033a75f34f5118e228315f5c0c095b78147cd9a5
conda-lock lock --check-input-hash --file=environment.yaml --lockfile=new-conda-lock.yml
Spec hash already locked for ['linux-64']. Skipping solve.

When we change the environment.yaml, the hashes don't match and re-locking occurs as desired:

cat environment.yaml
channels:
  - conda-forge

dependencies:
  - requests
  - tabulate

platforms:
  - linux-64
> conda-lock lock --check-input-hash --file=environment.yaml --lockfile=new-conda-lock.yml
Locking dependencies for ['linux-64']...
INFO:conda_lock.conda_solver:linux-64 using specs ['requests', 'tabulate']
 - Install lock using: conda-lock install --name YOURENV new-conda-lock.yml

However, the lock file's hash has not been updated:

grep -A1 content_hash new-conda-lock.yml
  content_hash:
    linux-64: 1870375197ff7a472d3df4e8033a75f34f5118e228315f5c0c095b78147cd9a5

And as a result, the environment.yaml's hash doesn't match that within the lock file and conda-lock will re-lock (and still not update the hash):

conda-lock lock --check-input-hash --file=environment.yaml --lockfile=new-conda-lock.yml
Locking dependencies for ['linux-64']...
INFO:conda_lock.conda_solver:linux-64 using specs ['requests', 'tabulate']
 - Install lock using: conda-lock install --name YOURENV new-conda-lock.yml
grep -A1 content_hash new-conda-lock.yml
  content_hash:
    linux-64: 1870375197ff7a472d3df4e8033a75f34f5118e228315f5c0c095b78147cd9a5

A possible location for the bug

Looking at the changes between v2.0.0 and v2.5.3, I noticed that the logic for merging an existing lock file with the new content was rewritten, adding new_lock_content = lock_content_to_persist.merge(fresh_lock_content). Diving into Lockfile.merge, we find that it returns Lockfile(..., metadata=other.metadata | self.metadata); which in turn returns LockMeta(..., content_hash={**self.content_hash, **other.content_hash}).

I think the issue is that other.metadata | self.metadata is equivalent to other.metadata.__or__(self.metadata) so that when we are inside of LockMeta.__or__, self and other are flipped and we are overriding the new (=self) content hashes with the old (=other) content hashes.

I think the fix would be to swap other and self on this line. However, I'm not certain what side effects this might have elsewhere in the code base.

Conda Info

python3 -m venv .venv ;
. .venv/bin/activate ;
python -m pip install -e . --quiet ;
pip check ;
No broken requirements found.
python --version ;
Python 3.10.12
git rev-parse HEAD ;
4448144ca1a8a361d909d98c24511d85c45892fd
conda-lock --version ;
conda-lock, version 2.5.6.dev6+g4448144

Conda Config

No response

Conda list

No response

Additional Context

No response

maresb commented 3 months ago

Many thanks for doing all the diagnostic work on this! I think you are probably correct. Would you like to open a PR with your proposed fix?

jeffseif commented 3 months ago

Many thanks for doing all the diagnostic work on this! I think you are probably correct. Would you like to open a PR with your proposed fix?

Yeah, I'm happy to put a PR together. I'm not familiar with the testing setup for this repo, so I'll try to stand up a new test which fails, then make the change I suggested, then see if that test succeeds and no others fail.

And I'll shout if I run into any issues!

jeffseif commented 3 months ago

@maresb The test setup was smoother than I expected, so I have a PR together here: https://github.com/conda/conda-lock/pull/618

jeffseif commented 3 months ago

Addressed by https://github.com/conda/conda-lock/pull/618 and available in https://github.com/conda/conda-lock/releases/tag/v2.5.6

maresb commented 3 months ago

Thanks a lot @jeffseif! Also available on PyPI, and will be available on conda-forge as soon as the CDN refreshes.