con/tinuum

Resources (later might be a tool) to create reproducible computational environments
Apache License 2.0

conda/conda-forge #2

Open yarikoptic opened 2 months ago

yarikoptic commented 2 months ago

Responses/hints received so far on Gitter (thanks!):

@bollwyvl

there was a hosted one at some point, but it was not feasible to maintain

one could theoretically do such a thing by manipulating a local cache of repodata, but it has gotten harder with recent changes (jlap, etc.)
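A rough sketch of what "manipulating a local cache of repodata" could look like: drop every record published after a cutoff date, so the solver only sees what existed back then. It assumes a `repodata.json`-like dict with `packages` and `packages.conda` sections whose records carry a `timestamp` in milliseconds since the epoch. Records without a timestamp are kept, which is also an assumption. The file names and timestamps in the example are made up, and this does not address the jlap-related complications mentioned above.

```python
from datetime import datetime, timezone


def filter_repodata(repodata: dict, cutoff: datetime) -> dict:
    """Return a copy of repodata without records published after cutoff."""
    # Assumption: record timestamps are milliseconds since the Unix epoch.
    cutoff_ms = cutoff.timestamp() * 1000
    filtered = dict(repodata)  # shallow copy; the two sections are rebuilt below
    for section in ("packages", "packages.conda"):
        records = repodata.get(section, {})
        filtered[section] = {
            filename: record
            for filename, record in records.items()
            # Assumption: a record without a timestamp is old, so keep it.
            if record.get("timestamp", 0) <= cutoff_ms
        }
    return filtered


# Hypothetical data, not real conda-forge records.
example = {
    "info": {"subdir": "noarch"},
    "packages": {},
    "packages.conda": {
        "pint-0.22-example_0.conda": {
            "name": "pint", "version": "0.22", "timestamp": 1_690_000_000_000,
        },
        "pint-0.23-example_0.conda": {
            "name": "pint", "version": "0.23", "timestamp": 1_702_000_000_000,
        },
    },
}

snapshot = filter_repodata(example, datetime(2023, 11, 1, tzinfo=timezone.utc))
print(list(snapshot["packages.conda"]))  # ['pint-0.22-example_0.conda']
```

The filtered dict would still have to be written where the solver looks for its cached repodata, which this sketch leaves out.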

@jaimergp

Check https://github.com/conda/conda-libmamba-solver/blob/main/tests/repodata_time_machine.py. This will be better supported once CEP-15 is implemented. For now I think it's only valid to generate --dry-run --json payloads which you could turn into an @EXPLICIT-style lockfile.
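To make the last step concrete, here is a minimal sketch of turning such a payload into an `@EXPLICIT`-style file, i.e. an `@EXPLICIT` header followed by one package URL per line, optionally suffixed with `#<md5>`. The payload layout used here (an `actions` mapping with a `FETCH` list whose entries have `url` and `md5` keys) is an assumption to check against real `--dry-run --json` output. Packages already in the local cache might not be listed under `FETCH` at all. The URL and hash below are placeholders.

```python
def to_explicit_lockfile(payload: dict) -> str:
    """Render an @EXPLICIT-style lockfile from a dry-run payload (assumed layout)."""
    lines = ["@EXPLICIT"]
    # Assumption: payload["actions"]["FETCH"] lists records with "url" and "md5".
    for record in payload.get("actions", {}).get("FETCH", []):
        url = record["url"]
        md5 = record.get("md5")
        # The "#<md5>" suffix is optional in explicit files.
        lines.append(f"{url}#{md5}" if md5 else url)
    return "\n".join(lines) + "\n"


# Hypothetical payload; the URL and md5 are placeholders, not real artifacts.
payload = {
    "actions": {
        "FETCH": [
            {
                "url": "https://example.invalid/noarch/pint-0.22-example_0.conda",
                "md5": "0123456789abcdef0123456789abcdef",
            }
        ]
    }
}

lockfile = to_explicit_lockfile(payload)
print(lockfile)
```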

matrss commented 2 months ago

I found https://pixi.sh to be the best way to get a reproducible environment based on the conda-forge package ecosystem. It is a relatively new package manager (released in summer last year), but it is developed by some of the same people that are also behind mamba and it works pretty well already.

I just have some issues with the way conda-forge generally handles packages, e.g. that it is impossible to test them due to install-time dependency resolution.

yarikoptic commented 2 months ago

> that it is impossible to test them due to install-time dependency resolution

Could you elaborate more? AFAIK other distributions (Debian, pip) all have "install-time dependency resolution", since there are always "choices". FWIW we do test (unit tests) conda builds of datalad etc. in conda-forge.

matrss commented 2 months ago

So there is this software https://github.com/Open-MSS/MSS that I contribute to a bit. It had a release in October 2023. At that time pint=0.22 was the latest version of pint, one of its dependencies. The conda-forge feedstock of MSS doesn't run the test suite, but if it did it would have passed.

Then in December 2023 pint=0.23 was released. This version broke some of MSS' tests. Since conda/mamba do install-time dependency resolution, any installation after the new release of pint was broken, without the MSS package ever being rebuilt (which would have caught the issue in its test step, if it had one).
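A toy model of that mechanism, for illustration only: the "resolver" below just picks the newest version that exists on the install date, which is roughly what happens with an unpinned dependency. The dates are placeholders consistent with the timeline above (0.22 available by October 2023, 0.23 in December 2023), not actual release dates, and `resolve` is a hypothetical helper, not part of conda or mamba.

```python
from datetime import date

# Placeholder dates matching the timeline in this thread, not real release dates.
PINT_RELEASES = {
    (0, 22): date(2023, 10, 1),
    (0, 23): date(2023, 12, 1),
}


def resolve(releases: dict, install_day: date, minimum: tuple = (0, 0)) -> tuple:
    """Pick the newest version available on install_day that is >= minimum."""
    candidates = [
        version
        for version, released in releases.items()
        if released <= install_day and version >= minimum
    ]
    if not candidates:
        raise LookupError("no version satisfies the constraint")
    return max(candidates)


# The same, unchanged MSS package ends up with different dependencies:
print(resolve(PINT_RELEASES, date(2023, 10, 15)))  # (0, 22)
print(resolve(PINT_RELEASES, date(2024, 1, 15)))   # (0, 23)
```

Nothing about the MSS package changed between the two calls; only the install date did.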

The best thing we can do is regularly install the package in a fresh environment and test that it still works. But, to my knowledge, there is no way to prevent such breakage of the package as distributed by conda-forge. This means conda-forge packages are effectively untestable.

I ranted a bit about this when I discovered it, here: https://github.com/conda-forge/mss-feedstock/issues/162

The fix to this problem is to make dependency resolution purely a build-time step. If a package's dependencies change in any way then this can affect the behaviour of the package, so it must be rebuilt and re-tested before this new version of the package can be distributed. This is what https://nixos.org/ and https://guix.gnu.org/ do.

Following this model there are no choices to be made at install-time. Any difference in the "inputs" to a package (and that could just be using a different compiler flag to compile a dependency of a dependency) would lead to a different build result because, well, it could behave differently.
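The idea can be sketched by deriving a build identifier from a hash over all of a package's inputs, including the identifiers of its dependencies. This is a simplified illustration and not how Nix or Guix actually compute their store paths. The package names, versions and the `build_id` helper are hypothetical.

```python
import hashlib
import json


def build_id(name: str, version: str, flags: list, deps: list) -> str:
    """Hash every input of a build, including the ids of its dependencies."""
    inputs = json.dumps(
        {"name": name, "version": version, "flags": sorted(flags), "deps": sorted(deps)},
        sort_keys=True,
    )
    return hashlib.sha256(inputs.encode()).hexdigest()[:16]


def app_id(base_flag: str) -> str:
    """Id of a hypothetical app -> middle -> libbase dependency chain."""
    base = build_id("libbase", "1.0", [base_flag], [])
    middle = build_id("middle", "2.0", [], [base])
    return build_id("app", "3.0", [], [middle])


# Changing a compiler flag two levels down yields a different app build,
# which therefore has to be rebuilt and re-tested before distribution.
print(app_id("-O2") != app_id("-O3"))  # True
print(app_id("-O2") == app_id("-O2"))  # True
```

Because the identifier covers the whole dependency closure, an installed build can only ever be combined with the exact dependencies it was tested against.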

Nix calls this "purely functional package management", and its approach is described in more detail here: https://nixos.org/guides/how-nix-works/.