Lockfiles - GitHub Issues

jmarshrossney commented 1 month ago

Lockfiles contain a list of all the dependencies, both direct and indirect, of a package, pinned to exact versions. They are necessary though not sufficient for fully reproducible environments.
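To make "pinned to exact versions" concrete, here is a minimal sketch that flags requirement strings which are not exact pins. The helper name `unpinned` and the example package lists are illustrative only, not taken from this project, and the regular expression is a simplification rather than a full implementation of the Python requirement syntax.

```python
import re

# Matches "name==version", optionally with extras, e.g. "pkg[extra]==1.2.3".
# Wildcards such as "==1.2.*" are deliberately not counted as exact pins.
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^*\s,;]+$")


def unpinned(requirements):
    """Return the requirement strings that are not pinned to one exact version."""
    result = []
    for line in requirements:
        spec = line.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if spec and not _PINNED.match(spec):
            result.append(spec)
    return result


# Illustrative inputs only: a loose specification versus a lockfile-style one.
loose = ["numpy>=1.24", "torch", "pandas==2.1.*"]
locked = ["numpy==1.26.4", "torch==2.2.1", "pandas==2.1.4"]

print(unpinned(loose))   # every entry is reported, none is an exact pin
print(unpinned(locked))  # nothing is reported
```

A real lockfile usually records more than this (for example hashes and the target platform), which is part of why exact pins are necessary but not sufficient.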

Another advantage of creating an environment from a lockfile is that you skip the often slow solving step. Re-solving is particularly annoying when you have to do it multiple times because a recent update to one of the packages has introduced a bug which you haven't yet spotted.

Obviously if you're developing a package that you want to support over a wide range of package versions then you might not be so interested in avoiding these kinds of problems, but I think we are more interested in experimenting with the science right now so I don't immediately see a downside of locking our dependencies.

I would have liked to introduce conda-lock in #13 but unfortunately this does not seem possible while we depend on plankton-cefas-scivision.

Another option is abandoning conda and using pip to install everything, but I don't expect that to be popular, nor am I really pushing it.

One of the more likely ways out of this is that we no longer depend on plankton-cefas-scivision, e.g. if we were to train our own model or if Turing come out with a new offering.

jmarshrossney commented 1 month ago

Fun fact I just discovered: In principle we can specify all dependencies in pyproject.toml and from this create both conda and venv virtual environments with help from conda-lock.

metazool commented 1 month ago

This is a useful fun fact that would have immediate wider value! The current minimal approach to pyproject.toml originally came from this post

Even if the model is likely to be deprecated, there are bound to be similar scenarios. The other project that faces a choice of conda / venv and environment management issues, and where a clean recommendation would be immediately relevant, is the one building on open-cd...

mattjbr123 commented 4 weeks ago

> The current minimal approach to pyproject.toml originally came from this post

Very useful little blog this! Thanks for sharing as always :)

mattjbr123 commented 4 weeks ago

If I've understood correctly what conda-lock is doing, I used to do something similar with conda list --explicit > envfile.yml, which creates an environment file (essentially a list of URLs) that you can pass to other people to replicate your environment. I think it just installs everything in the file without using the solver: conda create --name envname --file envfile.yml (or something like that).
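For anyone who hasn't seen one of these files, the sketch below parses the kind of text that conda list --explicit writes: comment lines, an @EXPLICIT marker, then one package URL per line. It assumes the usual name-version-build file naming; the function name `parse_explicit` and the example URLs and build strings are made up for illustration.

```python
from urllib.parse import urlparse


def parse_explicit(text):
    """Return (name, version, build) for each package URL in an explicit spec file."""
    packages = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and the "@EXPLICIT" marker.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        # urlparse drops any "#<hash>" fragment appended to the URL.
        filename = urlparse(line).path.rsplit("/", 1)[-1]
        for ext in (".conda", ".tar.bz2"):
            if filename.endswith(ext):
                filename = filename[: -len(ext)]
                break
        # Package names may contain hyphens, so split from the right.
        name, version, build = filename.rsplit("-", 2)
        packages.append((name, version, build))
    return packages


# Made-up example content; the build strings and hash are placeholders.
EXAMPLE = """\
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/python-3.11.8-h0000000_0_cpython.conda
https://conda.anaconda.org/conda-forge/noarch/scikit-learn-1.4.0-py311_0.tar.bz2#0123abcd
"""

for name, version, build in parse_explicit(EXAMPLE):
    print(name, version, build)
```

Every entry is a fully resolved package URL including the build string, which is why the file is specific to one platform.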

jmarshrossney commented 3 weeks ago

Hey @mattjbr123, that sounds entirely reasonable and probably works fine in 99% of cases, but I don't think it's entirely bulletproof.

I could be wrong, but I wouldn't expect it to skip the solve step, because there's still a chance the file was created from a broken environment, and how would it know without solving the environment first to check?

Apart from that, the issue with conda list --explicit is that it doesn't include pip installed dependencies.
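One rough way to see which packages would be missed is to group the installed distributions by the tool recorded in their INSTALLER metadata file. This is only a sketch: the function name `by_installer` is made up, it assumes that pip writes "pip" into that file and that conda-built packages record something else (an assumption worth checking in your own environment), and packages without the file end up under "unknown".

```python
from importlib.metadata import distributions


def by_installer():
    """Group installed distribution names by the tool named in their INSTALLER file."""
    groups = {}
    for dist in distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta else None
        if not name:
            continue  # skip distributions with missing or broken metadata
        # read_text returns None when the INSTALLER file does not exist.
        installer = (dist.read_text("INSTALLER") or "").strip().lower() or "unknown"
        groups.setdefault(installer, set()).add(name)
    return groups


groups = by_installer()
# Names recorded as installed by pip in the environment running this script.
for name in sorted(groups.get("pip", ())):
    print(name)
```

Run inside the conda environment in question, anything listed here is a candidate for being absent from the conda list --explicit output.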

I'm definitely a fan of what conda is trying to do with standardising environments but comments like this one make me want to disengage!

jmarshrossney commented 2 weeks ago

Reading this comment by one of the (main?) conda-lock maintainers.

The main things relevant to us are:

- conda-lock seems like it might be struggling under its maintenance burden, and the fact that its maintainers are discussing endorsing another project seems like something to pay attention to.
- The project they are considering endorsing is called pixi, which looks and feels very similar to poetry, including native lockfile support, but uses conda environments under the hood so can deal with non-Python dependencies.
- pixi is a substitute for the entire conda workflow (conda activate, conda install etc), kind of like how poetry replaces the venv/pip workflow while using both tools under the hood.

So I'm planning to keep an eye on this and will report back!

jmarshrossney commented 2 weeks ago

I tried to incorporate conda-lock into this project, see #26.

I've found it ok in the distant past when I only used conda and almost never pip, but in this case it has been quite frustrating and ultimately the single-sourcing idea failed.

I'm inclined to keep using simple python-specific tools/lockfiles and just accept that environments are non-reproducible at the level of cuda etc.

metazool commented 2 days ago

The extended self-dialogue in the comments on #26 offers an interesting learning experience that others won't have to go through! I suggest we close this as a wont-fix

NERC-CEH / plankton_ml

Lockfiles #14