conda-incubator / conda-store

Data science environments, for collaboration. ✨
https://conda.store
BSD 3-Clause "New" or "Revised" License

Reconsider how editing environments works #886

Open krassowski opened 5 hours ago

krassowski commented 5 hours ago

Context

Currently, editing an environment:

This means that autoreloading does not work. For example, when used with Jupyter/IPython:

If instead it worked like:

Value and/or benefit

Many minutes to hours in productivity gained (or rather not lost) for the use case of interactive environment creation by a senior data analyst.

Anything else?

No response

krassowski commented 4 hours ago

@kcpevey mentioned to me that this may be a foot gun for shared environments:

The problem with autoreloading the environment is that the environment can change underneath you - other people could have updated the environment without your knowledge.

I somewhat agree, but ultimately, if a shared env is changed by someone else, activating it after the change will cause the same issue.
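To make "the environment changed underneath you" something a user can check, here is a minimal sketch that snapshots the packages visible to the running interpreter and diffs two snapshots. It uses only the standard-library `importlib.metadata`; the helper names `snapshot_environment` and `diff_snapshots` are hypothetical and not part of conda-store.

```python
from importlib import metadata


def snapshot_environment():
    """Return a {distribution name: version} mapping for the current environment."""
    snapshot = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name is not None:  # skip distributions with broken metadata
            snapshot[name] = dist.version
    return snapshot


def diff_snapshots(before, after):
    """Return (added, removed, changed) packages between two snapshots."""
    added = {name: after[name] for name in after.keys() - before.keys()}
    removed = {name: before[name] for name in before.keys() - after.keys()}
    changed = {
        name: (before[name], after[name])
        for name in before.keys() & after.keys()
        if before[name] != after[name]
    }
    return added, removed, changed
```

Taking one snapshot when the kernel starts and another before a long computation would at least show whether someone else has modified the shared env in the meantime.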

And questions the UX for user awareness:

What if you are running a notebook and stop to kick off a rebuild of the env, which takes 20 minutes, and while that's going you keep working in the notebook? At some point the env build is complete. What happens to your running notebook? Does the kernel remain on the old env until you restart it? Does the user get a warning that the kernel has been replaced?

Here I would mention that auto-reloading is not enabled by default, and users who enable it know what they are doing. Also, rebuilding an env should not take 20 minutes (but it does). I do, however, agree that a notification that an environment build has completed should be shown when conda-store is used with JupyterLab, which is tracked in:

dharhas commented 4 hours ago

As an FYI, historically, updating in place rather than rebuilding from scratch has been a really bad idea. It has left folks with non-reproducible, bespoke, or broken environments, because to recreate the environment you have to replay every update step, and those steps are not tracked anywhere.

dharhas commented 4 hours ago

But this does tie in with another discussion I had about packaging at PyCon: we actually have multiple target audiences (devs, end users, etc.) for environment management, and we are using the same tools for all of them.

dharhas commented 3 hours ago

newly installed packages do not become available in a running kernel, which means that possibly hours of computation may be lost, as the kernel needs to be restarted to pick up the smallest change in the env

Is this actually a valid use case? How reliably does it work? For pure Python packages, maybe. To me it seems that if you change the underlying environment, all bets are off on whether your Python objects are even valid if an install changed something under the hood. A better option would be to make sure you serialize your results.
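As a rough illustration of serializing results before touching the env, here is a minimal sketch using the standard-library `pickle` module. The helper names `save_checkpoint` and `load_checkpoint` and the file name in the usage note are hypothetical.

```python
import pickle
from pathlib import Path


def save_checkpoint(results, path):
    """Write results to disk, replacing any previous checkpoint in one step."""
    path = Path(path)
    tmp = path.with_suffix(path.suffix + ".tmp")
    with open(tmp, "wb") as fh:
        pickle.dump(results, fh, protocol=pickle.HIGHEST_PROTOCOL)
    # Rename only after a complete write, so a crash cannot leave a partial file.
    tmp.replace(path)


def load_checkpoint(path):
    """Read back results saved by save_checkpoint."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

For example, calling `save_checkpoint(results, "results.pkl")` before the rebuild and `load_checkpoint("results.pkl")` after restarting the kernel. One caveat: pickles of objects whose classes come from a package that changed in the rebuild may fail to load, so plain data (dicts, lists, numbers) is likely the safer thing to checkpoint.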