ContinuumIO / anaconda-issues

Anaconda issue tracking

feature request: save environments with only user requested packages and not their dependencies #546

Open electronwill opened 8 years ago

electronwill commented 8 years ago

New feature request: It would be nice to be able to save an environment with only its user requested packages and without also listing their dependencies. This would lead to simpler environment files which are easier to read and edit, and would also often make the environment file more portable across platforms. The information in '$PREFIX/conda-meta/history' might suffice, or conda might need to tag each package with a boolean variable indicating whether it was requested by a user or installed as a dependency of a package requested by a user.

jni commented 8 years ago

Thanks for this report!

robintw commented 8 years ago

Has there been any progress with this?

And, if not, what is the current recommended best-practice? Manually maintaining a separate environment.yml file for each platform?

electronwill commented 8 years ago

I've contacted the devs about a way to use conda to create a "request list" with only the packages that a user requested, and not the packages that were pulled in as dependencies of those. I'll reply here and at conda issue 1033 sometime this week when they get back to me.

Without a request list, you could either have an automatically generated file for each platform, or one hand edited file for all platforms. I usually use an automatically generated file for each platform, because it's quite fast and easy to generate without hand editing, it's more specific, and it makes the results a little more reproducible, but in some cases the convenience of using a single file for all platforms may outweigh that.

A request list will only be helpful for some environments. For any environment, we can ask if all its user requested packages are cross platform, and if all its other packages are cross platform. The answers to these two questions divide environments into four categories: no/no, no/yes, yes/no, and yes/yes. A request list would help in the yes/no category but not the other three.

Here's an example of a hand edited file. On OS X I ran:

conda create -n env1 scipy
source activate env1
conda env export > environment.yml

This was the output:

name: env1
dependencies:
- mkl=11.3.3=0
- numpy=1.11.0=py35_1
- openssl=1.0.2h=1
- pip=8.1.2=py35_0
- python=3.5.1=0
- readline=6.2=2
- scipy=0.17.1=np111py35_0
- setuptools=22.0.5=py35_0
- sqlite=3.13.0=0
- tk=8.5.18=0
- wheel=0.29.0=py35_0
- xz=5.0.5=1
- zlib=1.2.8=3

I simplified it to this environment.yml file:

name: env1
dependencies:
- scipy
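
The hand edit above can be sketched as a small script, assuming you already know which packages you asked for. The function name `minimal_environment` and the `requested` set are hypothetical, and the parsing is a plain line filter over the simple export layout shown above, not a full YAML parser:

```python
def minimal_environment(export_text, requested):
    """Reduce `conda env export` output to the requested packages, without pins.

    Assumes the flat layout shown above: top-level keys, with one
    "- name=version=build" entry per line under "dependencies:".
    """
    out = []
    in_deps = False
    for line in export_text.splitlines():
        stripped = line.strip()
        if not stripped:
            out.append(line)
        elif not stripped.startswith("- "):
            # A top-level key such as "name:" or "dependencies:".
            in_deps = stripped == "dependencies:"
            out.append(line)
        elif not in_deps:
            # List entries outside "dependencies:" are kept unchanged.
            out.append(line)
        else:
            # Drop the version and build pins, keep only the package name.
            name = stripped[2:].split("=", 1)[0]
            if name in requested:
                out.append("- " + name)
    return "\n".join(out) + "\n"
```

Calling `minimal_environment(exported_text, {"scipy"})` on the export above would leave only the `name:` line, the `dependencies:` line, and `- scipy`. The set of requested names still has to come from somewhere, which is exactly what this feature request is about.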

swolebro commented 7 years ago

I'd also like to voice my support for isolating explicitly installed packages from the dependencies thereof.

As an example of something like this existing in the wild: in Gentoo Linux (i.e. Linux for the OCD), the Portage package manager keeps a list of explicitly requested packages in /var/lib/portage/world. With this list, you can effectively reinstall the same set of tools on any other Gentoo box, even when the full list of dependencies might be different (e.g. because of an architecture change). So, a world file is like an env export, where the box itself becomes your environment.

In the specific case of the Gentoo world file, they also omit version information, because their policy is to always use the latest stable version that satisfies your depgraph. That can also change the set of underlying dependencies that get installed, though in theory, the behavior of the tools in your world remains the same (with the caveat of new versions deprecating things).

So, I can imagine something similar with conda, where if you conda install scipy, then export the env, you get the short YAML from above (which will install the latest possible), whereas if you conda install scipy==0.17.1, you get an env export that enforces that version, and only if you explicitly install each dependency do you get them all in your env export. This would essentially fix conda/conda#1033. As it stands, if I install anything that requires libgfortran on Linux, my env exports won't work for any of my Windows coworkers. Right now, I use manual curation of the file, which works, but is a hassle and error prone, especially as you later add (or especially remove!) more packages.

@electronwill mentions the data in $PREFIX/conda-meta/history, and I'd like to point out that while that's close, it doesn't seem to keep any record of which environment you were in when you called conda. Apart from that gotcha, you could theoretically scrape the file for the info (and I used to do exactly that with apt-get logs, before switching to Gentoo), but first-party support would be much nicer.
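
For anyone who wants to try the scraping route in the meantime, here is a rough sketch. It assumes the history file records requests on comment lines such as `# install specs: ['scipy']` or `# remove specs: ['scipy']`; that line format, the action names, and the function name `requested_specs` are assumptions for illustration and may not match every conda version:

```python
import ast
import re

# Assumed line format, e.g.:  # install specs: ['scipy==0.17.1']
SPEC_LINE = re.compile(r"#\s*(\w+)\s+specs:\s*(.+)")

def requested_specs(history_text):
    """Replay the history and return the specs that are still requested."""
    requested = {}
    for line in history_text.splitlines():
        match = SPEC_LINE.match(line.strip())
        if not match:
            continue
        action, raw = match.groups()
        try:
            specs = ast.literal_eval(raw)  # the list is written as a Python literal
        except (ValueError, SyntaxError):
            continue  # skip lines that do not parse instead of guessing
        for spec in specs:
            # The package name is everything before the first version operator.
            name = re.split(r"[ =<>!~]", spec, maxsplit=1)[0]
            if action in ("remove", "uninstall"):
                requested.pop(name, None)
            else:
                requested[name] = spec  # a later request replaces an earlier one
    return [requested[name] for name in sorted(requested)]
```

You would feed it the contents of `$PREFIX/conda-meta/history`, for example via `pathlib.Path(prefix, "conda-meta", "history").read_text()`. Specs are kept as typed, so `scipy` stays unpinned while `scipy==0.17.1` keeps its pin, which matches the export behavior described above.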