anaconda / nb_conda

Conda environment and package access extension from within Jupyter
BSD 3-Clause "New" or "Revised" License

temporary conda environments for each notebook #65

Open nick-youngblut opened 6 years ago

nick-youngblut commented 6 years ago

My research lab uses a central install of Anaconda, and we all use Jupyter notebooks + nb_conda, which is great! However, we are struggling to balance two things:

  1. Keeping each notebook fully reproducible (optimally, the conda environment for each notebook remains static forever)
  2. Limiting the number of separate conda environments (we don't want a huge list of environments to select from, and a huge number of files on the file system)

We haven't been able to figure out a good way of doing this. Everyone needs different software (or different versions) for each project, and installing new software into an existing conda environment can change package versions and "break" old notebooks that relied on that environment.

We've been dealing with this issue by just recording, in the notebook itself, the software that was in the conda environment when the notebook was run. More specifically, we use sessionInfo() for R kernels or ! conda list -n MY_NOTEBOOK_CONDA_ENV for Python kernels. This is not optimal, because we would still need to create a new conda environment to re-run these notebooks, which inflates the total number of conda environments.
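For illustration, the snapshot could also be stored as a machine-readable spec rather than as cell output. This is only a rough sketch of what I mean: it assumes `conda env export --name` is used to get YAML, and the metadata key `conda_env_spec` and all function names are made up for this example (nb_conda does not define them).

```python
import json
import subprocess


def export_command(env_name):
    """Build the conda command that prints an environment as YAML."""
    return ["conda", "env", "export", "--name", env_name]


def embed_spec(notebook, spec_yaml, key="conda_env_spec"):
    """Return a copy of a notebook dict with the spec stored in its metadata.

    `key` is a hypothetical metadata field, not an existing convention.
    """
    updated = dict(notebook)
    metadata = dict(updated.get("metadata", {}))
    metadata[key] = spec_yaml
    updated["metadata"] = metadata
    return updated


def snapshot_into_notebook(path, env_name):
    """Export `env_name` and write the YAML into the .ipynb file at `path`."""
    spec = subprocess.run(
        export_command(env_name), check=True, capture_output=True, text=True
    ).stdout
    with open(path) as fh:
        notebook = json.load(fh)
    with open(path, "w") as fh:
        json.dump(embed_spec(notebook, spec), fh, indent=1)
```

Something like `snapshot_into_notebook("analysis.ipynb", "MY_NOTEBOOK_CONDA_ENV")` would then keep the spec with the notebook, though re-running it would still require building an environment from that spec.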

I really like how Snakemake handles conda environments: it creates a dedicated conda environment for each rule that declares one, and only if that rule is actually run. I'm wondering if something similar could be implemented in nb_conda for better reproducibility of Jupyter notebooks. The user could provide a YAML-formatted list of conda packages (possibly in the notebook metadata), and a temporary conda environment would then be created just for running the notebook. Once the user shuts down the kernel, the conda environment would be removed as well (unless the user wants to keep it for faster re-running later).
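To make the proposed lifecycle concrete, here is a minimal sketch of the create / use / remove cycle. It is an assumption about how this could work, not how nb_conda or Snakemake does it: `temporary_env`, `create_command`, `remove_command`, the `keep` flag, and the `run` parameter are hypothetical names, and the only conda calls assumed are `conda env create --prefix ... --file ...` and `conda env remove --prefix ... --yes`.

```python
import contextlib
import os
import shutil
import subprocess
import tempfile


def create_command(prefix, spec_file):
    """conda command that builds an environment at `prefix` from a YAML file."""
    return ["conda", "env", "create", "--prefix", prefix, "--file", spec_file]


def remove_command(prefix):
    """conda command that deletes the environment at `prefix`."""
    return ["conda", "env", "remove", "--prefix", prefix, "--yes"]


@contextlib.contextmanager
def temporary_env(spec_yaml, keep=False, run=subprocess.check_call):
    """Create a throwaway conda environment and remove it on exit.

    `spec_yaml` is the YAML text (e.g. read from notebook metadata).
    `keep=True` leaves the environment on disk for faster re-runs.
    `run` executes a command list; it is injectable for dry runs.
    """
    workdir = tempfile.mkdtemp(prefix="nb_env_")
    spec_file = os.path.join(workdir, "environment.yml")
    prefix = os.path.join(workdir, "env")
    with open(spec_file, "w") as fh:
        fh.write(spec_yaml)
    created = False
    try:
        run(create_command(prefix, spec_file))
        created = True
        yield prefix  # a kernel would be launched from this prefix
    finally:
        if not keep:
            if created:
                run(remove_command(prefix))
            shutil.rmtree(workdir, ignore_errors=True)
```

In a real extension the body of the `with temporary_env(...)` block would correspond to the kernel's lifetime, so kernel shutdown would trigger the cleanup. Using a prefix under a temp directory rather than a named environment would also keep these environments out of the shared environment list.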

Is this idea feasible?