My research lab uses a central install of Anaconda, and we all use Jupyter notebooks + `nb_conda`, which is great! However, we have an issue with trying to balance two things:
1. Keeping each notebook fully reproducible (ideally, the conda environment for each notebook remains static forever)
2. Limiting the number of separate conda environments (we don't want a huge list of environments to select from, or a huge number of files on the file system)
We haven't been able to figure out a good way of doing this. Everyone needs different software (or versions) for each project, and installing new software into an existing conda environment can change it and "break" old notebooks that used that environment.
We've been dealing with this issue by just listing the software in the conda environment at the time the notebook was run. More specifically, we use `sessionInfo()` for R kernels or `! conda list -n MY_NOTEBOOK_CONDA_ENV` for Python kernels. This is not optimal, since we would still need to create a new conda environment to re-run these notebooks, which inflates the total number of conda environments.
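A partial improvement, sketched below with standard conda commands (the file name is just illustrative), would be to export the fully pinned environment alongside the notebook; the environment itself could then be deleted and recreated only when someone actually needs to re-run the notebook:

```bash
# Capture the exact package set (versions and builds) the notebook ran against
conda env export -n MY_NOTEBOOK_CONDA_ENV > MY_NOTEBOOK_environment.yml

# The environment can now be removed, keeping the environment list short ...
conda env remove -n MY_NOTEBOOK_CONDA_ENV

# ... and recreated on demand for an exact re-run
conda env create -f MY_NOTEBOOK_environment.yml
```

This still leaves the create/remove bookkeeping manual, though, which is exactly the part we'd like to see automated.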
I really like how `snakemake` handles conda environments: it creates specific conda environments for specific rules, and only if those rules are used. I'm wondering if something similar could be implemented in `nb_conda` for better reproducibility of Jupyter notebooks. The user could provide a YAML-formatted list of conda packages (possibly in the notebook metadata), then a temporary conda environment would be created just for running the notebook. Once the user shuts down the kernel, the conda environment would also be removed (unless the user wants to keep it for faster re-running later).
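Concretely, the lifecycle might look something like the following. This is only a hypothetical sketch: the metadata-driven spec file, the environment name, and the paths are all made up for illustration, not an existing `nb_conda` interface.

```bash
# 1. On kernel startup, nb_conda would read the package spec from the notebook
#    metadata and write it out as a temporary YAML file (contents illustrative):
cat > /tmp/nb_env_spec.yml <<'EOF'
name: nb-tmp-env          # throwaway name, e.g. derived from a notebook hash
channels:
  - conda-forge
dependencies:
  - python=3.7
  - pandas=0.25
EOF

# 2. Create the environment only when the notebook is actually run
conda env create -f /tmp/nb_env_spec.yml

# 3. ... the kernel runs inside nb-tmp-env ...

# 4. On kernel shutdown, remove the environment again
#    (skipped if the user opts to keep it for faster re-running later)
conda env remove -n nb-tmp-env
```

If the environment name were derived from a hash of the spec (which is, as far as I understand, how `snakemake` keys its per-rule environments), notebooks with identical specs could even share one cached environment, which would also help with our second goal of limiting the number of environments.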
Is this idea feasible?