Closed: AlbertDeFusco closed this issue 3 years ago
@mcg1969 , I've been running some experiments on this and I have developed two approaches that I will explain here using conda commands. I believe either one can be incorporated into anaconda-project. A comparison between these two approaches is given at the end.
These two solutions produce identical environments and have nearly identical times for this example.
Let's create a read-only conda environment.
```shell
# create the env and clear its history to unpin the specs
conda create -y --offline -n readme python=3.6 anaconda=5.0.1
cat /dev/null > ~/Applications/miniconda3/envs/readme/conda-meta/history

# mark the env as read-only
chmod -R 555 ~/Applications/miniconda3/envs/readme
```
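Before either approach kicks in, anaconda-project would need to detect that an env is read-only. Here is a minimal sketch of such a check; the helper name is an assumption for illustration, not anaconda-project API:

```python
import os

def is_writable_env(prefix):
    """Return True if conda could record transactions in this env.

    conda writes to <prefix>/conda-meta during installs, so checking
    that directory is a reasonable proxy for "modifiable in place".
    """
    meta = os.path.join(prefix, "conda-meta")
    return os.access(meta, os.W_OK | os.X_OK)

# With the chmod above, the example env should report as read-only
# (path is the one used in this thread):
prefix = os.path.expanduser("~/Applications/miniconda3/envs/readme")
print(is_writable_env(prefix))
```

Note that `os.access` reflects the real uid's permissions, so this is only a heuristic (e.g. it reports everything writable when running as root).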
One approach is to utilize `conda create --clone` only when anaconda-project detects that a change has been requested to the read-only env, and then continue to apply package changes. Otherwise, anaconda-project commands like `run` will work directly on the read-only env.
```shell
conda create -p ./envs/clone --clone ~/Applications/miniconda3/envs/readme
cat /dev/null > ./envs/clone/conda-meta/history
conda install --offline -y -p ./envs/clone pandas=1 hvplot
```
An alternative, again triggered only when a change to the env is requested, is to perform a dry-run install of the requested packages and rebuild an environment spec file using the original package list from the read-only env.
First prepare JSON files for 1) the original package list and 2) the changes required (both add and remove).
```shell
conda list --json -n readme > readme.json
conda install --offline -n readme --dry-run --json pandas=1 hvplot > update.json
```
Now we need to reconstruct the environment using these two JSON files. Save the following script as `rebuild_spec.py`:
```python
import json
import sys

readonly_json = sys.argv[1]
dryrun_json = sys.argv[2]


def pkg_version_build(d):
    return f"{d['name']}={d['version']}={d['build_string']}"


with open(readonly_json) as f:
    readonly = json.load(f)
with open(dryrun_json) as f:
    dryrun = json.load(f)

original = set(map(pkg_version_build, readonly))
to_remove = set(map(pkg_version_build, dryrun['actions']['UNLINK']))
to_add = set(map(pkg_version_build, dryrun['actions']['LINK']))

final = (original - to_remove) | to_add

print('dependencies:')
print('\n'.join(f' - {p}' for p in final))
```
And now create the local environment:
```shell
python rebuild_spec.py readme.json update.json > local.yml
conda env create -f local.yml -p ./envs/local
```
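The set arithmetic at the heart of `rebuild_spec.py` can be sanity-checked with toy records shaped like the two JSON files (the package names, versions, and build strings below are made up):

```python
def pkg_version_build(d):
    return f"{d['name']}={d['version']}={d['build_string']}"

# shaped like `conda list --json` output
readonly = [
    {"name": "python", "version": "3.6.3", "build_string": "h0"},
    {"name": "pandas", "version": "0.20.3", "build_string": "h1"},
]
# shaped like the `--dry-run --json` actions
dryrun = {"actions": {
    "UNLINK": [{"name": "pandas", "version": "0.20.3", "build_string": "h1"}],
    "LINK":   [{"name": "pandas", "version": "1.0.0", "build_string": "h2"},
               {"name": "hvplot", "version": "0.5.2", "build_string": "h3"}],
}}

original = set(map(pkg_version_build, readonly))
to_remove = set(map(pkg_version_build, dryrun["actions"]["UNLINK"]))
to_add = set(map(pkg_version_build, dryrun["actions"]["LINK"]))
final = (original - to_remove) | to_add

# python is kept, pandas is replaced by the new build, hvplot is added
assert final == {"python=3.6.3=h0", "pandas=1.0.0=h2", "hvplot=0.5.2=h3"}
```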
Method | Time (m) | Notes |
---|---|---|
Clone | 1.13 | There may be unintended consequences with clone (does it work with pip?). |
Rebuild | 1.1 | Requires two solves; however, the second one should be much faster. The time to solution could be slower if the original package cache from the read-only env is not maintained. |
That's actually not bad. Thanks for being data driven.
The package cache requirement is the same for both—cloning actually requires repopulating the package cache.
So the advantage for the Rebuild approach is that it reduces the amount of package downloads. With a clone and install, you'll repopulate the package cache with old packages that don't end up in the new environment.
With #292 merged I believe we can support this directly in the `path` function. The `find_environment_deviations` function can help.
What I'm working on in the `path` function is that:

* environments are searched across the `$ANACONDA_PROJECT_ENVS_PATH` directories
* environments for a writable env_spec are created in the `$ANACONDA_PROJECT_ENVS_PATH` directories
* lookup follows the order of the `$ANACONDA_PROJECT_ENVS_PATH` path

So this would mean that you run anaconda-project as follows:

```shell
export ANACONDA_PROJECT_ENVS_PATH=/path/to/readonly/envs:/path/to/writable/envs:
anaconda-project prepare
```
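A rough sketch of how such a lookup could walk the search path, assuming the helper names and the empty-entry-means-local-`./envs` convention are illustrative rather than the actual anaconda-project implementation:

```python
import os

def env_search_dirs(envs_path):
    """Split the colon-separated search path; treat an empty entry
    (e.g. the trailing ':' above) as the project-local ./envs dir."""
    return [d or "./envs" for d in envs_path.split(":")]

def find_env(name, envs_path):
    """Return the first prefix on the path that contains env `name`."""
    for d in env_search_dirs(envs_path):
        prefix = os.path.join(d, name)
        # every conda env has a conda-meta directory
        if os.path.isdir(os.path.join(prefix, "conda-meta")):
            return prefix
    return None
```

With this kind of first-match-wins lookup, the relative order of read-only and writable directories decides which copy of an env gets used, which is exactly the ordering question discussed below.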
Does this match what you want to do?
Yes, I think that sounds right! Except that I think you have to put the writable environment directories first in priority, not last.
After all, suppose you do `anaconda-project prepare` and you determine that changes need to be made to a read-only environment. So you create a new environment to host the changes.
But the way you've ordered the path, the next time you do `anaconda-project prepare` it won't see that read-write environment.
Looks done to me.
What if a pre-baked env needs to be modified but is read-only on disk?