anaconda / anaconda-project

Tool for encapsulating, running, and reproducing data science projects
https://anaconda-project.readthedocs.io/en/latest/

Support for read-only environments #299

Closed AlbertDeFusco closed 3 years ago

AlbertDeFusco commented 3 years ago

To use read-only environments, place the local envs directory (aliased as :) first and the read-only envs directory second.

export ANACONDA_PROJECT_ENVS_PATH=:/path/to/readonly/envs
anaconda-project ...
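As a rough illustration of how a value like this can be read, the sketch below splits the variable on the platform path separator and maps an empty entry to the project's own envs directory. The function name `parse_envs_path` and the `project_dir` argument are hypothetical and are not taken from anaconda-project's code; the empty-entry rule is an assumption based on the description above.

```python
import os


def parse_envs_path(value, project_dir):
    """Split an ANACONDA_PROJECT_ENVS_PATH-style string into directories.

    Assumption: an empty entry stands for the project-local "envs" directory.
    """
    local_envs = os.path.join(project_dir, "envs")
    return [entry if entry else local_envs for entry in value.split(os.pathsep)]


# A leading separator yields an empty first entry, i.e. the local envs directory.
print(parse_envs_path(os.pathsep + "/path/to/readonly/envs", "proj"))
```

With the value from the example above, the local envs directory comes first in the resulting list and the read-only directory second.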

For all project actions, the ANACONDA_PROJECT_ENVS_PATH paths are searched backwards:

  1. If a matching env_spec name is found, the env is scanned for deviations.
  2. If there are deviations and the env is unfixable (read-only), the next ANACONDA_PROJECT_ENVS_PATH entry is checked.
  3. If no env_spec is found, the env will be created in the local envs directory.
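The three steps can be sketched roughly as follows. This is not the project's implementation: `choose_env_prefix`, the two callbacks, and the returned action labels are hypothetical names for illustration, and the caller passes the directories in whatever order the search should follow.

```python
import os
import tempfile


def choose_env_prefix(env_name, search_dirs, local_envs, has_deviations, is_writable):
    """Return (prefix, action) for an env_spec name.

    search_dirs: directories in the order they should be checked.
    has_deviations / is_writable: caller-supplied callables taking a prefix.
    """
    for directory in search_dirs:
        prefix = os.path.join(directory, env_name)
        if not os.path.isdir(prefix):
            continue                  # no env with this name in this directory
        if not has_deviations(prefix):
            return prefix, "use"      # step 1: found, and it matches the spec
        if is_writable(prefix):
            return prefix, "fix"      # deviations that can be fixed in place
        # step 2: unfixable (read-only) env, so move on to the next directory
    # step 3: nothing usable was found, so create the env in the local directory
    return os.path.join(local_envs, env_name), "create"


# Demo with throwaway directories: a read-only-style env that deviates
# from the spec is skipped, and a local env is scheduled for creation.
with tempfile.TemporaryDirectory() as tmp:
    ro, local = os.path.join(tmp, "ro_envs"), os.path.join(tmp, "envs")
    os.makedirs(os.path.join(ro, "py38"))
    os.makedirs(local)
    print(choose_env_prefix("py38", [local, ro], local,
                            has_deviations=lambda p: True,
                            is_writable=lambda p: False))
```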

TODO:

mcg1969 commented 3 years ago

Albert, my concern here is that it's all or nothing. That is, if there are any deviations in a read-only environment, it forces you to start from scratch.

jbednar commented 3 years ago

> if there are any deviations in a read-only environment, it forces you to start from scratch.

I agree that's not desirable, but is there any alternative that can be done at the anaconda-project level? Are you proposing nested conda environments?

mcg1969 commented 3 years ago

> I agree that's not desirable, but is there any alternative that can be done at the anaconda-project level? Are you proposing nested conda environments?

@jbednar in the context of Anaconda Enterprise this is where I'm landing for now: when you build a project, you can specify your environment in one of three ways.

  1. Legacy environments (/opt/continuum/anaconda/envs/*). These remain read-write. However, they remain on the ephemeral Docker layer, which means that if the session is restarted, your environment will have to be re-prepared, just like it is now. And any conda installs you did without changing anaconda-project.yml will be lost.
  2. Read-only environments. These environments can be freely used by projects but with the understanding that they cannot be modified. If your anaconda-project.yml spec attempts to modify them, the preparation stage will just fail. On the other hand, the prep stage will be over quickly either way. Ideally, we can move most of our older environments over to read-only.
  3. Persistent environments. This is what we recommend for most people moving forward. If you give your environment spec a name that does not match our legacy or read-only environments, then anaconda-project will create the conda environment for you in persistent storage. It might take a little longer than if it were hosted in the ephemeral Docker layer, because it's shared storage. However, once it's there, it doesn't go away, so session restarts are super fast.

Honestly, I think this isn't a bad story, really. And we might be able to find a way to make it easy for people to "copy" the spec for a built-in environment into their project, rename it, and prepare it.

jbednar commented 3 years ago

Ok, sounds good. I thought you were proposing having a local read-write envt where a few packages would shadow/override packages in a separate read-only envt, which I think was once possible with nested Conda envts but was always deeply confusing. Those supported options all sound good!

In the context of people who move projects between Anaconda Enterprise and separate archival, testing, or deployment systems, I do have a question about how any of these environments interact with the anaconda-project.yml file. Previously, one of the legacy envts could be referenced by name, without the .yml file (and thus the exported project) including any specification for what's in that envt. In such a case, the project won't run or will run differently outside of AE than in it. Regardless of which of the above three envt options is chosen, I'd like there to be a way that the .yml file could be explicit about the contents of that envt, so that the project will run the same (apart from speed) both in and out of AE. Such portability has always been difficult, but I consider it an important goal that can determine how one goes about referring to external environments.

mcg1969 commented 3 years ago

It is true that an empty environment spec is sufficient for pre-baked and read-only environments, which could lead to poor reproducibility discipline. So we will have to coach people on that.

Even for persistent environments, there is not a lot of urgency around keeping anaconda-project.yml up to date. So we will have to coach people. One incentive will be that sessions and deployments will not share persistent environments. So the anaconda-project.yml will have to correctly render the desired environment even if it persists between restarts of the deployment.

We will benefit tremendously from some sort of "anaconda-project sync" command that constructs a minimal environment spec by pruning the dependency tree, by reading the conda history, by analyzing imports, or by some combination thereof.
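To make the "pruning the dependency tree" idea concrete, here is a minimal sketch: keep only the packages that no other installed package depends on. The function name `minimal_spec` and the dependency data are made up for illustration; a real tool would have to read the dependency information from the environment itself, and this simple version would drop both members of a dependency cycle.

```python
def minimal_spec(dependencies):
    """Return packages that no other installed package depends on.

    dependencies: dict mapping package name -> set of direct dependency names.
    """
    required_by_others = set()
    for pkg, deps in dependencies.items():
        required_by_others.update(d for d in deps if d != pkg)
    return sorted(pkg for pkg in dependencies if pkg not in required_by_others)


# Made-up dependency data for illustration only.
installed = {
    "requests": {"urllib3", "idna", "chardet"},
    "urllib3": {"pysocks"},
    "idna": set(),
    "chardet": set(),
    "pysocks": set(),
    "numpy": set(),
}
print(minimal_spec(installed))  # ['numpy', 'requests']
```

Everything else in the environment is reachable from those top-level packages, so listing only them keeps the spec short while the solver fills in the rest.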

AlbertDeFusco commented 3 years ago

I've uploaded a change that will run conda clone from a read-only env to a writable path in ANACONDA_PROJECT_ENVS_PATH.

Have I got this correct?

mcg1969 commented 3 years ago

This is great. We can guarantee this will work if the read-only volume has a properly populated package cache for its environments.

AlbertDeFusco commented 3 years ago

All tests have passed for current functionality, but I do not yet have unit tests for read-only envs. If you wish to merge that's fine and I can continue to develop the tests.

I have validated this feature in the following way:

setup ro env

# create
conda create -y -p ./ro_envs/py38 python=3.8

# readonly
chmod -R 555 ./ro_envs/py38
chmod -w ./ro_envs
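One way to confirm that the permission change took effect is to ask the operating system whether the current user may still write to the prefix. The sketch below uses `os.access` on a throwaway directory; the helper name `is_writable` is hypothetical, and this is not necessarily how anaconda-project itself decides that an env is read-only.

```python
import os
import tempfile


def is_writable(prefix):
    """True if the current user may create files under prefix.

    Note: os.access reports True for root regardless of the mode bits.
    """
    return os.access(prefix, os.W_OK | os.X_OK)


with tempfile.TemporaryDirectory() as tmp:
    env = os.path.join(tmp, "py38")
    os.mkdir(env)
    print(is_writable(env))   # a freshly created directory is writable
    os.chmod(env, 0o555)      # same mode as the chmod command above
    print(is_writable(env))   # expected to be False for a non-root user
    os.chmod(env, 0o755)      # restore so the temporary directory can be removed
```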

the project file

The project file lives in a directory called proj, which is a sibling of ro_envs.

name: readonly

packages:
- python=3.8

commands:
  default:
    unix: python -c 'import sys;print(sys.prefix)'

env_specs:
  py38: {}
channels: []

execution

The project yaml file will execute against the read-only env as is.

> ANACONDA_PROJECT_ENVS_PATH=:/path/to/ro_envs anaconda-project run
/Users/adefusco/Development/AnacondaPlatform/anaconda-project/examples/read-only/ro_envs/py38

Attempting to adjust the package list will force a clone before adding the package:

> ANACONDA_PROJECT_ENVS_PATH=:/path/to/ro_envs anaconda-project add-packages requests
Solving environment: ...working... done

## Package Plan ##

  environment location: /Users/adefusco/Development/AnacondaPlatform/anaconda-project/examples/read-only/proj/envs/py38

  added / updated specs:
    - requests

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    cryptography-3.3           |   py38hbcfaee0_0         555 KB
    ------------------------------------------------------------
                                           Total:         555 KB

The following NEW packages will be INSTALLED:

  brotlipy           pkgs/main/osx-64::brotlipy-0.7.0-py38h9ed2024_1003
  cffi               pkgs/main/osx-64::cffi-1.14.4-py38h2125817_0
  chardet            pkgs/main/osx-64::chardet-3.0.4-py38hecd8cb5_1003
  cryptography       pkgs/main/osx-64::cryptography-3.3-py38hbcfaee0_0
  idna               pkgs/main/noarch::idna-2.10-py_0
  pycparser          pkgs/main/noarch::pycparser-2.20-py_2
  pyopenssl          pkgs/main/noarch::pyopenssl-20.0.0-pyhd3eb1b0_1
  pysocks            pkgs/main/osx-64::pysocks-1.7.1-py38_1
  requests           pkgs/main/noarch::requests-2.25.0-pyhd3eb1b0_0
  six                pkgs/main/osx-64::six-1.15.0-py38hecd8cb5_0
  urllib3            pkgs/main/noarch::urllib3-1.25.11-py_0

Downloading and Extracting Packages
cryptography-3.3     | 555 KB    | ########## | 100% 
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Using Conda environment /Users/adefusco/Development/AnacondaPlatform/anaconda-project/examples/read-only/proj/envs/py38.
Added packages to project file: requests.

Afterwards, the local envs clone will be utilized since it is first in the path list.

> ANACONDA_PROJECT_ENVS_PATH=:/path/to/ro_envs anaconda-project run
/Users/adefusco/Development/AnacondaPlatform/anaconda-project/examples/read-only/proj/envs/py38

mcg1969 commented 3 years ago

I feel like we shouldn't merge until we can exercise the read-only envs support. But to be clear, THIS IS AWESOME. Nice work.

mcg1969 commented 3 years ago

I think we need a way to enable/disable the cloning behavior. In some cases, we might want the prepare step to fail if the specifications don't match the environment.

AlbertDeFusco commented 3 years ago

Would you propose that the $ENV_PREFIX/.readonly file fulfill that purpose? That is, if the .readonly file is not present, then let anaconda-project fail?

mcg1969 commented 3 years ago

I don't see the .readonly flag as fulfilling this purpose, no. I think its only purpose should be to engage anaconda-project's readonly behavior, whatever that behavior is.

I guess my concern is that there may be some contexts or some situations where the user will want the prepare step to fail on a readonly environment that is out of compliance. I could see, for instance, a situation where a company running AE5 will require certain deployments to use a fixed environment, and it's important that this isn't accidentally bypassed by cloning and modifying.

mcg1969 commented 3 years ago

@AlbertDeFusco I haven't had time to dig into this, and I need to get back to the persistent session work, so I'm going to tag 0.9.0 where we are now... We can go to 0.10.0 when we get this working.

AlbertDeFusco commented 3 years ago

Agreed. I will spend any time I can spare on moving the clone operation to a better place.

AlbertDeFusco commented 3 years ago

@mcg1969, I'm looking at where I put the clone command and I want to move it out to fix_environment_deviations.

However, I think having a separate read-only paths env var would help me. First, it would allow me to better control how EnvSpec::path behaves. Second, if this new read-only env var is not set, that would indicate to anaconda-project that unfixable envs (readonly) should cause errors rather than make clones.

Would this be acceptable?

AlbertDeFusco commented 3 years ago

Hmm, maybe that will add some more problems as well. I at least want to move the clone out to the fix function and I'll keep looking at it.

AlbertDeFusco commented 3 years ago

The latest commits add an environment variable called ANACONDA_PROJECT_READONLY_ENVS_POLICY. When this is set to clone, anaconda-project will clone a readonly env to a writable path if it needs to make modifications. If the var is unset or set to anything else (I'd recommend setting fail), then anaconda-project will fail when attempting to modify a readonly env.
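The rule described here can be sketched in a few lines. The function name `readonly_envs_policy` is hypothetical, and the exact, case-sensitive comparison with "clone" is an assumption; only the variable name and the clone-versus-fail behavior come from the comment above.

```python
import os


def readonly_envs_policy(environ=None):
    """Return "clone" only when the variable is exactly "clone", else "fail".

    Assumption: the comparison is exact and case-sensitive.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("ANACONDA_PROJECT_READONLY_ENVS_POLICY")
    return "clone" if value == "clone" else "fail"


print(readonly_envs_policy({"ANACONDA_PROJECT_READONLY_ENVS_POLICY": "clone"}))  # clone
print(readonly_envs_policy({"ANACONDA_PROJECT_READONLY_ENVS_POLICY": "other"}))  # fail
print(readonly_envs_policy({}))                                                  # fail
```

Treating every value other than clone as fail keeps the strict behavior as the default, which matches the earlier concern that a fixed environment should not be bypassed by accident.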

Tests have been added for this behavior.

mcg1969 commented 3 years ago

So so awesome

AlbertDeFusco commented 3 years ago

I just need to get the new Windows tests working (read/write permission issue) and it's ready to go.

mcg1969 commented 3 years ago

I haven't looked yet, so you may have already done this, but can I trouble you to add the documentation? That new section I created would be the perfect place.

AlbertDeFusco commented 3 years ago

Already on it.