anaconda / anaconda-project

Tool for encapsulating, running, and reproducing data science projects
https://anaconda-project.readthedocs.io/en/latest/

Add command specific variables and define variable priority order #286

Closed AlbertDeFusco closed 3 years ago

AlbertDeFusco commented 4 years ago

There are now five places where environment variables can be set for use in anaconda-project run. They are listed here from highest to lowest priority: a variable set in one location overrides a variable of the same name set anywhere lower on the list.

  1. Command line invocation MY_VAR='on cli' anaconda-project run

  2. Shell export

    export MY_VAR='exported'
    anaconda-project run
  3. Set within the anaconda-project.yml file, under the command to be run

    name: project-with-vars
    packages:
      - python=3.7
    commands:
      default:
        variables:
          MY_VAR: 'default command'
        unix: env | grep MY_VAR
  4. Set within the anaconda-project.yml at the top level

    name: project-with-vars
    packages:
      - python=3.7
    variables:
        MY_VAR: 'project'
    commands:
      default:
        unix: env | grep MY_VAR
  5. Within the Conda environment after running anaconda-project prepare
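The ordering above can be sketched as a layered lookup. This is only an illustration of the precedence rule, not the code in this PR: the function name `resolve_variables` and its parameters are hypothetical. Items 1 and 2 are collapsed into a single `shell_env` layer because both arrive through the process environment; the shell itself makes a command-line prefix win over an exported value.

```python
from collections import ChainMap


def resolve_variables(conda_env_vars, project_vars, command_vars, shell_env):
    """Merge the variable sources, highest priority first.

    All four arguments are plain dicts (hypothetical names, for illustration).
    ChainMap returns the value from the first mapping that contains a key,
    so the leftmost mapping has the highest priority.
    """
    layers = ChainMap(shell_env, command_vars, project_vars, conda_env_vars)
    return dict(layers)


if __name__ == "__main__":
    merged = resolve_variables(
        conda_env_vars={"MY_VAR": "conda env"},
        project_vars={"MY_VAR": "project"},
        command_vars={"MY_VAR": "default command"},
        shell_env={},
    )
    print(merged["MY_VAR"])
```

With an empty `shell_env`, the command-level value wins; adding `MY_VAR` to `shell_env` would override it.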

Variable priority examples

The examples here are presented in reverse order from above to demonstrate the override. Each example builds from the previous state.

Conda env config vars

name: vars

packages:
- python=3.7
channels:
- defaults

commands:
  default:
    unix: env | grep MY_VARIABLE
> anaconda-project prepare
> conda env config vars set MY_VARIABLE='set in conda-meta/state' -p ./envs/default
> anaconda-project run default
MY_VARIABLE=set in conda-meta/state
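
For reference, conda env config vars set stores its values in the environment's conda-meta/state file, which is JSON with an `env_vars` key. A minimal sketch of reading that file directly is shown below. The helper name `read_conda_env_vars` is hypothetical, and this is not necessarily how this PR loads the values.

```python
import json
from pathlib import Path


def read_conda_env_vars(env_prefix):
    """Return the variables stored by `conda env config vars set`.

    Hypothetical helper: reads <env_prefix>/conda-meta/state and returns
    its "env_vars" mapping, or an empty dict if nothing has been set.
    """
    state_file = Path(env_prefix) / "conda-meta" / "state"
    if not state_file.is_file():
        return {}
    with state_file.open() as handle:
        state = json.load(handle)
    return dict(state.get("env_vars", {}))


if __name__ == "__main__":
    print(read_conda_env_vars("./envs/default"))
```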

Top-level project key

Adding variables at the top level of the project file overrides the value set in the Conda env.

name: vars

packages:
- python=3.7
channels:
- defaults

variables:
  MY_VARIABLE: 'set in project'

commands:
  default:
    unix: env | grep MY_VARIABLE
> anaconda-project run default
MY_VARIABLE=set in project

Command-specific variable

name: vars

packages:
- python=3.7
channels:
- defaults

variables:
  MY_VARIABLE: 'set in project'

commands:
  default:
    unix: env | grep MY_VARIABLE
  cmd:
    unix: env | grep MY_VARIABLE
    variables:
      MY_VARIABLE: 'set in command: cmd'
> anaconda-project run default
MY_VARIABLE=set in project
> anaconda-project run cmd
MY_VARIABLE=set in command: cmd

Exported in shell

> export MY_VARIABLE='exported'
> anaconda-project run default
MY_VARIABLE=exported
> anaconda-project run cmd
MY_VARIABLE=exported

Set on shell command

> MY_VARIABLE='on shell cmd' anaconda-project run default
MY_VARIABLE=on shell cmd
> MY_VARIABLE='on shell cmd' anaconda-project run cmd
MY_VARIABLE=on shell cmd
AlbertDeFusco commented 4 years ago

I have discovered that using Conda activate.d scripts does not work, because Anaconda Project does not use the standard Conda activate mechanism.

mcg1969 commented 3 years ago

@AlbertDeFusco should we fix this? That is to say, should we correct how anaconda-project does activation to either use conda's approach directly or reproduce the same steps?

AlbertDeFusco commented 3 years ago

Now that conda run in version 4.9 supports output streaming (--no-capture-output), I'd like to explore whether we could offload the work to conda run rather than replicating it.

mcg1969 commented 3 years ago

That's a good idea, but it raises an important question about how to handle older versions of conda (like those found in AE). As long as we don't break current behavior, but enable newer/better behavior for newer versions of conda, that would be fine.

AlbertDeFusco commented 3 years ago

Exactly! It seems the lack of support for activate.d has not yet been called out as a bug, so I'm happy to leave this PR as-is and attack activate.d later, potentially only when running against newer conda versions.
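
The version-gated idea discussed in this thread could look roughly like the sketch below. It is an assumption about one possible shape, not an implemented design: the helper names are hypothetical, and the caller is expected to fall back to the existing activation code when `None` is returned. The only conda details relied on are that conda --version prints a string such as "conda 4.9.2", and that conda run accepts -p and, from 4.9, --no-capture-output.

```python
import re

# conda run gained --no-capture-output in 4.9, per the discussion above.
MIN_STREAMING_VERSION = (4, 9)


def parse_conda_version(version_output):
    """Extract (major, minor) from the output of `conda --version`."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"unrecognized conda version: {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def build_conda_run_argv(version_output, env_prefix, command_argv):
    """Return an argv that delegates to `conda run`, or None for older conda.

    Hypothetical helper: None means "keep the current behavior".
    """
    if parse_conda_version(version_output) < MIN_STREAMING_VERSION:
        return None
    return ["conda", "run", "--no-capture-output", "-p", env_prefix, *command_argv]


if __name__ == "__main__":
    print(build_conda_run_argv("conda 4.9.2", "./envs/default", ["env"]))
```

Returning `None` rather than raising keeps older conda installations on the current code path, which matches the constraint of not breaking existing behavior.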