anaconda / anaconda-project

Tool for encapsulating, running, and reproducing data science projects
https://anaconda-project.readthedocs.io/en/latest/

[WIP] Utilize environment.yml or requirements.txt directly #275

Closed AlbertDeFusco closed 2 years ago

AlbertDeFusco commented 4 years ago

Marked as WIP to solicit feedback on the workflow. TODO:

This PR implements a minor change that lets anaconda-project use an environment.yml or requirements.txt file directly, without an anaconda-project.yml file (and the file will not be created for you).

Enabled use cases

Note The run command can execute any executable in the environment and pass arguments to it. Commands need not be specified in an anaconda-project.yml file to be able to run.

In the two use cases shown below there is no anaconda-project.yml file and it will not be created with the commands shown. For both cases you can use anaconda-project prepare to create the file and import the required packages from either environment.yml or requirements.txt.

environment.yml

Here's a typical environment specification file.

name: envYaml

channels:
  - defaults
  - conda-forge

dependencies:
  - python=3.7
  - werkzeug=0.16
  - requests
  - pip:
    - tranquilizer

Create the environment at envs/envYaml where the name of the environment comes from the name: key.

> anaconda-project prepare
...
## Package Plan ##

  environment location: /Users/adefusco/Development/AnacondaPlatform/anaconda-project/examples/env-yml/envs/envYaml

  added / updated specs:
    - python=3.7
    - requests

The following NEW packages will be INSTALLED:

  ca-certificates    pkgs/main/osx-64::ca-certificates-2020.1.1-0
  certifi            pkgs/main/osx-64::certifi-2020.4.5.2-py37_0
  cffi               pkgs/main/osx-64::cffi-1.14.0-py37hc512035_1
  chardet            pkgs/main/osx-64::chardet-3.0.4-py37_1003
  cryptography       pkgs/main/osx-64::cryptography-2.9.2-py37ha12b0ac_0
  idna               pkgs/main/noarch::idna-2.9-py_1
  libcxx             pkgs/main/osx-64::libcxx-10.0.0-1
  libedit            pkgs/main/osx-64::libedit-3.1.20181209-hb402a30_0
  libffi             pkgs/main/osx-64::libffi-3.3-h0a44026_1
  ncurses            pkgs/main/osx-64::ncurses-6.2-h0a44026_1
  openssl            pkgs/main/osx-64::openssl-1.1.1g-h1de35cc_0
  pip                pkgs/main/osx-64::pip-20.0.2-py37_3
  pycparser          pkgs/main/noarch::pycparser-2.20-py_0
  pyopenssl          pkgs/main/osx-64::pyopenssl-19.1.0-py37_0
  pysocks            pkgs/main/osx-64::pysocks-1.7.1-py37_0
...
The project is ready to run commands.
Use `anaconda-project list-commands` to see what's available.

> conda list -p envs/envYaml
# packages in environment at /Users/adefusco/Development/AnacondaPlatform/anaconda-project/examples/env-yml/envs/envYaml:
#
# Name                    Version                   Build  Channel
aniso8601                 8.0.0                    pypi_0    pypi
attrs                     19.3.0                   pypi_0    pypi
ca-certificates           2020.1.1                      0  
certifi                   2020.4.5.2               py37_0  
cffi                      1.14.0           py37hc512035_1  
chardet                   3.0.4                 py37_1003  
click                     7.1.2                    pypi_0    pypi
cryptography              2.9.2            py37ha12b0ac_0  
...

There are no commands

>anaconda-project list-commands
No commands found for project: /Users/adefusco/Development/AnacondaPlatform/anaconda-project/examples/env-yml

But, we can still run something

> anaconda-project run python --version
Python 3.7.7

> anaconda-project run tranquilizer cheese_shop.py --port 5000 
 * Serving Flask app "tranquilizer.application" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

Finally, after adding packages to the environment.yml file, they can be installed with prepare. (Use --refresh to completely rebuild the env; prepare will not remove packages.)

# add numpy to the dependencies section using an editor

> anaconda-project prepare
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /Users/adefusco/Development/AnacondaPlatform/anaconda-project/examples/env-yml/envs/envYaml

  added / updated specs:
    - numpy

requirements.txt

If there is a requirements.txt in the project directory (and no environment.yml), all packages listed in it will be installed as pip packages.

requests
tranquilizer==0.4.2

Running prepare first creates a Conda environment with the most recent version of Python (3.8) and pip, and then installs the packages listed in the requirements.txt file.

>anaconda-project prepare
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /Users/adefusco/Development/AnacondaPlatform/anaconda-project/examples/reqs/envs/default

  added / updated specs:
    - python

The following NEW packages will be INSTALLED:

  ca-certificates    pkgs/main/osx-64::ca-certificates-2020.1.1-0
  certifi            pkgs/main/osx-64::certifi-2020.4.5.2-py38_0
  libcxx             pkgs/main/osx-64::libcxx-10.0.0-1
  libedit            pkgs/main/osx-64::libedit-3.1.20181209-hb402a30_0
  libffi             pkgs/main/osx-64::libffi-3.3-h0a44026_1
  ncurses            pkgs/main/osx-64::ncurses-6.2-h0a44026_1
  openssl            pkgs/main/osx-64::openssl-1.1.1g-h1de35cc_0
  pip                pkgs/main/osx-64::pip-20.0.2-py38_3
  python             pkgs/main/osx-64::python-3.8.3-h26836e1_1

And confirm the pip packages were installed

>conda list -p envs/default
# packages in environment at /Users/adefusco/Development/AnacondaPlatform/anaconda-project/examples/reqs/envs/default:
#
# Name                    Version                   Build  Channel
aniso8601                 8.0.0                    pypi_0    pypi
attrs                     19.3.0                   pypi_0    pypi
ca-certificates           2020.1.1                      0  
certifi                   2020.4.5.2               py38_0  
chardet                   3.0.4                    pypi_0    pypi
click                     7.1.2                    pypi_0    pypi
flask                     1.1.2                    pypi_0    pypi
flask-restplus            0.13.0                   pypi_0    pypi

If you require a different version of Python, it can be supplied during prepare:

>anaconda-project prepare --python=3.6
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /Users/adefusco/Development/AnacondaPlatform/anaconda-project/examples/reqs/envs/default

  added / updated specs:
    - python=3.6

The following NEW packages will be INSTALLED:

  ca-certificates    pkgs/main/osx-64::ca-certificates-2020.1.1-0
  certifi            pkgs/main/osx-64::certifi-2020.4.5.2-py36_0
  libcxx             pkgs/main/osx-64::libcxx-10.0.0-1
  libedit            pkgs/main/osx-64::libedit-3.1.20181209-hb402a30_0
  libffi             pkgs/main/osx-64::libffi-3.3-h0a44026_1
  ncurses            pkgs/main/osx-64::ncurses-6.2-h0a44026_1
  openssl            pkgs/main/osx-64::openssl-1.1.1g-h1de35cc_0
  pip                pkgs/main/osx-64::pip-20.0.2-py36_3
  python             pkgs/main/osx-64::python-3.6.10-hf48f09d_2
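
The two-step flow described above for requirements.txt (create a Conda env with Python, then pip-install the listed packages) can be approximated by hand. Below is a minimal sketch that only builds the two command lines and does not run them. The helper name build_prepare_commands is hypothetical and anaconda-project's internals may well differ; the conda and pip flags used are the standard ones.

```python
from pathlib import Path


def build_prepare_commands(env_prefix, python_version=None):
    """Return two argv lists approximating prepare for a requirements.txt project.

    Hypothetical helper for illustration; not part of anaconda-project.
    """
    env_prefix = Path(env_prefix)
    # With no version given, conda picks the most recent Python it can solve for.
    python_spec = f"python={python_version}" if python_version else "python"
    # Step 1: create the env. By default conda installs pip alongside python.
    create = ["conda", "create", "--yes", "--prefix", str(env_prefix), python_spec]
    # Step 2: install the pip packages with the env's own pip.
    # POSIX layout assumed; on Windows pip lives under Scripts instead of bin.
    install = [str(env_prefix / "bin" / "pip"), "install", "-r", "requirements.txt"]
    return [create, install]


for argv in build_prepare_commands("envs/default", python_version="3.6"):
    print(" ".join(argv))
```

Passing each list to subprocess.run(argv, check=True) would execute the steps, assuming conda is on the PATH.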

Again, you can run any executable in the environment

>anaconda-project run tranquilizer --version
tranquilizer 0.4.2

Again, you can add packages to requirements.txt and install them with prepare

# edit requirements.txt to add pytest

> anaconda-project prepare
The project is ready to run commands.
Use `anaconda-project list-commands` to see what's available.

> conda list -p envs/default py
# packages in environment at /Users/adefusco/Development/AnacondaPlatform/anaconda-project/examples/reqs/envs/default:
#
# Name                    Version                   Build  Channel
py                        1.8.2                    pypi_0    pypi
pyparsing                 2.4.7                    pypi_0    pypi
pyrsistent                0.16.0                   pypi_0    pypi
pytest                    5.4.3                    pypi_0    pypi
python                    3.6.10               hf48f09d_2  
python-dateutil           2.8.1                    pypi_0    pypi
pytz                      2020.1                   pypi_0    pypi
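
To summarize the two cases above: environment.yml takes precedence and the env is created at envs/<name>, while a lone requirements.txt yields envs/default. Here is a rough sketch of that selection logic. The function resolve_env_source is hypothetical (it is not the code in this PR), and falling back to "default" when name: is missing is my assumption.

```python
import re
from pathlib import Path


def resolve_env_source(project_dir):
    """Return (spec_file_name, env_prefix) for a project directory, or None.

    Hypothetical helper for illustration; not part of anaconda-project.
    """
    project_dir = Path(project_dir)
    env_yml = project_dir / "environment.yml"
    reqs = project_dir / "requirements.txt"

    if env_yml.is_file():
        # environment.yml wins; the env name comes from its top-level name: key.
        match = re.search(r"^name:\s*(\S+)", env_yml.read_text(), re.MULTILINE)
        # Assumption: fall back to "default" if the file has no name: key.
        name = match.group(1) if match else "default"
        return env_yml.name, project_dir / "envs" / name
    if reqs.is_file():
        # requirements.txt alone: the env is created at envs/default.
        return reqs.name, project_dir / "envs" / "default"
    return None
```

Called on the env-yml example directory this would return environment.yml together with a path ending in envs/envYaml.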

jbednar commented 4 years ago

Can you also support the case where there is an anaconda-project.yml file, but it doesn't describe the environment, only the commands? That way people could ship an environment.yml file to describe the environment for both conda and anaconda-project, but still be able to specify the commands to execute in that environment.

AlbertDeFusco commented 4 years ago

I believe that works since I've seen a hash in the anaconda-project.yml when using environment.yml. I'll look into that.

AlbertDeFusco commented 4 years ago

I've also got an issue to "rebuild" the anaconda-project.yml file from a live environment, which is tangentially related to this.

#269

Also, conda env export --from-history does not include pip packages, but it should.

AlbertDeFusco commented 4 years ago

@jbednar, yes, you can use the anaconda-project.yml file solely for commands and leave packages in the environment.yml or requirements.txt files.

I just need to push another commit to turn off some prompts, since it currently asks you to confirm including the packages defined in these auxiliary files before it runs.

jbednar commented 4 years ago

Great!

AlbertDeFusco commented 4 years ago

Fixed. You can provide an environment.yml file as shown above and specify only the commands in anaconda-project.yml. name: is required, but that can be relaxed. I've also enabled printing from pip installs.

environment.yml

name: envYaml
dependencies:
- python=3.7
- requests
- pip:
  - rope
  - tranquilizer
  - werkzeug==0.16
channels:
- defaults
- conda-forge

anaconda-project.yml

name: envyaml
commands:
  api:
    unix: tranquilizer cheese_shop.py {{'--port %s' % port if port is defined }}
    supports_http_options: false
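
The unix: entry above is a template: the --port option is emitted only when a port value is supplied, for example through --anaconda-project-port. As far as I understand Jinja2, an inline if without an else renders as an empty string when the condition is false. A plain-Python equivalent of what the template produces is sketched here. The function render_api_command is only an illustration and is not how anaconda-project renders commands.

```python
def render_api_command(port=None):
    """Plain-Python equivalent of the unix: template for the api command.

    Illustration only; anaconda-project renders the real template itself.
    """
    # Mirrors {{'--port %s' % port if port is defined }}: without a port the
    # expression contributes an empty string, leaving a trailing space.
    port_option = "--port %s" % port if port is not None else ""
    return "tranquilizer cheese_shop.py " + port_option


print(render_api_command(8080))
print(repr(render_api_command()))
```

The first call prints the command with --port 8080 appended; the second shows the bare command, which keeps the trailing space from the template.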

run

As expected anaconda-project run will also prepare the environment.

>anaconda-project run api --anaconda-project-port 8080
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /Users/adefusco/Development/AnacondaPlatform/anaconda-project/examples/env-yml/envs/envYaml

  added / updated specs:
    - python=3.7
    - requests

...

Processing /Users/adefusco/Library/Caches/pip/wheels/63/15/08/3649f858dd9b9eab213430e368804bbee7340aa99fb34d661e/tranquilizer-0.5.0-py3-none-any.whl
Collecting werkzeug==0.16
  Using cached Werkzeug-0.16.0-py2.py3-none-any.whl (327 kB)
Processing /Users/adefusco/Library/Caches/pip/wheels/fc/68/52/627ca0d67f266c203ff5ef7e441036cf2049cdbb3e030c9e0a/rope-0.17.0-py3-none-any.whl
Collecting flask
  Using cached Flask-1.1.2-py2.py3-none-any.whl (94 kB)
Collecting python-dateutil

...

 * Serving Flask app "tranquilizer.application" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)

jlstevens commented 4 years ago

This is great! I've been testing the 0.8.4+88 package with the workflows you describe above and here are my observations so far:

  1. As much as I love being able to type conda project, a few things make it obvious that anaconda-project is trying to integrate with conda rather than the other way around (possibly unavoidable unless things are also updated at the conda end?):
       a. If you just run conda, the help lists the available subcommands, but this list doesn't include project.
       b. If you run conda project, the help output mentions anaconda-project again:

       conda project
       Must specify a subcommand.
       usage: anaconda-project [-h] [-v] [--verbose] ....

    On one hand, I really want to use the conda project command but on the other hand I am slightly worried that issues like these might cause confusion.

  2. Being able to use both the environment.yml (for the env) and anaconda-project.yml (for the commands) is probably the way I imagine using this feature the most. I actually quite like having the name field tie the two files together (explicit is better than implicit, and I would consider making this a stricter requirement rather than dropping it).

  3. I did test the conda project run <executable> [<arg1>, <arg2>, ...] approach and it worked as expected though it makes me wonder whether this might get confused with conda run.

  4. I hadn't used the --refresh flag before so I am not quite sure this is the correct invocation. Nonetheless, the way the command fails suggests there is a real bug:

       anaconda-project prepare --refresh
       An unexpected error occurred, most likely a bug in anaconda-project.
           (The error was: KeyError: 'default')
       Details about the error were saved to /var/folders/xf/1pmxbzk97zzf1s6qrsh60pz00000gn/T/bug_details_anaconda-project_2020-07-30_6i2z1z40.txt

    The log is complaining about a KeyError due to 'default':

       more /var/folders/xf/1pmxbzk97zzf1s6qrsh60pz00000gn/T/bug_details_anaconda-project_2020-07-30_6i2z1z40.txt
       Bug details for anaconda-project error on 2020-07-30
       sys.argv: ['/Users/jlstevens/miniconda3/envs/ae5tools/bin/anaconda-project', 'prepare', '--refresh']
       {'version': '0.8.4+88.g06af3d0.dirty'}
       Traceback (most recent call last):
         File "/Users/jlstevens/miniconda3/envs/ae5tools/lib/python3.8/site-packages/anaconda_project/internal/cli/bug_handler.py", line 31, in handle_bugs
           return main_func()
         File "/Users/jlstevens/miniconda3/envs/ae5tools/lib/python3.8/site-packages/anaconda_project/internal/cli/main.py", line 398, in _main_without_bug_handler
           return _parse_args_and_run_subcommand(sys.argv)
         File "/Users/jlstevens/miniconda3/envs/ae5tools/lib/python3.8/site-packages/anaconda_project/internal/cli/main.py", line 390, in _parse_args_and_run_subcommand
           return args.main(args)
         File "/Users/jlstevens/miniconda3/envs/ae5tools/lib/python3.8/site-packages/anaconda_project/internal/cli/prepare.py", line 56, in main
           if prepare_command(args.directory, args.mode, args.env_spec, args.command, args.all, args.refresh, args.python):
         File "/Users/jlstevens/miniconda3/envs/ae5tools/lib/python3.8/site-packages/anaconda_project/internal/cli/prepare.py", line 45, in prepare_command
           _remove_env_path(project.env_specs[conda_environment].path(project.directory_path))
       KeyError: 'default'

  5. Cloning a read-only env to add packages is an interesting concept but I doubt I would personally use it much as I only consider a fresh solve to be truly reproducible. That said, I can see how it is a time saver for anyone incrementally adding to a base environment.

    While I haven't tested this particular feature, looking at the example it looks like the clone env is made outside the project directory? If that is the case, I imagine that multiple levels of cloning won't be supported?
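
On item 4: the traceback ends in the lookup project.env_specs[conda_environment] with the key 'default'. One plausible reading, which I have not confirmed, is that the only env spec is named after the name: key in environment.yml, so no 'default' entry exists. Here is a small sketch of that failure and of one possible fallback. The dictionary contents and the function env_path_to_remove are hypothetical stand-ins, not anaconda-project's actual data structures.

```python
def env_path_to_remove(env_specs, requested="default"):
    """Pick the env to rebuild, falling back when the requested name is absent.

    Hypothetical stand-in for the lookup that fails in the traceback.
    """
    if requested in env_specs:
        return env_specs[requested]
    if len(env_specs) == 1:
        # Only one env spec exists, so use it regardless of its name.
        return next(iter(env_specs.values()))
    raise KeyError(requested)


# Assumed state: the single env spec is named after environment.yml's name: key.
env_specs = {"envYaml": "envs/envYaml"}

try:
    env_specs["default"]  # the unguarded lookup raises, as in the bug report
except KeyError as err:
    print("KeyError:", err)

print(env_path_to_remove(env_specs))
```

Whether the real fix should fall back like this or require the env spec name explicitly is a separate design question.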

AlbertDeFusco commented 4 years ago

Thanks!

  1. Let's move the discussion of conda project to #284. I've copied your comments.

  2. So you'd like name: to match between environment.yml and anaconda-project.yml? Would requiring an env_spec: key in the command be more/less useful as well?

  3. Indeed this competes with conda run. I need to fill myself in on the latest developments with conda run.

  4. I've rarely used --refresh so I'll take a deeper look at this

  5. Let's move cloning discussion to #270.

jlstevens commented 4 years ago

Good point about env_spec. On one hand, I like linking the files at the very top where name is declared, but that only works if there is a single environment for the commands to use. That seems like a common case to me, and having to always specify env_spec seems tedious.

If the name points to an environment.yml, can't that be the 'default' and then env_spec could be used to override? The problem there is that you can't really have two environment.yml files with different environment specifications in the same directory.

Maybe you would only be able to point to auxiliary environments specified in the anaconda-project.yml? Or what if env_spec could specify a .yml filename in the same directory as the anaconda-project.yml instead of just a key?

jbednar commented 4 years ago

"I actually quite like having the name field tie the two files together (explicit is better than implicit, and I would consider making this a stricter requirement rather than dropping it)."

I don't like having the name required to match. I think the project name should come from the project file, while the environment need not have a name and, if it does, the name need not match. Shouldn't I be able to make 20 projects where 10 of them point to ../env1.yml and the other 10 point to ../env2.yml, so that I can maintain environments independently of choosing to use them in a particular project?

AlbertDeFusco commented 2 years ago

Closing this to continue the effort at #363. All comments here will be considered.