More than one environment file

tdegeus commented 3 years ago

Suppose that I split the required environment in two:

Runtime dependencies:

    channels:
      - conda-forge
    dependencies:
      - cmake

Extra dependencies for testing:

    channels:
      - conda-forge
    dependencies:
      - catch2

What should my action look like? For example, how should I modify the following step to use both files?

    - name: Set conda environment "test"
      uses: conda-incubator/setup-miniconda@v2
      with:
        mamba-version: "*"
        channels: conda-forge,defaults
        channel-priority: true
        environment-file: environment.yaml
        activate-environment: test
        auto-activate-base: false

Thanks!

jaimergp commented 3 years ago

AFAIK you can only specify one environment file. A workaround right now would be to add a second step right after the action with:

- shell: bash -l {0}
  run: |
    conda env update -n test -f your_second_env.yml

If we were to implement this, I guess the syntax would need to be something like:

    - name: Set conda environment "test"
      uses: conda-incubator/setup-miniconda@v2
      with:
        mamba-version: "*"
        channels: conda-forge,defaults
        channel-priority: true
        environment-file: environment1.yaml,environment2.yml,environment3.yml
        activate-environment: test
        auto-activate-base: false

The updates would be applied in that order.
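Under the proposed syntax, one plausible way for the action to honour the order is to create the environment from the first file and then run conda env update for each remaining file, which is what the manual workaround above does by hand. A minimal Python sketch of that ordering, assuming those semantics; build_env_commands is a hypothetical helper, not part of setup-miniconda:

```python
from typing import List


def build_env_commands(env_name: str, env_files: List[str]) -> List[List[str]]:
    """Hypothetical helper: conda commands that apply env_files to env_name in order."""
    if not env_files:
        raise ValueError("at least one environment file is required")
    # The first file creates the environment...
    commands = [["conda", "env", "create", "-n", env_name, "-f", env_files[0]]]
    # ...and each later file is layered on top with an update, in the given order.
    for path in env_files[1:]:
        commands.append(["conda", "env", "update", "-n", env_name, "-f", path])
    return commands


if __name__ == "__main__":
    files = ["environment1.yaml", "environment2.yml", "environment3.yml"]
    for cmd in build_env_commands("test", files):
        print(" ".join(cmd))
```

Running it prints one create command for environment1.yaml followed by update commands for environment2.yml and environment3.yml. Each update is a separate solve, so this only captures the ordering, not a single combined solve.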

tdegeus commented 3 years ago

Thanks @jaimergp. Personally, I would find it nice to be able to specify more than one environment file, and the syntax you propose makes a lot of sense.

bollwyvl commented 3 years ago

This logic exists in conda-lock, though it may not be released yet:

https://github.com/conda-incubator/conda-lock/pull/57

One option, once that feature is live, would be to install conda-lock and use it to generate a full solve on disk, then run conda create from that.

Trying to safely merge conflicting channel lists is nuts, so I'm pretty sure the last one in wins there. For packages, though, you really want later files to be able to overwrite package specs, including all the package spec edge cases (channel priority). That's a good argument for reusing an existing implementation.

Further, we could cache that lock file, or at least report the path where a user could cache it. For big environments with many channels, solving is a huge part of the run time.

bollwyvl commented 3 years ago

Also, regarding syntax: it's crazypants that we don't (can't?) use arrays in our action inputs. If we must use strings, I'd propose allowing \n as a delimiter:

    - name: Set conda environment "test"
      uses: conda-incubator/setup-miniconda@v2
      with:
        mamba-version: "*"
        channels: |
          conda-forge
          defaults
        channel-priority: true
        environment-file: |
          environment1.yaml
          environment2.yml
          environment3.yml
        activate-environment: test
        auto-activate-base: false

bollwyvl commented 3 years ago

Yeah, strings only: https://github.community/t/can-action-inputs-be-arrays/16457/2

bollwyvl commented 3 years ago

Also, regarding the multiple channels: perhaps the right answer is to require that channels be set in the action.

Similarly, to make things explicit, require that activate-environment be set as well, and force that environment name.

goanpeca commented 3 years ago

> I'd propose being able to use \n as a delimiter, if we must use strings:

I propose we support both 🤷🏽: either ',' or '\n' :)
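Accepting both delimiters is simple on the parsing side. A minimal Python sketch, assuming no single entry ever contains a comma or newline itself; split_action_input is a hypothetical helper, not the action's actual code:

```python
import re
from typing import List


def split_action_input(value: str) -> List[str]:
    """Hypothetical helper: split a string action input on commas and/or newlines."""
    # Trim whitespace around each entry and drop empty ones, e.g. the trailing
    # newline that a YAML block scalar (|) leaves at the end of the string.
    return [item.strip() for item in re.split(r"[,\n]", value) if item.strip()]


if __name__ == "__main__":
    print(split_action_input("conda-forge,defaults"))
    print(split_action_input("environment1.yaml\nenvironment2.yml\nenvironment3.yml\n"))
```

Both spellings from the examples above then produce the same kind of list, e.g. ['conda-forge', 'defaults'] for the comma form and the three file names for the block-scalar form.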

tdegeus commented 3 years ago

In practice, especially with mamba, this addition would also make jobs with more than one environment file significantly faster!

hadim commented 3 years ago

How do you deal with the same dependency being defined multiple times? Does the order of the files matter?

It's not exactly the same, but I am also looking for a general way to "alter" an env file. It would be nice to come up with something general enough to be integrated upstream into conda. See https://github.com/conda/conda/issues/10398

bollwyvl commented 3 years ago

In conda-lock, the order matters: first in, first out. I think they have re-implemented conda's MatchSpec parsing (so conda is not a dependency), and merge based on the canonical package name that comes out of that.
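The name-based merge can be sketched in a few lines of Python. This is not conda-lock's code: package_name is a deliberately simplified stand-in for real MatchSpec parsing (it only strips an optional channel:: prefix and takes the leading name characters), only plain string specs are handled (a nested pip: section would need separate treatment), and it assumes a later file's spec replaces an earlier one with the same name.

```python
import re
from typing import Dict, List

# Optional channel prefix ("conda-forge::numpy"), then the leading run of
# characters allowed in conda package names.
_NAME_RE = re.compile(r"^\s*(?:[^:\s]+::)?([A-Za-z0-9_.\-]+)")


def package_name(spec: str) -> str:
    """Simplified stand-in for MatchSpec parsing: return a lower-cased package name."""
    match = _NAME_RE.match(spec)
    if match is None:
        raise ValueError(f"cannot parse package spec: {spec!r}")
    return match.group(1).lower()


def merge_dependencies(*dependency_lists: List[str]) -> List[str]:
    """Merge dependency lists in order; a later spec replaces an earlier one with the same name."""
    merged: Dict[str, str] = {}
    for deps in dependency_lists:
        for spec in deps:
            merged[package_name(spec)] = spec
    return list(merged.values())


if __name__ == "__main__":
    print(merge_dependencies(["cmake", "python=3.8"], ["catch2", "python=3.9"]))
```

For the two files in the original question plus a hypothetical python pin in each, this returns ['cmake', 'python=3.9', 'catch2']: the later python spec wins but keeps its original position, because reassigning an existing dict key does not change insertion order.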

Channels are another matter altogether, and are handled out-of-band, as it's easy to make irreconcilable channel priority lists. In my experience, this is cleanest with matrix/include, e.g.

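strategy:
  matrix:
    os: [ubuntu-latest, macos-latest, windows-latest]
    python-version: ["3.6", "3.8"]
    # include adds a per-OS "channels" value to every matching combination
    include:
      - os: ubuntu-latest
        channels: conda-forge
      - os: macos-latest
        channels: conda-forge
      - os: windows-latest
        channels: conda-forge,msys2
runs-on: ${{ matrix.os }}
steps:
  - uses: conda-incubator/setup-miniconda@v2
    with:
      python-version: ${{ matrix.python-version }}
      # channels comes from the include entry matching this job's OS
      channels: ${{ matrix.channels }}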

Having done this a couple of times, I've been toying with the idea of a utility that does offline conda-lock solving, constructing the needed arguments directly from:

- a declarative CI matrix (each provider has its own quirks)
- a directory of env.yml files that can (optionally) add their own requirements

and that generates:

- a "final" environment.yml
- all the lock files

All of that would then be checked in. Being able to look back at exactly what went into the mill on a given run has been helpful for pinning down when things go wrong, whereas having it disappear into the output of conda create or conda env update makes forensic analysis challenging. With the lock files in place, locally reproducing the exact environment that failed is trivial.
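For the "final" environment.yml, such a utility only needs to write out the chosen channels and the merged dependencies. A minimal Python sketch using only the standard library; render_environment_yml is a hypothetical helper, and each value is emitted with json.dumps because a JSON string is also a valid double-quoted YAML scalar, which sidesteps YAML quoting rules for unusual values.

```python
import json
from typing import List


def render_environment_yml(name: str, channels: List[str], dependencies: List[str]) -> str:
    """Hypothetical helper: render an environment.yml string from already-merged inputs."""
    # json.dumps yields a double-quoted string, which YAML also accepts as a scalar.
    lines = [f"name: {json.dumps(name)}", "channels:"]
    lines += [f"  - {json.dumps(channel)}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {json.dumps(dep)}" for dep in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_environment_yml("test", ["conda-forge"], ["cmake", "catch2"]), end="")
```

Writing this file next to the lock files keeps a reviewable record of what each CI run actually asked for.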

Having this directly in conda/mamba would be... interesting, I guess, but what matters most to me is keeping the scope of CONDA_EXE, whoever provides it, limited to "reads and writes canonical files" and "speaks the canonical CLI", rather than adding bells and whistles that (even for a while) don't work across multiple implementations. Put differently: I don't think magic comment selectors, jinja variables, or anything like that will help the maintainability of environments, and they would absolutely make it harder to maintain this repository.

conda-incubator / setup-miniconda

More than one environment file #105