Caching a conda environment

lminer commented 3 years ago

It would be great if it were possible to cache a conda environment. I see from here that it is possible for a vanilla python environment.

bollwyvl commented 3 years ago

It is feasible to do so, but native support is unlikely to be added to this action.

My recommendation would be to:

- try to restore an actions/cache of a conda-pack archive, hashed off your environment.yml
  - if that hits an empty cache
    - use setup-miniconda to make an environment from the environment.yml
      - make sure it has conda and conda-pack in it
    - use conda-pack to make a relocatable archive of the environment before you do anything to it
      - like install your system-under-test
  - if it succeeds
    - unpack the conda-pack
    - use setup-miniconda with $CONDA set to the unpacked env
    - be fast

This approach avoids a number of gotchas with caching conda tarballs, etc.
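
The steps above could be sketched as a single job along the following lines. This is an untested sketch, not an official recipe: it assumes environment.yml sits at the repository root and lists conda-pack among its dependencies, and the environment name my_env, the archive name and the cache key prefix are placeholders. It also leaves out the final "setup-miniconda with $CONDA" step and activates the unpacked environment by sourcing its own activate script instead.

```yaml
name: conda-pack cache sketch
on: push

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      # Try to restore a packed environment, keyed on the contents of environment.yml.
      - name: Restore packed environment
        id: env-cache
        uses: actions/cache@v2
        with:
          path: my_env.tar.gz
          key: ${{ runner.os }}-conda-pack-${{ hashFiles('environment.yml') }}

      # Cache miss: build the environment from environment.yml ...
      - name: Create environment
        if: steps.env-cache.outputs.cache-hit != 'true'
        uses: conda-incubator/setup-miniconda@v2
        with:
          activate-environment: my_env
          environment-file: environment.yml
          auto-activate-base: false

      # ... and pack it before anything else is installed into it.
      # The login shell is needed here so that my_env (and its conda-pack) is active.
      - name: Pack environment
        if: steps.env-cache.outputs.cache-hit != 'true'
        shell: bash -l {0}
        run: conda pack -n my_env -o my_env.tar.gz

      # Hit or miss, the archive exists at this point: unpack it and fix up its prefixes.
      - name: Unpack environment
        shell: bash
        run: |
          mkdir -p my_env
          tar -xzf my_env.tar.gz -C my_env
          source my_env/bin/activate
          conda-unpack

      - name: Use environment
        shell: bash
        run: |
          source my_env/bin/activate
          python --version
```

The archive is written by the pack step and saved by actions/cache in its post-job phase, so the first run after a change to environment.yml is slow and later runs only pay for the download and unpack.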

lminer commented 3 years ago

Do you have any suggestions of an example I might look at for how to do this? My github-actions-fu is quite weak.

bitphage commented 3 years ago

Here is our example for caching an environment under /usr/share/miniconda/envs (updated):

      - uses: conda-incubator/setup-miniconda@v2
        with:
          activate-environment: "xxx"
          auto-activate-base: false
          use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!

      - name: Cache conda envs and other stuff
        id: conda
        uses: actions/cache@v2
        env:
          # Increase this value to manually reset the cache if the files hashed in the key below have not changed
          CONDA_CACHE_NUMBER: 1
        with:
          path: |
            /usr/share/miniconda/envs/xxx
          key: ${{ runner.os }}-conda-${{ env.CONDA_CACHE_NUMBER }}-${{ hashFiles('setup/environment-template.yml', 'setup/*.sh') }}

      - name: Run install script
        # Only need to run install when deps have changed
        if: steps.conda.outputs.cache-hit != 'true'
        run: |
          ./install     # <----- conda packages are installed here via `conda env update -f ...`

lminer commented 3 years ago

@bitphage it's failing for me right now at the last step. I don't have a complicated install process, so I just substituted ./install with conda env create -f environment.yml and I got the error:

Could not find conda environment: myenv
You can list all discoverable environments with `conda info --envs`.

This also happens if I do conda env update -f environment.yml. Any idea what I might be doing incorrectly?

bitphage commented 3 years ago

@lminer hmm, make sure that you have activate-environment: myenv in the conda-incubator/setup-miniconda@v2 step.

lminer commented 3 years ago

@bitphage I have. This is what it looks like:

      - uses: conda-incubator/setup-miniconda@v2
        with:
          activate-environment: "myenv"
          auto-activate-base: false
          use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!

      # Remove the envs directory if it exists to prevent cache restore errors. The GitHub runner already bundles conda.
      - name: Remove envs directory
        run: rm -rf /usr/share/miniconda/envs

      - name: Cache conda envs and other stuff
        id: conda
        uses: actions/cache@v2
        env:
          # Increase this value to manually reset the cache if environment.yml has not changed
          CONDA_CACHE_NUMBER: 1
        with:
          path: |
            ~/conda_pkgs_dir
            /usr/share/miniconda/envs
          key: ${{ runner.os }}-conda-${{ env.CONDA_CACHE_NUMBER }}-${{ hashFiles('environment.yml') }}

      - name: Run install script
        # Only need to run install when deps have changed
        if: steps.conda.outputs.cache-hit != 'true'
        run: |
          conda env create -f environment.yml

bitphage commented 3 years ago

@lminer OK, I was fixing some issues that came up after the recent 2.1.0 release of the setup-miniconda action, and I've updated the example above. Note that there is no rm -rf step anymore, and the cache path should be /usr/share/miniconda/envs/myenv to avoid cache restore errors.
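
Applied to the myenv snippet above, those two changes might look roughly like this. It is an untested sketch: using conda env update with an explicit -n myenv is an assumption, made because setup-miniconda should already have created an empty myenv by the time the install step runs, so conda env create would find the environment already present.

```yaml
      - uses: conda-incubator/setup-miniconda@v2
        with:
          activate-environment: "myenv"
          auto-activate-base: false
          use-only-tar-bz2: true

      # No "rm -rf /usr/share/miniconda/envs" step: myenv has to keep existing.
      - name: Cache conda env
        id: conda
        uses: actions/cache@v2
        env:
          CONDA_CACHE_NUMBER: 1
        with:
          # Cache only this environment's directory, not all of envs/.
          path: /usr/share/miniconda/envs/myenv
          key: ${{ runner.os }}-conda-${{ env.CONDA_CACHE_NUMBER }}-${{ hashFiles('environment.yml') }}

      - name: Install dependencies
        # Only runs when the cache was not restored.
        if: steps.conda.outputs.cache-hit != 'true'
        shell: bash -l {0}
        # Update the existing (empty) myenv in place rather than creating it.
        run: conda env update -n myenv -f environment.yml
```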

lminer commented 3 years ago

@bitphage thanks! it's working now. Just shaved 4 minutes off the runtime.

sam-hoffman commented 3 years ago

This example is helpful! I noticed it has one more step than the one in the README - is it recommended that everyone add the "Run install script" step? If so, could the example in the README be updated?

OlafHaag commented 3 years ago

My recommendation would be to:

* try to restore an `actions/cache` of a [`conda-pack`](https://conda.github.io/conda-pack/) archive, hashed off your `environment.yml`
  * if that hits an empty cache
    * use `setup-miniconda` to make an environment from the `environment.yml`
      * make sure it has `conda` and `conda-pack` in it
    * use `conda-pack` to make a relocatable archive of the environment _before_ you do anything to it
      * like install your system-under-test
  * if it succeeds
    * unpack the conda-pack
    * use `setup-miniconda` with `$CONDA` set to the unpacked env
    * be fast

This approach avoids a number of gotchas with caching conda tarballs, etc.

I tried this approach, and this is the gist of what I have working so far:

name: Conda Environment Caching Example
on: workflow_dispatch

env:
  # Increase this value to reset cache if environment.yml has not changed.
  PY_CACHE_NUMBER: 0
  PY_ENV: my_env

jobs:
  setup-python:
    name: Setup Python Environment
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -l {0}
    steps:
      - name: Git checkout
        uses: actions/checkout@v2
      - name: Cache Python environment
        id: cache-python
        uses: actions/cache@v2
        with:
          path: "${{ env.PY_ENV }}.tar.gz"
          key:
            ${{ runner.os }}-${{ env.PY_CACHE_NUMBER }}-${{ hashFiles('**/environment.yml') }}
      - name: Install Python dependencies
        if: steps.cache-python.outputs.cache-hit != 'true'
        uses: conda-incubator/setup-miniconda@v2
        with:
          miniforge-variant: Mambaforge
          use-mamba: true
          auto-update-conda: false
          activate-environment: ${{ env.PY_ENV }}
          environment-file: environment.yml
          auto-activate-base: false
      - name: Pack Python environment
        if: steps.cache-python.outputs.cache-hit != 'true'
        run: |
          conda pack --force -n ${{ env.PY_ENV }}

  use-cached-python:
    name: Use cached Python
    needs: [setup-python]
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -l {0}
    steps:
      - name: Git checkout
        uses: actions/checkout@v2
      - name: Get Python cache
        id: python-cache
        uses: actions/cache@v2
        with:
          path: "${{ env.PY_ENV }}.tar.gz"
          key:
            ${{ runner.os }}-${{ env.PY_CACHE_NUMBER }}-${{ hashFiles('**/environment.yml') }}
      - name: Unpack Python environment
        run: |
          mkdir -p "${{ env.PY_ENV }}"
          tar -xzf "${{ env.PY_ENV }}.tar.gz" -C "${{ env.PY_ENV }}"
          source "${{ env.PY_ENV }}/bin/activate"
          conda-unpack
      - name: Run Python
        run: |
          source "${{ env.PY_ENV }}/bin/activate"
          python -c 'import sys; print(sys.version_info[:])'

In my setup, several jobs use the same Python environment, which is why I separated the setup from the execution. With conda-pack, every job that uses the cache has to run on the same OS. In my environment.yml I have added conda and conda-pack, and conda-forge is the only entry under channels.

In the second job I first tried using setup-miniconda after unpacking with:

      - name: Activate Python environment
        uses: conda-incubator/setup-miniconda@v2
        env:
          CONDA: my_env
        with:
          activate-environment: ${{ env.PY_ENV }}
          auto-activate-base: false

but that didn't give the result I expected. Instead, it created a new environment in my_env/envs/my_env. I was hoping to avoid having to source "${{ env.PY_ENV }}/bin/activate" in each step after unpacking.
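
One possible way to avoid the repeated source line, which does not come from this thread and is untested here, is to append the unpacked environment's bin directory to the file named by $GITHUB_PATH, so that later steps find its python first. The sketch below reuses the PY_ENV variable and step names from the workflow above and assumes a Linux or macOS runner. Note that this only changes PATH: the environment's activation scripts do not run in later steps, so any variables they would set (CONDA_PREFIX, for example) are missing.

```yaml
      - name: Unpack Python environment
        run: |
          mkdir -p "${{ env.PY_ENV }}"
          tar -xzf "${{ env.PY_ENV }}.tar.gz" -C "${{ env.PY_ENV }}"
          source "${{ env.PY_ENV }}/bin/activate"
          conda-unpack
          # Prepend the environment's bin directory to PATH for all later steps.
          echo "$PWD/${{ env.PY_ENV }}/bin" >> "$GITHUB_PATH"
      - name: Run Python
        # No "source .../bin/activate" needed: python now resolves via PATH.
        run: python -c 'import sys; print(sys.version_info[:])'
```

Packages that rely on activation hooks may still need the explicit source line in the steps that use them.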

hadim commented 3 years ago

use setup-miniconda with $CONDA set to the unpacked env

I am also trying to set up a GitHub Actions workflow with setup-miniconda and conda-pack, but I don't get that part. Does someone have a quick example snippet?

miltondp commented 3 years ago

Hi all. I was interested in this as well, and I ended up with this GitHub Actions workflow, based in part on @OlafHaag's, which runs on Ubuntu, macOS and Windows. I'm sharing it here in case it's useful to others.

name: tests
on:
  push:
  pull_request:
    types: [opened, reopened]

env:
  # Increase this value to reset cache if environment.yml has not changed.
  PY_CACHE_NUMBER: 2
  PY_ENV: cm_gene_expr

jobs:
  pytest:
    name: Python tests
    runs-on: ${{ matrix.os }}
    strategy:
      max-parallel: 4
      fail-fast: false
      matrix:
        python-version: [3.9]
        os: [ubuntu-latest, macOS-latest, windows-latest]
    steps:
      - name: Checkout git repo
        uses: actions/checkout@v2
        with:
          lfs: false
      - name: Cache conda
        id: cache
        uses: actions/cache@v2
        with:
          path: "${{ env.PY_ENV }}.tar.gz"
          key: ${{ runner.os }}-${{ env.PY_CACHE_NUMBER }}-${{ hashFiles('environment/environment.yml') }}
      - name: Setup Miniconda
        if: steps.cache.outputs.cache-hit != 'true'
        uses: conda-incubator/setup-miniconda@v2
        with:
          miniconda-version: "latest"
          auto-update-conda: true
          activate-environment: ${{ env.PY_ENV }}
          channel-priority: strict
          environment-file: environment/environment.yml
          auto-activate-base: false
      - name: Conda-Pack
        if: steps.cache.outputs.cache-hit != 'true'
        shell: bash -l {0}
        run: |
          conda install --yes -c conda-forge conda-pack coverage
          conda pack -f -n ${{ env.PY_ENV }} -o "${{ env.PY_ENV }}.tar.gz"
      - name: Unpack environment
        shell: bash -l {0}
        run: |
          mkdir -p "${{ env.PY_ENV }}"
          tar -xzf "${{ env.PY_ENV }}.tar.gz" -C "${{ env.PY_ENV }}"
      - name: Setup data and run pytest (Windows systems)
        if: runner.os == 'Windows'
        env:
          PYTHONPATH: libs/
        run: |
          ${{ env.PY_ENV }}/python environment/scripts/setup_data.py --mode testing
          ${{ env.PY_ENV }}/python -m pytest -v -rs tests
      - name: Setup data and run pytest (non-Windows systems)
        if: runner.os != 'Windows'
        shell: bash
        env:
          PYTHONPATH: libs/
        run: |
          source ${{ env.PY_ENV }}/bin/activate
          conda-unpack

          python environment/scripts/setup_data.py --mode testing

          if [ "$RUNNER_OS" == "Linux" ]; then
            coverage run --source=libs/ -m pytest -v -rs tests
            coverage xml -o coverage.xml
          else
            pytest -v -rs tests
          fi
      - name: Codecov upload
        if: runner.os == 'Linux'
        uses: codecov/codecov-action@v2
        with:
          files: ./coverage.xml
          name: codecov-${{ matrix.os }}-python${{ matrix.python-version }}
          fail_ci_if_error: true
          verbose: true

conda-incubator / setup-miniconda

Caching a conda environment #155