eeholmes / earthdata-cloud-cookbook

A tutorial book of workflows for research using NASA EarthData in the Cloud created by the NASA-Openscapes team
https://nasa-openscapes.github.io/earthdata-cloud-cookbook

conda.Dockerfile #3

Open eeholmes opened 6 months ago

eeholmes commented 6 months ago

@cboettig

I separated the GitHub Actions (GA) workflows into a conda one and a venv one, so each is triggered only when needed.

It would probably be good to change the Quarto workflows so they are not triggered by every change to the image-building files.
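GitHub Actions can do this natively with `paths` / `paths-ignore` filters on the `push` and `pull_request` triggers. If the check has to live in a script step instead, the logic might look like the sketch below. It assumes the image-building inputs are files ending in `Dockerfile`, `environment.yml`, or `requirements.txt`, or anything under a hypothetical `ci/` directory; the function name and patterns are illustrative and would need to match the repo's real layout.

```shell
#!/bin/sh
# Sketch: decide whether the Quarto book needs rebuilding.
# Reads changed file paths on stdin, one per line.
# Returns 0 (rebuild) as soon as a path is NOT an image-building file;
# returns 1 if every path only affects the container images.
# The patterns are hypothetical; adjust them to the repo's real layout.
needs_quarto_build() {
  local path
  while IFS= read -r path || [ -n "$path" ]; do
    [ -z "$path" ] && continue          # skip blank lines
    case "$path" in
      *Dockerfile | *environment.yml | *requirements.txt | ci/*)
        ;;                              # image-only change, keep checking
      *)
        return 0                        # any other change: rebuild the book
        ;;
    esac
  done
  return 1                              # nothing but image files changed
}
```

In a workflow step this could gate the render, e.g. `git diff --name-only HEAD~1 HEAD | needs_quarto_build && quarto render`, provided enough git history is checked out for `HEAD~1` to exist.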

The image builds are taking 45 minutes. Ouch.

The conda image still won't spin up on the Openscapes hub. I suspect that, since the tag is not changing, the hub is not using the new image. I am trying a new tag.
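A common cause of this symptom is that cluster nodes cache images by tag, so pushing new content under an unchanged tag (such as `latest`) may not trigger a fresh pull. One workaround, assumed here rather than taken from this repo's workflows, is to push every build under a unique tag such as the short git commit SHA and point the hub at that tag. The helper name below is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build an image reference whose tag is the short
# git commit SHA, so each build is pushed under a tag no node has cached.
# Usage: unique_image_ref IMAGE SHA [LENGTH]   (LENGTH defaults to 7)
unique_image_ref() {
  local image="$1" sha="$2" len="${3:-7}"
  local short
  short=$(printf '%s' "$sha" | cut -c1-"$len")
  printf '%s:%s\n' "$image" "$short"
}
```

In a build step it could be called as `unique_image_ref ghcr.io/eeholmes/earthdata-cloud-cookbook/cookbook-conda "$(git rev-parse HEAD)"`, with the result passed to `docker build -t` and `docker push`.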

eeholmes commented 6 months ago

This image works, but it has only a few packages in environment.yml: https://github.com/nmfs-opensci/container-images/blob/main/images/jupyter-base-notebook/Dockerfile

```dockerfile
# Python 3.11 in image
FROM quay.io/jupyter/base-notebook:2024-02-13

# Add the packages you want to environment.yml
# Adds to the base env so you do not need to activate
COPY environment.yml environment.yml
RUN conda env update --name base -f environment.yml && conda clean --all
```

eeholmes commented 6 months ago

@cboettig Here is how the Openscapes corn image installs the environment with mamba. As I recall, their environment installs quickly with mamba but takes forever with conda:

https://github.com/NASA-Openscapes/corn/blob/802abfdcf5dacc808e322fc84145424a3fe1810f/ci/install-kernels.sh#L16-L18

```bash
conda-lock lock --mamba -f environment.yml -p linux-64 &&
mamba create --name ${CONDA_ENV} --file conda-linux-64.lock  \
```

I don't know why they create the conda-lock file.
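One likely reason: `conda-lock` solves the environment once and writes the exact package URLs to the lock file. A file in conda's explicit format marks its package list with an `@EXPLICIT` line, and `conda create --file` / `mamba create --file` on such a file installs those packages directly without running the solver, which makes rebuilds faster and reproducible. The check below is a hypothetical sketch that only looks for that marker (it assumes `conda-linux-64.lock` uses the explicit format); it does not validate the URLs.

```shell
#!/bin/sh
# Sketch: succeed only if FILE exists and contains a line that is exactly
# "@EXPLICIT", the marker of conda's explicit spec format (assumed to be
# the format of a lock file like conda-linux-64.lock). Creating an env
# from such a file installs the listed package URLs without re-solving.
is_explicit_lock() {
  [ -f "$1" ] && grep -qx '@EXPLICIT' "$1"
}
```

Usage along the lines of the corn script: `is_explicit_lock conda-linux-64.lock && mamba create --name "${CONDA_ENV}" --file conda-linux-64.lock`.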

Here is where mamba is installed in the pangeo base image used by the Openscapes Python image (corn):

https://github.com/pangeo-data/pangeo-docker-images/blob/45572b778e82faf0661c622d65e21fcb2538084a/base-image/Dockerfile#L69-L80

Looks like conda-lock is installed.

```dockerfile
RUN echo "Installing Mambaforge..." \
    && URL="https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh" \
    && wget --quiet ${URL} -O installer.sh \
    && /bin/bash installer.sh -u -b -p ${CONDA_DIR} \
    && rm installer.sh \
    && mamba install conda-lock -y \
    && mamba clean -afy \
    # After installing the packages, we cleanup some unnecessary files
    # to try reduce image size - see https://jcristharif.com/conda-docker-tips.html
    # Although we explicitly do *not* delete .pyc files, as that seems to slow down startup
    # quite a bit unfortunately - see https://github.com/2i2c-org/infrastructure/issues/2047
    && find ${CONDA_DIR} -follow -type f -name '*.a' -delete
```

eeholmes commented 6 months ago

Some other things are installed via the pangeo base image from this apt.txt file: https://github.com/NASA-Openscapes/corn/blob/main/ci/apt.txt

Hmm, which of these are needed? Definitely some.

https://github.com/pangeo-data/pangeo-docker-images/blob/45572b778e82faf0661c622d65e21fcb2538084a/base-image/Dockerfile#L120-L129
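For reference, an `apt.txt` file is a plain list of Debian/Ubuntu package names, one per line. The sketch below shows one way a build step could turn such a file into arguments for `apt-get install`; it assumes `#` starts a comment and that blank lines are ignored. It is an illustration, not a copy of the pangeo base-image step linked above, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print the package names listed in an apt.txt file, one per line.
# Assumptions: '#' starts a comment (whole-line or trailing), surrounding
# whitespace is trimmed, and lines left empty are dropped.
apt_packages() {
  sed -e 's/#.*//' \
      -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//' \
      -e '/^$/d' "$1"
}
```

After an `apt-get update`, the list could be installed with `apt_packages apt.txt | xargs apt-get install -y --no-install-recommends`, which also makes it easy to comment out entries one at a time to see which are actually needed.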

cboettig commented 6 months ago

Nice work doing the separate GitHub Actions builds; that makes sense! Yeah, compiling GDAL entirely from source takes over 30 minutes and adds ~3 GB to the image. I skipped that in the conda.Dockerfile, and it looks like it finishes in about 12 minutes on your GitHub Action.

ghcr.io/eeholmes/earthdata-cloud-cookbook/cookbook-conda:latest is working for me now on the Openscapes 2i2c hub. The first time it was slow to start because it had to pull the image fresh, but now it starts up pretty quickly.


Recall that there should be no need to call mamba anymore. If you look at `conda info`, you will see that the default conda solver in Miniforge is now actually mamba (the libmamba solver). Prior to Sept 2023 that wasn't true, and we all needed to use mamba explicitly.

![image](https://github.com/eeholmes/earthdata-cloud-cookbook/assets/222586/a7bac847-f180-4086-87cd-db729215db7c)
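To check which solver is active from a script (for example in a Dockerfile `RUN` step, before a long `conda env update`), `conda config --show solver` prints it as a `solver: <name>` line. The helper below is a minimal sketch that extracts the name from that output; it assumes that output format, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: read `conda config --show solver` output on stdin and print the
# solver name (e.g. "libmamba" or "classic").
# Assumes the output contains a line of the form "solver: <name>".
parse_solver() {
  sed -n 's/^solver:[[:space:]]*//p'
}
```

For example, `[ "$(conda config --show solver | parse_solver)" = libmamba ]` fails fast on an image whose conda still uses the classic solver; on such a conda, `conda config --set solver libmamba` switches it, provided the `conda-libmamba-solver` plugin is installed.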