eWaterCycle / ewatercycle

Python package for running hydrological models
https://ewatercycle.readthedocs.io/en/latest/
Apache License 2.0

Container-in-a-container issues #434

Open samharrison7 opened 4 months ago

samharrison7 commented 4 months ago

Hi all,

Thought I would open up an issue to discuss the container-in-a-container issues that we in UKCEH (@CansuUluseker and @mjhollaway) are having.

Our goal is to use eWaterCycle within a DataLab project, but this is generally relevant to any system built on containers.

Suggestions for us to focus on after our latest meeting:

Did I forget anything important or get any of that wrong?

BSchilperoort commented 4 months ago

Hi Sam,

This seems pretty complete. I expect no issues with the local Python models, so that should at least let you run a model with forcing generated using ESMValTool.

> Investigate whether giving Apptainer setuid privileges is an option within DataLabs

Yes, I hope that will work (not certain though). Otherwise, getting things to work would become more complex, as explained in that Kubernetes issue.

Daafip commented 4 months ago

Could also try this related example: https://github.com/Daafip/ewatercycle-hbv

The docs still need to be improved & updated to explain how the local model can be used for HBV. But running HBV locally essentially comes down to:

Daafip commented 4 months ago

> The docs still need to be improved & updated

The current docs contain an updated example!

samharrison7 commented 1 month ago

Hey all,

We managed to get the local version of the models running and have confirmed these issues are container-in-a-container issues.

Our DataLabs developers have suggested using Podman instead of Docker/Apptainer. Is that something you guys have ever experimented with, and do you think it might be a way forward? Would it need modification to eWaterCycle itself, e.g. to deal with config options specifically for Podman?

Using setuid to propagate root access could be another option but this is less desirable due to potential security considerations.

Cheers, Sam

BSchilperoort commented 1 month ago

Hi Sam, @sverhoeven was at some point interested in using Podman, however that might not have gone anywhere because our infrastructure provider does not currently support it.

We would probably have to write some code specific to Podman in grpc4bmi, just as we have for Docker and Apptainer. If we're lucky it's just writing very similar code for the Podman API instead of the Docker API.

You can see some of the code here. It starts up a container and bind-mounts the right directories into it, keeping the original folder structure. That way a user can pass the path to a configuration file to BMI.initialize() without this string (or the config) having to be modified behind the scenes.
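To make the "same folder structure" idea concrete, here is a minimal sketch assuming the Docker SDK for Python (`docker` on PyPI), whose `containers.run()` accepts a `volumes` dict of the form `{host_path: {"bind": container_path, "mode": ...}}`. The helper `identity_volumes` is hypothetical and is not grpc4bmi's actual code; mounting inputs read-only is a choice made here for illustration. Because every directory is bound at the identical absolute path, a config path such as `/data/run1/config.json` means the same thing on the host and inside the container.

```python
import os
from typing import Dict, Iterable


def identity_volumes(
    work_dir: str, input_dirs: Iterable[str] = ()
) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: build a Docker-SDK-style volumes mapping that
    mounts each directory at the *same* absolute path inside the container."""
    volumes: Dict[str, Dict[str, str]] = {}
    for path in input_dirs:
        abs_path = os.path.abspath(path)
        # Inputs (forcing, parameter sets) are mounted read-only.
        volumes[abs_path] = {"bind": abs_path, "mode": "ro"}
    work = os.path.abspath(work_dir)
    # The work dir is added last so it stays read-write even if it was
    # also listed as an input dir.
    volumes[work] = {"bind": work, "mode": "rw"}
    return volumes


# Possible usage (needs a running Docker daemon, so not executed here):
#   import docker
#   client = docker.from_env()
#   client.containers.run(image, detach=True,
#                         volumes=identity_volumes("/data/run1", ["/data/forcing"]))
```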

samharrison7 commented 1 month ago

Thanks Bart. Just having a quick look at the Podman Python docs and the API doesn't look too different to Docker's, though the devil is probably in the detail. I guess there would need to be some new code on the eWaterCycle side too (e.g. here)?

BSchilperoort commented 1 month ago

> Thanks Bart. Just having a quick look at the Podman Python docs and the API doesn't look too different to Docker's, though the devil is probably in the detail.

Yeah, it seems straightforward but there will probably be some issues that are difficult to predict.

> I guess there would need to be some new code on the eWaterCycle side too (e.g. here)?

Yes, that would be step two. The first step is to make grpc4bmi work with Podman. Then you should be able to spawn a new container and connect to it, like https://grpc4bmi.readthedocs.io/en/latest/container/usage.html#using-the-container-clients

BSchilperoort commented 1 month ago

I found this guide showing how rootless Podman can run rootless Podman inside a container:

$ podman run -it --security-opt label=disable --user podman --device /dev/fuse quay.io/podman/stable podman run alpine echo hello

This could be a good starting point to try to run a containerized model from inside podman (without writing any code for ewatercycle/grpc4bmi). E.g.:

podman run -it --security-opt label=disable --user podman --device /dev/fuse quay.io/podman/stable /bin/bash starts a container from the Podman image and opens an interactive bash shell in it. Instead of quay.io/podman/stable you could use an image that also has a Python environment with ewatercycle installed, plus some pre-generated forcing files and a config.

Next you can start a grpc4bmi server in headless mode: podman run -d ghcr.io/daafip/hbv-bmi-grpc4bmi:v1.5.0 (and also mount volumes, of course)
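For scripting this step, a minimal sketch that assembles the `podman run` command as an argument list, using only standard Podman flags (`-d`, `-p host:container`, `-v src:dst`). The helper `podman_run_args` is hypothetical. Publishing a port is an assumption added here so that a client can reach the server; the port the grpc4bmi server listens on inside the image is not stated in this thread, so check the image's documentation before picking `container_port`.

```python
import os
from typing import Iterable, List


def podman_run_args(
    image: str, host_port: int, container_port: int, mount_dirs: Iterable[str] = ()
) -> List[str]:
    """Hypothetical helper: build a `podman run` command that starts a
    grpc4bmi server detached, publishes its port and bind-mounts each
    directory at the same absolute path inside the container."""
    args = ["podman", "run", "-d", "-p", f"{host_port}:{container_port}"]
    for d in mount_dirs:
        path = os.path.abspath(d)
        args += ["-v", f"{path}:{path}"]
    args.append(image)
    return args


# Possible usage (requires podman on PATH, so not executed here):
#   import subprocess
#   cmd = podman_run_args("ghcr.io/daafip/hbv-bmi-grpc4bmi:v1.5.0", 50051, 50051, ["/data/run1"])
#   container_id = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()
```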

Then you should be able to open up Python, connect to the running grpc4bmi server, and try to initialize the model 🤞
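A rough sketch of the "open up Python and connect" step. It waits until something accepts TCP connections on the server's port (only a coarse readiness check), then wraps a gRPC channel in grpc4bmi's `BmiClient`, following the `BmiClient(grpc.insecure_channel("localhost:<PORT>"))` pattern from the grpc4bmi README. The helper names, host and port are assumptions; `grpcio` and `grpc4bmi` are imported lazily so the waiting logic works without them.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # nothing listening yet; retry shortly
    return False


def connect_bmi(host: str, port: int):
    """Hypothetical helper: connect to an already running grpc4bmi server."""
    import grpc  # lazy imports: only needed when actually connecting
    from grpc4bmi.bmi_grpc_client import BmiClient

    if not wait_for_port(host, port):
        raise TimeoutError(f"no server listening on {host}:{port}")
    return BmiClient(grpc.insecure_channel(f"{host}:{port}"))


# Possible usage (port and config path are placeholders):
#   model = connect_bmi("localhost", 50051)
#   model.initialize("/data/run1/config.json")
```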

samharrison7 commented 1 month ago

Oh nice, that sounds positive! We'll give that a go and see how far we get. I guess there isn't already an image with eWaterCycle installed available anywhere, is there?

BSchilperoort commented 1 month ago

> I guess there isn't already an image with eWaterCycle installed available anywhere, is there?

Try the following 🤓 You still have to build it locally.


Dockerfile (started from the podman container):

```Dockerfile
FROM quay.io/podman/stable
RUN mkdir -p ~/miniconda3
RUN curl -o ~/miniconda3/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
RUN bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
RUN rm ~/miniconda3/miniconda.sh
# Note: each RUN is a fresh shell, so this activation does not carry over to later steps
RUN source ~/miniconda3/bin/activate
RUN curl -o conda-lock.yml https://raw.githubusercontent.com/eWaterCycle/ewatercycle/main/conda-lock.yml
RUN source ~/miniconda3/bin/activate; conda install mamba conda-lock -n base -c conda-forge -y
RUN source ~/miniconda3/bin/activate; conda-lock install --no-dev -n ewatercycle
RUN source ~/miniconda3/bin/activate; conda activate ewatercycle; pip install ewatercycle
```

To build and run:

```
docker build -t podman-ewc .
docker run -it podman-ewc
# inside the container:
source ~/miniconda3/bin/activate; conda activate ewatercycle
python
# then, at the Python prompt:
import ewatercycle
```

It could be more efficient (without the repeated source declarations), but it does work.

BSchilperoort commented 3 days ago

@CansuUluseker & @mjhollaway I have managed to get a rootless Podman container to successfully run a grpc4bmi model.

The info and Dockerfiles are all in https://github.com/eWaterCycle/nested-podman, and the containers are hosted on Docker Hub, so they should be easy to pull and run.

A to-do for this repository is to support the Podman Python SDK (https://podman-py.readthedocs.io). However, it seems to be basically a drop-in replacement for the Docker SDK, so it shouldn't be too much work.
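If podman-py really is close to a drop-in replacement, supporting it could start with something as small as choosing the client class by engine name. A minimal sketch, assuming both SDKs are installed when used: `docker.DockerClient` and `podman.PodmanClient` both accept a `base_url` pointing at the engine's API socket. `make_container_client` is a hypothetical helper, not part of grpc4bmi or eWaterCycle, and code calling `client.containers.run(...)` would still need checking for argument-level differences between the two SDKs.

```python
def make_container_client(engine: str, base_url: str):
    """Hypothetical factory returning a Docker SDK or podman-py client.

    Imports are lazy so only the SDK for the chosen engine must be installed.
    """
    if engine == "docker":
        import docker

        return docker.DockerClient(base_url=base_url)
    if engine == "podman":
        from podman import PodmanClient

        return PodmanClient(base_url=base_url)
    raise ValueError(f"unsupported container engine: {engine!r}")


# Possible usage (placeholder socket path):
#   client = make_container_client("podman", "unix:///run/user/1000/podman/podman.sock")
#   client.containers.list()
```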

sverhoeven commented 3 days ago

We are using https://pypi.org/project/docker/ to interact with Docker. It talks to a Docker daemon, so for Podman we need either a Podman socket or a switch to podman-py.
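For the socket route, a small sketch of how the existing Docker SDK code could be pointed at rootless Podman's Docker-compatible API. It assumes the conventional rootless socket location `$XDG_RUNTIME_DIR/podman/podman.sock` (typically `/run/user/<uid>/...`), which exists once the user service is running, e.g. after `systemctl --user enable --now podman.socket`. `podman_socket_url` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def podman_socket_url(
    env: Optional[Mapping[str, str]] = None, uid: Optional[int] = None
) -> str:
    """Hypothetical helper: conventional API socket URL for rootless Podman."""
    env = os.environ if env is None else env
    # Prefer XDG_RUNTIME_DIR; otherwise fall back to /run/user/<uid>.
    runtime_dir = env.get("XDG_RUNTIME_DIR") or (
        f"/run/user/{os.getuid() if uid is None else uid}"
    )
    return f"unix://{runtime_dir}/podman/podman.sock"


# Possible usage with the existing Docker SDK (not executed here):
#   import docker
#   client = docker.DockerClient(base_url=podman_socket_url())
```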