InseeFrLab / images-datascience

Collection of Docker images to build the data science catalog of the Onyxia project
MIT License

Using a custom image in pod setup #228

Open fBedecarrats opened 6 days ago

fBedecarrats commented 6 days ago

The current r-datascience image includes old GDAL and PROJ versions. @avouacr indicated a way to install updated ones at startup (#215), but I see that the process is very long and sometimes fails. For the training I am facilitating, I need a more straightforward way to get an environment that works for my purpose. There is a custom image that fits my needs, available at ghcr.io/mapme-initiative/mapme-spatial:1.4.0 (source Dockerfile accessible here), but when I include it in the pod setup as follows: [image], the pod launch fails on every attempt. Am I doing something wrong when trying to use a custom image?

avouacr commented 6 days ago

Hi @fBedecarrats, I understand your need for more recent versions of geospatial libraries. However, I don't think we want to integrate this upgrade process directly in our images. The philosophy of the images-datascience project is to provide users with images that fit most datascience purposes while building reliably each week. Having the latest versions of geospatial libraries is an advanced use, and would both add maintenance costs on our side and risk reducing the stability of the pipeline in comparison to relying on rocker's build scripts.

That being said, this is something you can definitely build for yourself in order to tailor our services to your needs. If the procedure I suggested in #215 takes too long or is not reliable enough in the context of an init script, then the relevant solution is likely to use a custom image, as you suggested. You won't be able to use just any custom image from other producers, because our helm-charts rely on precise configuration found in our images. I would encourage you to build your own custom image that inherits from ours, tweak the environment to fit your needs, test the image in our services like you tried with the mapme-spatial one, and then have your trainees use this custom service (which can be provided as a single custom URL specifying the custom image).

So in practice, you would follow a procedure that looks somewhat like this:

A Dockerfile that would look something like this:

FROM inseefrlab/onyxia-r-datascience:r4.4.1

# Some shell code that upgrades the relevant geospatial libraries, and potentially install additional R packages
# See #215 for inspiration

A .github/workflows/build.yaml with this content, adapted to the name of your Docker Hub repo of course, and with the relevant token provided as secrets on the GitHub repository.
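
The workflow content referred to in the step above is not reproduced in this thread. As a rough illustration only, a build-and-push workflow could look something like the sketch below. It is not the workflow from the original comment. It assumes the Dockerfile sits at the repository root, that the image is pushed to Docker Hub from the `main` branch, and that the secrets are named `DOCKERHUB_USERNAME` and `DOCKERHUB_TOKEN`. Those secret names and the image tag are placeholders chosen for this example.

```yaml
# Hypothetical sketch of .github/workflows/build.yaml.
# Assumptions (not from the thread): Dockerfile at the repository root,
# pushes to Docker Hub, secret names and image tag are placeholders.
name: Build custom image

on:
  push:
    branches:
      - main

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      # Fetch the repository so the Dockerfile is available as build context
      - name: Check out the repository
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      # Authenticate with the token stored as repository secrets
      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push the image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          # Placeholder: replace with your own Docker Hub namespace and image name
          tags: your-dockerhub-user/onyxia-r-datascience-custom:latest
```

Once such a workflow has pushed the image, the resulting tag is what you would enter as the custom image when launching the service.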

So it is quite a simple process all things considered, which with a little bit of trial and error would lead you to a custom image that fits your needs and is directly usable as a custom image in our services.

Good luck!

fBedecarrats commented 5 days ago

Thanks a lot for the feedback @avouacr! I refined the init script and I have the impression that it works better now. I will use this init script today, as I have to rush for the training now, and I'll try to develop the Dockerfile tonight. Thanks again!