Closed oshadura closed 9 months ago
Can I suggest a slightly different directory structure?
renamed: alma8/Dockerfile -> alma8.dockerfile
renamed: alma8/environment.yaml -> environment.yaml
renamed: ubuntu/Dockerfile -> ubuntu.dockerfile
deleted: ubuntu/environment.yaml
so that we have a common environment.yaml
for all OS versions? The CI file will need a tiny change as well
@nsmith- sure! I will try tomorrow before you are awake. Any other suggestions? Maybe we need some other flavor? What should we do with ML libraries? Add additional environment.yaml
?
Regarding image dependencies and naming, personally I'm in favor of something like
and potentially other AF variants. @btovar @mapsacosta @mcremone @rpsimeon34 may have opinions as well. I don't see a strong case for having multiple base OS images. Maybe I am missing something but since our ~whole stack is coming from conda the base OS is somewhat irrelevant as far as I can tell?
This does leave up in the air where the ML packages go in the stack. Part of me thinks it should be AF-dependent, so that AFs that support GPUs can ensure they have the correct CUDA versions.
This may also be of interest to @kondratyevd
Thanks @nsmith- - at Purdue AF we provide only a single Docker image for users, and manage environments using conda, so probably at the moment this is not super relevant for us, unless I'm missing some important caveats
at Purdue AF we provide only a single Docker image for users
can you describe/link that image? That implies users' full conda environments must be shipped to workers?
The image is here: https://github.com/PurdueAF/purdue-af/blob/master/jupyterhub/docker/cmsaf-alma8/Dockerfile
To use conda envs with Dask, users must place their environments into a shared storage volume (Purdue's "Depot" storage is standard work area for local Purdue users, it is mounted to all worker nodes; and we are working on a similar solution for external users). Then, the path to conda env must be specified in Dask Gateway setup.
For local Python imports it's similar - user's files should be in shared storage and PYTHONPATH can be modified in Dask Gateway setup to point to the user's analysis framework.
Thanks for the input @kondratyevd, it is also in line with some discussion I had with Burt about the fact that the user server (notebook/lab) image often needs extensive AF customization and essentially they are only interested in conda install coffea
or using a common environment.yaml
file for optional extras (most notably, xrootd). This makes me wonder if we are better off supplying just a set of conda environment files? For workers an image may be useful because it can be cached more effectively.
Doesn’t this (set of conda files) already mostly provided with just a few optional dependencies? I think our users didn’t have any issues with just conda install coffea==0.7.21
and similarly for v2023/2024. Xrootd is the only exception that I remember.
@nsmith- @lgray Now only one combination is failing: python 3.11 + coffea 0.7.22 with:
2024-02-15 16:26:20,860 - distributed.worker - WARNING - Compute Failed
Key: MyProcessor-31e87f80bfa2fb4d21e30f667186a569
Function: MyProcessor
args: ((WorkItem(dataset='dimuon', filename='https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dimuon.root', treename='Events', entrystart=0, entrystop=40, fileuuid=b'\xa2\x10\xa3\xf86H\x11\xea\xa2\x9f\xf5\xb5\\\x90\xbe\xef', usermeta={}), b'\x04"M\x18h@-\x00\x00\x00\x00\x00\x00\x00v,\x00\x00\x00R\x80\x05\x95"\x00\x01\x00\xf0\x13\x8c\x0btest_dimuon\x94\x8c\x0bMyProcessor\x94\x93\x94)\x81\x94.\x00\x00\x00\x00'))
kwargs: {}
Exception: 'TypeError("__class__ assignment: \'NanoEventsArray\' object layout differs from \'Array\'")'
https://github.com/CoffeaTeam/af-images/actions/runs/7918852148/job/21618260586
Otherwise, I can disable it for now, and then if everything else is ok, we can finally push these images.
I am going to fix the failing image in the next pull request as well as add other combinations. The goal is to test the first batch of images on coffee-casa already today.
Thanks @nsmith- for starting point: https://github.com/CoffeaTeam/docker-coffea-base/blob/coffea2023
The goal is to merge https://github.com/CoffeaTeam/docker-coffea-base and https://github.com/CoffeaTeam/docker-coffea-dask repositories and have the possibility to maintain both coffea versions separately (CalVer and legacy 0.7.x) as separate images.
I propose to name
coffeateam/coffea-basev0
as legacy 0.7.x andcoffeateam/coffea-dask
the calver version. I added also two versions of each image, depending on linux distro:coffeateam/coffea-basev0-ubuntu
,coffeateam/coffea-basev0-alma8
andcoffeateam/coffea-dask-ubuntu
,coffeateam/coffea-dask-alma8
.The next step would be to slim the image and add ML learning libraries as separate image (?).