Add coffea 2024 and coffea 0.7.x images

oshadura commented 10 months ago

Thanks @nsmith- for starting point: https://github.com/CoffeaTeam/docker-coffea-base/blob/coffea2023

The goal is to merge https://github.com/CoffeaTeam/docker-coffea-base and https://github.com/CoffeaTeam/docker-coffea-dask repositories and have the possibility to maintain both coffea versions separately (CalVer and legacy 0.7.x) as separate images.

I propose to name coffeateam/coffea-basev0 as legacy 0.7.x and coffeateam/coffea-dask the calver version. I added also two versions of each image, depending on linux distro: coffeateam/coffea-basev0-ubuntu, coffeateam/coffea-basev0-alma8 and coffeateam/coffea-dask-ubuntu, coffeateam/coffea-dask-alma8.

The next step would be to slim the image and add ML learning libraries as separate image (?).

nsmith- commented 10 months ago

Can I suggest a slightly different directory structure?

    renamed:    alma8/Dockerfile -> alma8.dockerfile
    renamed:    alma8/environment.yaml -> environment.yaml
    renamed:    ubuntu/Dockerfile -> ubuntu.dockerfile
    deleted:    ubuntu/environment.yaml

so that we have a common environment.yaml for all OS versions? The CI file will need a tiny change as well

oshadura commented 10 months ago

@nsmith- sure! I will try tomorrow before you are awake. Any other suggestions? Maybe we need some other flavor? What should we do with ML libraries? Add additional environment.yaml?

nsmith- commented 10 months ago

Regarding image dependencies and naming, personally I'm in favor of something like

coffea-base
- coffea calver
- dask (not distributed)
coffea-dask-v0
- coffea 0.7.x
- dask
- distributed
coffea-lpc (FROM coffea-base)
- distributed
- lpcjobqueue
coffea-eaf (FROM coffea-base)
- distributed
- other stuff for FNAL EAF?
coffea-casa (FROM coffea-base)
- distributed
- dask-gateway
- any extra casa stuff
coffea-casa-v0 (FROM coffea-dask-v0)
- distributed
- any extra casa stuff
coffea-vine (FROM coffea-base)
- taskvine
- whatever else

and potentially other AF variants. @btovar @mapsacosta @mcremone @rpsimeon34 may have opinions as well. I don't see a strong case for having multiple base OS images. Maybe I am missing something but since our ~whole stack is coming from conda the base OS is somewhat irrelevant as far as I can tell?

nsmith- commented 10 months ago

This does leave up in the air where the ML packages go in the stack. Part of me thinks it should be AF-dependent, so that AFs that support GPUs can ensure they have the correct CUDA versions.

nsmith- commented 10 months ago

This may also be of interest to @kondratyevd

kondratyevd commented 10 months ago

Thanks @nsmith- - at Purdue AF we provide only a single Docker image for users, and manage environments using conda, so probably at the moment this is not super relevant for us, unless I'm missing some important caveats

nsmith- commented 10 months ago

at Purdue AF we provide only a single Docker image for users

can you describe/link that image? That implies users' full conda environments must be shipped to workers?

kondratyevd commented 10 months ago

The image is here: https://github.com/PurdueAF/purdue-af/blob/master/jupyterhub/docker/cmsaf-alma8/Dockerfile

To use conda envs with Dask, users must place their environments into a shared storage volume (Purdue's "Depot" storage is standard work area for local Purdue users, it is mounted to all worker nodes; and we are working on a similar solution for external users). Then, the path to conda env must be specified in Dask Gateway setup.

For local Python imports it's similar - user's files should be in shared storage and PYTHONPATH can be modified in Dask Gateway setup to point to the user's analysis framework.

nsmith- commented 10 months ago

Thanks for the input @kondratyevd, it is also in line with some discussion I had with Burt about the fact that the user server (notebook/lab) image often needs extensive AF customization and essentially they are only interested in conda install coffea or using a common environment.yaml file for optional extras (most notably, xrootd). This makes me wonder if we are better off supplying just a set of conda environment files? For workers an image may be useful because it can be cached more effectively.

kondratyevd commented 10 months ago

Doesn’t this (set of conda files) already mostly provided with just a few optional dependencies? I think our users didn’t have any issues with just conda install coffea==0.7.21 and similarly for v2023/2024. Xrootd is the only exception that I remember.

oshadura commented 9 months ago

@nsmith- @lgray Now only one combination is failing: python 3.11 + coffea 0.7.22 with:

2024-02-15 16:26:20,860 - distributed.worker - WARNING - Compute Failed
Key:       MyProcessor-31e87f80bfa2fb4d21e30f667186a569
Function:  MyProcessor
args:      ((WorkItem(dataset='dimuon', filename='https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dimuon.root', treename='Events', entrystart=0, entrystop=40, fileuuid=b'\xa2\x10\xa3\xf86H\x11\xea\xa2\x9f\xf5\xb5\\\x90\xbe\xef', usermeta={}), b'\x04"M\x18h@-\x00\x00\x00\x00\x00\x00\x00v,\x00\x00\x00R\x80\x05\x95"\x00\x01\x00\xf0\x13\x8c\x0btest_dimuon\x94\x8c\x0bMyProcessor\x94\x93\x94)\x81\x94.\x00\x00\x00\x00'))
kwargs:    {}
Exception: 'TypeError("__class__ assignment: \'NanoEventsArray\' object layout differs from \'Array\'")'

https://github.com/CoffeaTeam/af-images/actions/runs/7918852148/job/21618260586

oshadura commented 9 months ago

Otherwise, I can disable it for now, and then if everything else is ok, we can finally push these images.

oshadura commented 9 months ago

I am going to fix the failing image in the next pull request as well as add other combinations. The goal is to test the first batch of images on coffee-casa already today.

CoffeaTeam / af-images

Add coffea 2024 and coffea 0.7.x images #1