arkitektio / arkitekt

arkitekt is the python api client for the arkitekt-framework
https://arkitekt.live
MIT License

Support for Flavors #14

Open alexschroeter opened 4 months ago

alexschroeter commented 4 months ago

To have an easy test case for the support of flavors I have created the test repository (https://github.com/alexschroeter/tensorflow-test).

These are the issues I have encountered:

I cannot pass parameters to the docker run command

Since the different vendors want different parameters passed to docker, we need the ability to add these parameters to the docker run command.

CUDA: docker run --gpus all <image>
ROCM: docker run --device=/dev/kfd --device=/dev/dri --group-add video <image>
DPC++: docker run --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path <image>
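
The vendor-specific parameters listed above could be kept in a simple lookup, as in this minimal Python sketch. The names `DOCKER_PARAMS` and `docker_run_command` and the flavor keys are hypothetical and not part of arkitekt; only the docker flags themselves come from the list above.

```python
import shlex
from typing import Dict, List

# Hypothetical lookup table: flavor name -> extra `docker run` parameters.
# The flags are copied from the issue text; the keys are illustrative only.
DOCKER_PARAMS: Dict[str, List[str]] = {
    "cuda": ["--gpus", "all"],
    "rocm": ["--device=/dev/kfd", "--device=/dev/dri", "--group-add", "video"],
    "dpcpp": ["--device=/dev/dri", "-v", "/dev/dri/by-path:/dev/dri/by-path"],
}


def docker_run_command(flavor: str, image: str) -> List[str]:
    """Return the argv for `docker run` including the flavor's extra parameters."""
    if flavor not in DOCKER_PARAMS:
        raise ValueError(f"unknown flavor: {flavor!r}")
    return ["docker", "run", *DOCKER_PARAMS[flavor], image]


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch has no side effects.
    print(shlex.join(docker_run_command("rocm", "my-image")))
```

Returning an argv list rather than a single string avoids shell quoting problems if the list is later passed to `subprocess.run`.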

I believe the config.yaml will be the right location for these docker_params as I have called them in the example repository. I went looking through the code a bit and I was wondering if the selectors (https://github.com/arkitektio/arkitekt/blob/e8762476e08ec850d3e27d2a41b0c8c63863bbb3/arkitekt/cli/types.py#L343C30-L343C39) are meant to be for that or if this part is not there yet.

I am also wondering whether you already pass --gpus all for Nvidia somewhere, or whether this is no longer strictly necessary and that is why I could not find it.

Select GPUs

Related to the topic above, I am wondering how to handle the selection of specific GPUs, and whether you have already had some thoughts on this. I believe for Nvidia and AMD you can pass [CUDA|HIP]_VISIBLE_DEVICES=[list of numbers] or something similar, but I don't know yet where you would add this (Instantiation of the App) and how.
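
One possible way to pass such a selection to a container is as an environment variable on the docker run command. The sketch below assumes that approach; the function name `visible_devices_env` and the vendor keys are hypothetical, and where arkitekt would actually apply this is still the open question.

```python
from typing import List

# Environment variable used by each vendor's runtime to restrict visible GPUs.
_VISIBLE_DEVICES_VARIABLE = {
    "cuda": "CUDA_VISIBLE_DEVICES",
    "rocm": "HIP_VISIBLE_DEVICES",
}


def visible_devices_env(vendor: str, device_ids: List[int]) -> List[str]:
    """Return `docker run` arguments that set the vendor's visible-devices variable."""
    if vendor not in _VISIBLE_DEVICES_VARIABLE:
        raise ValueError(f"no visible-devices variable known for vendor {vendor!r}")
    if any(device_id < 0 for device_id in device_ids):
        raise ValueError("device ids must be non-negative")
    value = ",".join(str(device_id) for device_id in device_ids)
    return ["-e", f"{_VISIBLE_DEVICES_VARIABLE[vendor]}={value}"]


if __name__ == "__main__":
    print(visible_devices_env("cuda", [0, 2]))
```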

Because I cannot pass parameters, "arkitekt port build" fails since it cannot find the GPUs.

I believe this problem will be solved by adding the parameters to config.yaml, but I am wondering whether build should fail if the container cannot be started, or whether it should succeed as long as the container can be built. As a side note, this issue only happened with the Intel container; the other ones did not fail when they could not find a GPU, whereas the Intel one did.

arkitekt port publish takes the last flavor built

I don't know if there are other reasons, but for consistency I believe publish should publish all flavors, and -f flavor could be used to publish only a certain one. Right now, I work around it by rebuilding the flavor I want to publish and then publishing, since publish seems to take the last one built.
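
The proposed behaviour (publish everything by default, one flavor when -f is given) can be sketched as a small selection function. This is only an illustration of the suggestion; `flavors_to_publish` is a hypothetical name and does not reflect how the arkitekt CLI is implemented.

```python
from typing import List, Optional


def flavors_to_publish(built: List[str], selected: Optional[str] = None) -> List[str]:
    """Return the flavors to publish: all built ones, or only the selected one."""
    if selected is None:
        # No -f given: publish every built flavor, not just the most recent build.
        return list(built)
    if selected not in built:
        raise ValueError(f"flavor {selected!r} has not been built")
    return [selected]


if __name__ == "__main__":
    print(flavors_to_publish(["cuda", "rocm", "dpcpp"]))
    print(flavors_to_publish(["cuda", "rocm", "dpcpp"], selected="rocm"))
```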

Deploy in Port

I am unable to deploy in Port but I am wondering if this is because some containers (the Intel one) are missing or because I am using a version that doesn't support this yet.

Cheers, Alex

alexschroeter commented 4 months ago

Regarding the config.yaml, I was wondering if my docker_params should be part of a docker-compose file instead. This would make them easier to understand because the format is already familiar and documented, and we could use all docker-compose features.

One of the things that we would have to figure out regarding different container technologies is the "translation" of these parameters. Passing the right drivers and devices looks different depending on the combination of device and container.

We could probably make these part of the app's features, so that the app supports running with singularity only if you have defined the parameters for singularity. I would like a more automatic approach, but I couldn't find a translation between docker-compose and singularity-compose (which I have never tried, but which could be the alternative configuration file). Other parameters, such as port forwards for access to jupyter-notebooks, would probably be a bit easier to handle, but I am not sure that automatic translation of those parameters wouldn't cause problems as well.

jhnnsrs commented 4 months ago

Cool Stuff!

I actually do feel that the selectors would be the place for this sort of configuration, but using an interface-like approach, where the selectors would be an abstract mapping that specific backends implement in their run calls. Something like this:

[Image: Untitled-2024-03-04-1010]

This would allow us to use the same selectors for different backends, and it would also allow us to have the same codebase for the arkitekt cli (using the helper library to call a configured backend, by default docker, but maybe also podman) and for port or the conceptualized cluster app. What do you think?
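
The idea of backend-independent selectors could look roughly like the following Python sketch. All class names here (`GpuSelector`, `Backend`, `DockerBackend`, `SingularityBackend`) are hypothetical and are not the types in arkitekt/cli/types.py. Singularity is used as the second backend because its `--nv` and `--rocm` flags show how the same selector maps to different parameters.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GpuSelector:
    """Backend-independent requirement: the app needs a GPU of this vendor."""
    vendor: str  # "cuda" or "rocm" in this sketch


class Backend(ABC):
    """Each backend translates abstract selectors into its own run call."""

    @abstractmethod
    def run_command(self, image: str, selectors: List[GpuSelector]) -> List[str]:
        ...


class DockerBackend(Backend):
    _FLAGS: Dict[str, List[str]] = {
        "cuda": ["--gpus", "all"],
        "rocm": ["--device=/dev/kfd", "--device=/dev/dri", "--group-add", "video"],
    }

    def run_command(self, image: str, selectors: List[GpuSelector]) -> List[str]:
        args = ["docker", "run"]
        for selector in selectors:
            args += self._FLAGS[selector.vendor]
        return args + [image]


class SingularityBackend(Backend):
    _FLAGS: Dict[str, List[str]] = {"cuda": ["--nv"], "rocm": ["--rocm"]}

    def run_command(self, image: str, selectors: List[GpuSelector]) -> List[str]:
        args = ["singularity", "run"]
        for selector in selectors:
            args += self._FLAGS[selector.vendor]
        return args + [image]


if __name__ == "__main__":
    selectors = [GpuSelector("cuda")]
    for backend in (DockerBackend(), SingularityBackend()):
        print(backend.run_command("my-image", selectors))
```

With this shape, the app declares its selectors once and each backend decides how to satisfy them, so adding a backend does not require touching the app configuration.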

alexschroeter commented 4 months ago

I think this would also work and I like it more in the sense that it would allow us to run multiple backends without having them depend on the configuration.

I am wondering how complicated it will get with all the different options but I guess this can be added one after another. I would like to see port forwarding as well so we can do something like the following.

One of the "Apps" that I would like to build is an "explore in jupyter" app, which would run a jupyter-notebook so you can use it to explore "intermediate" results. I think this would be a good counterbalance to full automation, since it keeps things interactive.
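
For the docker backend, the port forwarding mentioned above would amount to adding a `-p host:container` argument. A minimal sketch, assuming the notebook listens on port 8888 inside the container (Jupyter's default); the function name `port_forward_args` is hypothetical.

```python
from typing import List


def port_forward_args(host_port: int, container_port: int = 8888) -> List[str]:
    """Return `docker run` arguments that publish a container port on the host."""
    for port in (host_port, container_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
    return ["-p", f"{host_port}:{container_port}"]


if __name__ == "__main__":
    # Command for a notebook reachable on host port 9000; printed, not executed.
    print(["docker", "run", *port_forward_args(9000), "my-jupyter-image"])
```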