NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Apache License 2.0

Allow multiple values for swarm resources in nvidia-container-runtime-hook #36

Open mazoruss opened 1 year ago

mazoruss commented 1 year ago

Currently, allocating GPUs with Docker Swarm via the generic-resources mechanism for docker services involves editing /etc/nvidia-container-runtime/config.toml and adding swarm-resource = "DOCKER_RESOURCE_GPU" to the file. However, this assumes there is only one type of generic resource for GPUs. In our use case, we would like to allocate different types of GPUs to different docker services, which is not possible at the moment.
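(For context, that setting looks roughly like the excerpt below; this is a sketch, and the exact placement and default commenting of the option may differ between toolkit versions.)

```toml
# /etc/nvidia-container-runtime/config.toml (excerpt, sketch)
# Name of the single environment variable the runtime hook inspects for
# swarm-allocated GPU UUIDs.
swarm-resource = "DOCKER_RESOURCE_GPU"
```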

For example, let's assume there are two swarm nodes: node-A with an RTX 3080 installed and node-B with an NVIDIA T4 installed, and in the node-generic-resources on each node they are labeled gpu_rtx3080={uuid1} and gpu_t4={uuid2} respectively. Now, if I were to run:

$ docker service create --generic-resource "gpu_t4=1" --name service-1
$ docker service create --generic-resource "gpu_rtx3080=1" --name service-2

I'd expect service-1 to be scheduled on node-B and service-2 on node-A, because that is where their resource demands can be met. service-1 would then be created with DOCKER_RESOURCE_GPU_T4={uuid2} in its environment variables, and service-2 with DOCKER_RESOURCE_GPU_RTX3080={uuid1}.
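(For readers unfamiliar with the setup: labels like these are typically advertised through node-generic-resources in each node's /etc/docker/daemon.json. Below is a hedged sketch for node-A with a made-up GPU UUID; node-B would be analogous with gpu_t4.)

```json
{
  "node-generic-resources": [
    "gpu_rtx3080=GPU-9a8b7c6d-1234-5678-90ab-cdef01234567"
  ]
}
```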

However, there's no way to define both DOCKER_RESOURCE_GPU_T4 and DOCKER_RESOURCE_GPU_RTX3080 as the swarm-resource, because config.toml only accepts a single value. So even though Docker Swarm passes in the correct environment variables, the runtime hook cannot pick up both and pass them along to NvidiaConfigs.

The proposal is to have swarm-resource accept a comma-delimited list of variable names, for example: swarm-resource = "DOCKER_RESOURCE_GPU_RTX3080,DOCKER_RESOURCE_GPU_T4". The hook would then pass the correct GPU UUIDs into NvidiaConfigs based on which of those environment variables are present at runtime. This should be backward compatible with the existing code and would give Docker Swarm the flexibility to intelligently allocate multiple types of NVIDIA GPUs.
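(Expressed as a config excerpt, the requested behaviour would look roughly like the sketch below; this is the proposal, not an option that exists at the time of writing.)

```toml
# /etc/nvidia-container-runtime/config.toml (excerpt, proposed sketch)
# Comma-delimited list of environment variables the hook should inspect
# for swarm-allocated GPU UUIDs.
swarm-resource = "DOCKER_RESOURCE_GPU_RTX3080,DOCKER_RESOURCE_GPU_T4"
```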

elezar commented 1 year ago

@mazoruss as a matter of interest, would CDI be an option in the long run? It may be feasible to add support for multiple environment variables as you suggest, but adding CDI support to Docker Swarm should be a longer-term goal, as it would ensure that future enhancements to the NVIDIA Container Toolkit are automatically supported.

Do you have insight into the swarm code base and whether adding CDI support is something that would be of interest?

mazoruss commented 1 year ago

Hi @elezar, that seems like it could be a good long-term solution. However, we're looking for a working solution quickly, and I'm not very familiar with the swarm code base.

elezar commented 1 year ago

I will check what the impact of supporting multiple environment variables is.

elezar commented 1 year ago

@mazoruss I have created https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/merge_requests/220, which adds the feature as requested. I don't have ready access to Docker Swarm systems (especially ones with multiple GPUs), so would you be able to test the changes?

The changes are low-risk enough to be included in an upcoming release candidate for v1.12.0 of the NVIDIA Container Toolkit.

mazoruss commented 1 year ago

@elezar Thank you for the quick turnaround! I'll test the diff to make sure it works for our use case and will let you know ASAP.

mazoruss commented 1 year ago

Hi @elezar, at line 123 of internal/config/image/cuda_image.go, DevicesFromEnvvars breaks out of the for loop after finding the first available environment variable, even if multiple are configured. So the patch still would not allow two types of GPUs to be added simultaneously. We created a patch you can look at for reference; we've tested it many times.

Edit: file uploaded in comment below.
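(For illustration only: a minimal, self-contained sketch of the break-on-first-match behaviour described above. The function name and signature are simplified stand-ins, not the toolkit's actual DevicesFromEnvvars.)

```go
package main

import (
	"fmt"
	"strings"
)

// devicesFromEnvvars is a simplified stand-in for the behaviour described
// above: the loop stops at the first environment variable that is set, so
// devices named in any later variables are ignored.
func devicesFromEnvvars(env map[string]string, names ...string) []string {
	var devices []string
	for _, name := range names {
		if value, ok := env[name]; ok && value != "" {
			devices = strings.Split(value, ",")
			break // only the first matching variable is ever consulted
		}
	}
	return devices
}

func main() {
	env := map[string]string{
		"DOCKER_RESOURCE_GPU_RTX3080": "uuid1",
		"DOCKER_RESOURCE_GPU_T4":      "uuid2",
	}
	// Prints [uuid1]; uuid2 is silently dropped.
	fmt.Println(devicesFromEnvvars(env, "DOCKER_RESOURCE_GPU_RTX3080", "DOCKER_RESOURCE_GPU_T4"))
}
```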

elezar commented 1 year ago

> Hi @elezar, at line 123 of internal/config/image/cuda_image.go, DevicesFromEnvvars breaks out of the for loop after finding the first available environment variable, even if multiple are configured. So the patch still would not allow two types of GPUs to be added simultaneously. We created a patch you can look at for reference; we've tested it many times.
>
> swarm_vars.zip

Ah, sorry for the misunderstanding; I thought this was an either-or behaviour. Let me have a look at your diff. (If you want to create an MR against https://gitlab.com/nvidia/container-toolkit/container-toolkit, that would also be helpful.)

elezar commented 1 year ago

@mazoruss, I have updated the MR to allow multiple devices to be injected using multiple swarm variables. Your proposal breaks some of the cases required for backward compatibility, so I had to rework things a little. Please test the new revision.

An MR with the required functionality is available: https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/merge_requests/222

mazoruss commented 1 year ago

@elezar Sorry, I uploaded an outdated version of the patch we're using in the previous comment. Here is the correct one: swarm_multi_vars.zip. Please take a look. The patch should apply cleanly on top of v1.11.0 using git apply <file>. If it makes sense to you, we can create an MR with the full patch, or, if that is more suitable, just the test cases from it.

Edit: it doesn't seem like I can make MRs on GitLab.

elezar commented 1 year ago

Looking at your diff, I think https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/merge_requests/222 covers this functionality but also treats the swarm-related envvars as a separate set. Could you confirm that this does what you would like it to do?
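(Likewise for illustration only: a hedged sketch, assuming the set-style behaviour described here, in which every swarm-related variable that is present contributes its devices and duplicates are dropped. This is not the code in the MR.)

```go
package main

import (
	"fmt"
	"strings"
)

// devicesFromAllEnvvars sketches the assumed set-style behaviour: every
// variable that is set contributes devices, and duplicates are removed.
func devicesFromAllEnvvars(env map[string]string, names ...string) []string {
	seen := make(map[string]bool)
	var devices []string
	for _, name := range names {
		for _, device := range strings.Split(env[name], ",") {
			if device == "" || seen[device] {
				continue
			}
			seen[device] = true
			devices = append(devices, device)
		}
	}
	return devices
}

func main() {
	env := map[string]string{
		"DOCKER_RESOURCE_GPU_RTX3080": "uuid1",
		"DOCKER_RESOURCE_GPU_T4":      "uuid2",
	}
	// Prints [uuid1 uuid2]: both GPU types are picked up.
	fmt.Println(devicesFromAllEnvvars(env, "DOCKER_RESOURCE_GPU_RTX3080", "DOCKER_RESOURCE_GPU_T4"))
}
```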

elezar commented 1 year ago

@mazoruss we have released v1.12.0-rc.2, with support for what you requested, to our experimental repositories. Could you please confirm that this addresses your issue?