containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Support partial AMD GPU selection on SLURM #21468

Closed. jsevillaamd closed this issue 7 months ago.

jsevillaamd commented 9 months ago

Feature request description

Mount only the GPUs allocated to the Slurm job inside the container. Currently, Podman exposes every GPU on the worker node, ignoring Slurm's resource isolation.

This feature already works in Singularity and Docker.

Suggest potential solution

Discussed here https://github.com/containers/podman/issues/21454
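
To make the request concrete, here is a minimal sketch of a user-side wrapper that passes only the job's GPUs to `podman run`. It is not something Podman does today and it does not enforce isolation; it only narrows which device nodes are handed to the container. Several things here are assumptions rather than facts from this thread: that Slurm exports the allocated GPU indices as a comma-separated list in `SLURM_JOB_GPUS`, that GPU index `i` maps to `/dev/dri/renderD<128+i>` (common, but not guaranteed on every node), and that ROCm workloads also need `/dev/kfd`. The function name `slurm_gpu_device_args` and the `IMAGE` placeholder are hypothetical.

```python
import os


def slurm_gpu_device_args(env=None, var="SLURM_JOB_GPUS", render_base=128):
    """Build `podman run` --device arguments for the GPUs Slurm allocated.

    Assumptions (not confirmed by the issue thread):
      * `var` holds a comma-separated list of numeric GPU indices;
      * GPU index i corresponds to /dev/dri/renderD(render_base + i);
      * /dev/kfd is required by ROCm and shared by all GPUs.
    Non-numeric entries (for example UUID-style IDs) are skipped.
    """
    env = os.environ if env is None else env
    raw = env.get(var, "")
    indices = [int(tok) for tok in raw.split(",") if tok.strip().isdigit()]
    if not indices:
        # No allocation found: pass no GPU devices at all.
        return []
    args = ["--device", "/dev/kfd"]
    for i in indices:
        args += ["--device", f"/dev/dri/renderD{render_base + i}"]
    return args


if __name__ == "__main__":
    # IMAGE is a placeholder for whatever container image the job uses.
    cmd = ["podman", "run", "--rm", *slurm_gpu_device_args(), "IMAGE"]
    print(" ".join(cmd))
```

Inside a job that was allocated GPUs 0 and 2, this would print a `podman run` command with `/dev/kfd`, `/dev/dri/renderD128`, and `/dev/dri/renderD130` as the only `--device` entries. The index-to-render-node mapping should be verified on the target cluster before relying on it.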

Have you considered any alternatives?

A clear and concise description of any alternative solutions or features you've considered.

Additional context

No response

giuseppe commented 9 months ago

@jsevillaamd how is this different from the previous issue (https://github.com/containers/podman/issues/21454)?

From what I could see there, the only difference was that rootless Podman cannot use the devices cgroup; everything else about the devices inside the container was the same. In fact, it works with rootful Podman.

jsevillaamd commented 9 months ago

Hi @giuseppe, #21454 was a bug report asking for device isolation in rootless Podman.

This issue is a feature request to support Slurm's AMD GPU isolation. Rootful Podman is not a solution, because HPC systems do not grant sudo privileges to their users.

Rootless Podman would be a great alternative to Docker or Singularity for HPC if it could isolate these GPU devices.

giuseppe commented 9 months ago

I understand.

Unfortunately, there is nothing we can do on our side. The kernel doesn't allow rootless users to use eBPF (which cgroup v2 needs for device control), and on cgroup v1, cgroups in general are not safe to delegate to rootless users.
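
For readers who want to check which of these cases applies on a given node, here is a rough sketch. It assumes the usual mount point `/sys/fs/cgroup` and treats the presence of `cgroup.controllers` at its root as the marker for the unified (v2) hierarchy; hybrid setups are reported as v1. The function names are illustrative, and the returned messages only restate the limitations described above.

```python
import os


def cgroup_version(root="/sys/fs/cgroup"):
    """Return 2 if `root` looks like a unified cgroup v2 hierarchy, else 1.

    Assumption: cgroup v2 exposes a `cgroup.controllers` file at the root of
    its mount; legacy and hybrid layouts do not have it at this path.
    """
    return 2 if os.path.exists(os.path.join(root, "cgroup.controllers")) else 1


def device_isolation_outlook(is_root, version):
    """Summarize, per the discussion above, whether device isolation is possible."""
    if is_root:
        return "rootful: device restrictions can be applied"
    if version == 2:
        return "rootless on cgroup v2: the kernel does not allow the eBPF device filter"
    return "rootless on cgroup v1: cgroups are not safe to delegate"


if __name__ == "__main__":
    print(device_isolation_outlook(os.geteuid() == 0, cgroup_version()))
```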

github-actions[bot] commented 8 months ago

A friendly reminder that this issue had no activity for 30 days.

Luap99 commented 7 months ago

Given that what we can do is limited by the kernel, we cannot do this.