docker / docker-py

A Python library for the Docker Engine API
https://docker-py.readthedocs.io/
Apache License 2.0
6.78k stars 1.67k forks

create_container does not support the --gpus param #2395

Open ffteen opened 5 years ago

ffteen commented 5 years ago

Docker version: 19.03. I want to set --gpus all when creating a container, but found that docker-py does not support this param.

jcsirot commented 5 years ago

Hello @ffteen thank you for the report

msadri70 commented 5 years ago

any progress on this issue?

Bluemi commented 5 years ago

I think one hacky way, though not very reliable, is to use the low-level API and overwrite the host configuration. Since I only tried to follow the Docker CLI code in Go, I'm not sure how reliable/portable this solution is. It works on my machine, and I thought it might help someone until official support is implemented.

The following code is a modification of the original DockerClient.containers.create() function; it adds a DeviceRequest to the host configuration and otherwise works exactly like the original function:

import docker
from docker.models.images import Image
from docker.models.containers import _create_container_args

def create_with_device_request(client, image, command, device_request=None, **kwargs):
    if isinstance(image, Image):
        image = image.id
    kwargs['image'] = image
    kwargs['command'] = command
    kwargs['version'] = client.containers.client.api._version
    create_kwargs = _create_container_args(kwargs)

    # modification to the original create function
    if device_request is not None:
        create_kwargs['host_config']['DeviceRequests'] = [device_request]
    # end modification

    resp = client.api.create_container(**create_kwargs)
    return client.containers.get(resp['Id'])

# Example usage
device_request = {
    'Driver': 'nvidia',
    'Capabilities': [['gpu'], ['nvidia'], ['compute'], ['compat32'], ['graphics'], ['utility'], ['video'], ['display']],  # not sure which capabilities are really needed
    'Count': -1,  # enable all gpus
}

container = create_with_device_request(docker.from_env(), 'nvidia/cuda:9.0-base', 'nvidia-smi', device_request, ...)

I think the CLI client sets the NVIDIA_VISIBLE_DEVICES environment variable, so it's probably a good idea to do the same by passing environment={'NVIDIA_VISIBLE_DEVICES': 'all'} to the create_with_device_request() call. This enables all available GPUs. You could modify this with different device_requests:

# enable two gpus
device_request = {
    'Driver': 'nvidia',
    'Capabilities': ...,
    'Count': 2,  # enable two gpus
}

# enable gpus with id or uuid
device_request = {
    'Driver': 'nvidia',
    'Capabilities': ...,
    'DeviceIDs': ['0', 'GPU-abcedfgh-1234-a1b2-3c4d-a7f3ovs13da1']  # enable gpus with id 0 and uuid
}

The environment parameter should then look like {'NVIDIA_VISIBLE_DEVICES': '0,1'} or {'NVIDIA_VISIBLE_DEVICES': '0,GPU-xxx'}, respectively.
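
Putting the pieces together, a minimal sketch of a full call might look like this (it assumes the create_with_device_request() helper above; the image tag is just an example, and I'm still not sure [['gpu']] alone is a sufficient capability set):

import docker

client = docker.from_env()
device_request = {
    'Driver': 'nvidia',
    'Capabilities': [['gpu']],  # assumption: 'gpu' alone may be enough
    'Count': -1,                # all GPUs
}
container = create_with_device_request(
    client,
    'nvidia/cuda:9.0-base',
    'nvidia-smi',
    device_request,
    environment={'NVIDIA_VISIBLE_DEVICES': 'all'},  # mirror what the CLI sets
)
container.start()
print(container.logs())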

ffteen commented 5 years ago

I'm not sure which capabilities are really needed either!

Does create_service support the device request param?

I use the nvidia runtime instead.

Bluemi commented 5 years ago

As far as I can tell, services.create() does not support device requests.

Setting runtime='nvidia' is definitely the better approach, if possible. The problem I had was that I use the nvidia-container-toolkit, which does not require installing the nvidia runtime, so setting the nvidia runtime leads to Error: unknown runtime specified nvidia, while using --gpus=all works as expected.

Is there a better way to use nvidia-gpus with the nvidia-container-toolkit?

hnine999 commented 5 years ago

I have a change (that appears to work) that allows the "gpus" option in my fork. I'd like to create a PR for it, but when running the tests, this error (which is unrelated to the change) occurs:

tests/integration/api_service_test.py:379:53: F821 undefined name 'BUSYBOX'
Makefile:92: recipe for target 'flake8' failed

Is there a package that needs to be installed to fix this?

shin- commented 5 years ago

@hnine999 No, that's an error on our end - we'll fix it shortly. Feel free to submit your PR in the meantime!

jamesdbrock commented 4 years ago

The PR from @hnine999 is #2419

rAm1n commented 4 years ago

Hi - Any update with this feature?

AustinDeric commented 4 years ago

Any update on this? It is badly needed. docker-py is functionally broken for running GPU-enabled containers.

Dmitry1987 commented 4 years ago

+1

Dmitry1987 commented 4 years ago

This is actually a major feature for the whole data science community that runs TensorFlow in Docker on NVIDIA GPUs in the cloud. Why has this been ignored for such a long time? 😞

bluebox42 commented 4 years ago

Any update on this?

Dmitry1987 commented 4 years ago

Still waiting for this to be supported... The only workaround for now is "docker run" with bash :(

jmsmkn commented 4 years ago

Still waiting for this to be supported... The only workaround for now is "docker run" with bash :(

At the moment, nvidia-container-toolkit still includes nvidia-container-runtime. So, you can still add nvidia-container-runtime as a runtime in /etc/docker/daemon.json:

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

Then restart the docker service (sudo systemctl restart docker) and use runtime="nvidia" in docker-py as before.
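
For example, a quick sketch of using it from docker-py once the runtime is registered (the image tag is only illustrative):

import docker

client = docker.from_env()
# With the nvidia runtime registered in /etc/docker/daemon.json,
# the container can see the GPUs without needing --gpus support in docker-py
output = client.containers.run(
    'nvidia/cuda:11.0-base',
    'nvidia-smi',
    runtime='nvidia',
)
print(output.decode())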

chorus12 commented 4 years ago

Thanks a bunch - that works BUT the daemon.json is missing a double quote in runtimes: { "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } } }

Is there a solid fix for this issue?

jmsmkn commented 4 years ago

Thanks - updated my comment with that suggestion

vwxyzjn commented 4 years ago

Hi @jmsmkn, I installed nvidia-container-toolkit on Arch, but it does not come with nvidia-container-runtime. Any update on this? Thanks.

cd /usr/bin
ls | grep nvidia
nvidia-bug-report.sh
nvidia-container-cli
nvidia-container-runtime-hook
nvidia-container-toolkit
nvidia-cuda-mps-control
nvidia-cuda-mps-server
nvidia-debugdump
nvidia-modprobe
nvidia-persistenced
nvidia-settings
nvidia-sleep.sh
nvidia-smi
nvidia-xconfig

DrizzlingCattus commented 4 years ago

@vwxyzjn arch

I think this will help

MikeWhittakerRyff commented 4 years ago

Simple "gpus=" keyword parameter, please !

milk4candy commented 4 years ago

This feature is badly needed by the many people working with data on GPUs for AI and HPC. Please add it as soon as you can; we'll be very grateful.

christian-steinmeyer commented 3 years ago

Is this issue on some agenda? (This is your second most upvoted open issue at the moment.)

gabrieldemarmiesse commented 3 years ago

Hi all, I made a Python client for Docker that sits on top of the Docker client binary (the one written in Go). It took me several months of work. It notably has support for GPUs in docker.run(...) and docker.container.create(...), with all the options that the CLI has.

It's currently only available to my sponsors, but it'll be open source with an MIT licence on May 1st, 2021 🙂

https://gabrieldemarmiesse.github.io/python-on-whales/

gabrieldemarmiesse commented 3 years ago

Hi all, in the end, making Python-on-whales pay-to-use wasn't a success. So I've open-sourced it.

It's free and on PyPI now. Have fun 😃

$ pip install python-on-whales
$ python
>>> from python_on_whales import docker
>>> print(docker.run("nvidia/cuda:11.0-base", ["nvidia-smi"], gpus="all"))
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

https://github.com/gabrieldemarmiesse/python-on-whales

Dmitry1987 commented 3 years ago

looks good!

MikeWhittakerRyff commented 3 years ago

In the end, I have just written a very simple wrapper around subprocess.run, with a built arg_list that can include the required GPU parameter, and which captures stdout, stderr, the return code, and the execution duration.
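
Roughly along these lines (a sketch of the idea, not the exact code; the helper name is made up):

import subprocess
import time

def run_container(image, command, gpus=None):
    # Build the docker run argument list, optionally adding --gpus
    arg_list = ['docker', 'run', '--rm']
    if gpus is not None:
        arg_list += ['--gpus', gpus]
    arg_list += [image] + list(command)

    start = time.monotonic()
    result = subprocess.run(arg_list, capture_output=True, text=True)
    duration = time.monotonic() - start
    return result.returncode, result.stdout, result.stderr, duration

rc, out, err, seconds = run_container('nvidia/cuda:11.0-base', ['nvidia-smi'], gpus='all')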

Incidentally, I have found that the AWS ML AMI works well with Docker/NVIDIA, with no further tricky configuration required. All I would say is to fire up an instance using the AMI, do the required apt updates/upgrades, then freeze that as your AMI to use; it avoids a 5-minute delay! For my purposes, a root volume of 200GB works fine, as opposed to the vast default root volumes you get with the g3/g4 instances (maybe required if you are going to hibernate). But I am going a bit off-topic!

JoanFM commented 3 years ago

Hello team, is this a feature that you are thinking of adding? It would be of great value

matyushinleonid commented 3 years ago

@JoanFM I guess this functionality has already been implemented:

client.containers.run(
    'nvidia/cuda:9.0-base',
    'nvidia-smi',
    device_requests=[
        docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])
    ]
)

Not very elegant, but it works
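
For completeness, the same DeviceRequest also takes device IDs if you want specific GPUs rather than all of them. A sketch (the IDs are placeholders; I haven't checked every option):

client.containers.run(
    'nvidia/cuda:9.0-base',
    'nvidia-smi',
    device_requests=[
        docker.types.DeviceRequest(
            driver='nvidia',
            device_ids=['0', '1'],  # GPU indices or UUIDs, placeholders here
            capabilities=[['gpu']],
        )
    ]
)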

Data-drone commented 3 years ago

@matyushinleonid Thanks heaps! it worked

drvpn commented 1 year ago

@JoanFM I guess this functionality has already been implemented:

client.containers.run(
    'nvidia/cuda:9.0-base',
    'nvidia-smi',
    device_requests=[
        docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])
    ]
)

Not very elegant, but it works

This works! This is the only solution that actually works, thanks so much! :)

htjain commented 11 months ago

No luck with device_requests.

import docker
import os
os.environ['DOCKER_HOST'] = "unix:///run/user/1000/podman/podman.sock"
#os.environ['DOCKER_HOST'] = "unix:///run/podman/podman.sock"
client = docker.from_env()
logs = client.containers.run('nvidia/cuda:12.2.0-devel-ubuntu20.04',
                              "nvidia-smi",
                               device_requests=[docker.types.DeviceRequest(count=-1,capabilities=[['gpu']])])
Traceback (most recent call last):
  File "/home/user/podman_gpu.py", line 7, in <module>
    logs = client.containers.run('nvidia/cuda:12.2.0-devel-ubuntu20.04',
  File "/usr/local/lib/python3.9/site-packages/docker/models/containers.py", line 887, in run
    raise ContainerError(
docker.errors.ContainerError: Command 'nvidia-smi' in image 'nvidia/cuda:12.2.0-devel-ubuntu20.04' returned non-zero exit status 127: b'/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: nvidia-smi: not found\n'

podman version: 4.4.1, host OS: RHEL 9.2, docker-py version: 6.1.3

The podman CLI is able to access the GPU:

[user@rh91-bay7 ~]$ podman run --rm --device nvidia.com/gpu=all nvidia/cuda:12.2.0-devel-ubuntu20.04 nvidia-smi -L

==========
== CUDA ==
==========

CUDA Version 12.2.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

GPU 0: Tesla T4 (UUID: GPU-f7c1d1ba-7a85-537a-65ae-462ce7d7eca8)
[user@rh91-bay7 ~]$

midfzz commented 2 months ago

@JoanFM I guess this functionality has already been implemented:

client.containers.run(
    'nvidia/cuda:9.0-base',
    'nvidia-smi',
    device_requests=[
        docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])
    ]
)

Not very elegant, but it works

This works! This is the only solution that actually works, thanks so much! :)

Nice, it really works.