docker / compose

Define and run multi-container applications with Docker
https://docs.docker.com/compose/
Apache License 2.0

Support for NVIDIA GPUs under Docker Compose #6691

Closed: collabnix closed this issue 3 years ago

collabnix commented 5 years ago

Docker 19.03.0 Beta 2 introduced support for NVIDIA GPUs in the form of a new CLI flag, --gpus. https://github.com/docker/cli/pull/1714 discusses this enablement.

Now you can simply pass the --gpus option to run GPU-accelerated Docker-based applications:

$ docker run -it --rm --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
f476d66f5408: Pull complete 
8882c27f669e: Pull complete 
d9af21273955: Pull complete 
f5029279ec12: Pull complete 
Digest: sha256:d26d529daa4d8567167181d9d569f2a85da3c5ecaf539cace2c6223355d69981
Status: Downloaded newer image for ubuntu:latest
Tue May  7 15:52:15 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    22W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
:~$ 

As of today, Compose doesn't support this. This is a feature request to enable NVIDIA GPU support in Compose.

qhaas commented 5 years ago

This is even more important now that the legacy 'nvidia' runtime appears broken with Docker 19.03.0 and nvidia-container-toolkit-1.0.0-2: https://github.com/NVIDIA/nvidia-docker/issues/1017

$ cat docker-compose.yml 
version: '2.3'

services:
 nvidia-smi-test:
  runtime: nvidia
  image: nvidia/cuda:9.2-runtime-centos7

$ docker-compose run nvidia-smi-test
Cannot create container for service nvidia-smi-test: Unknown runtime specified nvidia

This works: docker run --gpus all nvidia/cudagl:9.2-runtime-centos7 nvidia-smi

This does not: docker run --runtime=nvidia nvidia/cudagl:9.2-runtime-centos7 nvidia-smi

michaelnordmeyer commented 5 years ago

Any work happening on this?

I installed the new Docker CE 19.03.0 on a new Ubuntu 18.04 LTS machine and have the current, matching NVIDIA Container Toolkit (née nvidia-docker2), but I cannot use it because the docker-compose.yml 3.7 format doesn't support the --gpus flag.

akiross commented 5 years ago

Is there a workaround for this?

kiendang commented 5 years ago

This works: docker run --gpus all nvidia/cudagl:9.2-runtime-centos7 nvidia-smi

This does not: docker run --runtime=nvidia nvidia/cudagl:9.2-runtime-centos7 nvidia-smi

You need to have

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

in your /etc/docker/daemon.json for --runtime=nvidia to continue working. More info here.
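If your daemon.json already has other settings, the runtime entry has to be merged in rather than pasted over the whole file. A minimal sketch of that merge, assuming the config is handled as a plain dict; add_nvidia_runtime is a hypothetical helper that only works in memory and does not read or write /etc/docker/daemon.json itself:

```python
import json


def add_nvidia_runtime(config: dict,
                       path: str = "/usr/bin/nvidia-container-runtime") -> dict:
    """Return a copy of a daemon.json dict with an 'nvidia' runtime registered.

    Hypothetical helper for illustration: existing top-level keys and other
    runtimes are preserved; only runtimes.nvidia is added or overwritten.
    """
    merged = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    runtimes = merged.setdefault("runtimes", {})
    runtimes["nvidia"] = {"path": path, "runtimeArgs": []}
    return merged


if __name__ == "__main__":
    existing = {"log-driver": "json-file"}
    print(json.dumps(add_nvidia_runtime(existing), indent=4))
```

After writing the result back to the file, the Docker service still needs a restart (systemctl restart docker, as noted later in this thread) before --runtime=nvidia is recognized.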

VanDavv commented 5 years ago

ping @KlaasH @ulyssessouza @Goryudyuma @chris-crone . Any update on this?

iedmrc commented 5 years ago

It is an urgent need. Thank you for your effort!

Daniel451 commented 5 years ago

Is it intended that users manually populate /etc/docker/daemon.json after migrating to Docker >= 19.03 and removing nvidia-docker2 in favor of nvidia-container-toolkit?

It seems that this breaks a lot of installations, especially since --gpus is not available in Compose.

andyneff commented 5 years ago

No, this is a workaround until Compose supports the gpus flag.

uderik commented 5 years ago

Install nvidia-container-runtime: https://github.com/NVIDIA/nvidia-container-runtime#docker-engine-setup

Add to /etc/docker/daemon.json:

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

In docker-compose.yml:

runtime: nvidia
environment:
  - NVIDIA_VISIBLE_DEVICES=all

Kwull commented 5 years ago

There is no such thing as "/usr/bin/nvidia-container-runtime" anymore. The issue is still critical.

uderik commented 5 years ago

It will let you run an NVIDIA environment with docker-compose until docker-compose is fixed.

cheperuiz commented 5 years ago

Install nvidia-container-runtime: https://github.com/NVIDIA/nvidia-container-runtime#docker-engine-setup and add to /etc/docker/daemon.json:

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

In docker-compose.yml:

runtime: nvidia
environment:
  - NVIDIA_VISIBLE_DEVICES=all

This is not working for me; I still get "Unsupported config option for services.myservice: 'runtime'" when I run docker-compose up.

Any ideas?

uderik commented 5 years ago

This is not working for me, still getting the Unsupported config option for services.myservice: 'runtime' when trying to run docker-compose up

any ideas?

After modifying /etc/docker/daemon.json, restart the Docker service (systemctl restart docker), use Compose file format 2.3, and add runtime: nvidia to your GPU service. Docker Compose must be version 1.19.0 or higher. docker-compose file:

version: '2.3'

services:
  nvsmi:
    image: ubuntu:16.04
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
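Both requirements in that comment (file format 2.3 and docker-compose 1.19.0 or newer) can be checked up front. A minimal sketch of the tool-version half, assuming you feed it a version string such as the one printed by docker-compose version --short; parse_version and compose_supports_runtime are hypothetical names. Only the leading major.minor.patch numbers are compared, so a development build like 1.25.0.dev0 (seen in the traceback further down) counts as 1.25.0:

```python
import re

# Minimum docker-compose release that accepts runtime: nvidia, per the comment above.
MIN_COMPOSE_FOR_RUNTIME = (1, 19, 0)


def parse_version(text: str) -> tuple:
    """Extract (major, minor, patch) from strings like '1.24.1' or '1.25.0.dev0'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def compose_supports_runtime(version: str) -> bool:
    """True if this docker-compose version meets the 1.19.0 minimum."""
    return parse_version(version) >= MIN_COMPOSE_FOR_RUNTIME
```

Tuple comparison is used instead of string comparison so that, for example, 1.9.0 correctly sorts below 1.19.0.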

Kwull commented 5 years ago

@cheperuiz, you can set nvidia as the default runtime in daemon.json, so you will not depend on docker-compose at all. But then all your Docker containers will use the nvidia runtime; I have had no issues so far.

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
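Whichever variant you use, daemon.json has to be strict JSON: a trailing comma before a closing brace, an easy slip when hand-editing this file, makes it unparseable, and Docker will not be able to load it. A small pre-flight check, sketched with Python's standard json module; default_runtime is a hypothetical helper name, and the fallback to runc reflects Docker's usual default runtime when "default-runtime" is absent:

```python
import json

GOOD = """
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
"""

# Same settings with a trailing comma after the "runtimes" block: not valid JSON.
BAD = """
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}
    },
}
"""


def default_runtime(text: str) -> str:
    """Parse daemon.json text strictly and return the effective default runtime.

    Raises json.JSONDecodeError for invalid JSON, and ValueError if the default
    points at a runtime that is not declared under "runtimes".
    """
    config = json.loads(text)
    runtime = config.get("default-runtime", "runc")
    if runtime != "runc" and runtime not in config.get("runtimes", {}):
        raise ValueError(f"default-runtime {runtime!r} is not defined under 'runtimes'")
    return runtime
```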

cheperuiz commented 5 years ago

Ah! Thank you @Kwull, I missed that default-runtime part... Everything is working now :)

johncolby commented 5 years ago

@uderik, runtime is no longer present in the current 3.7 compose file format schema, nor in the pending 3.8 version that should eventually align with Docker 19.03: https://github.com/docker/compose/blob/5e587d574a94e011b029c2fb491fb0f4bdeef71c/compose/config/config_schema_v3.8.json

andyneff commented 5 years ago

@johncolby runtime has never been a 3.x flag. It's only present in the 2.x track, (2.3 and 2.4).
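To make that rule concrete, here is a small sketch that flags services setting runtime in a file whose version is not 2.3 or 2.4, i.e. the cases that produce "Unsupported config option ... 'runtime'" in this thread. It works on an already-parsed compose file as a dict; how you parse the YAML (for example with PyYAML) is an assumption left outside the sketch, and runtime_problems is a hypothetical name:

```python
# Compose file formats that accept the 'runtime' service key (2.x track only).
RUNTIME_FORMATS = {"2.3", "2.4"}


def runtime_problems(compose: dict) -> list:
    """Return the names of services that set 'runtime' in a format that rejects it."""
    # str() also covers an unquoted YAML version such as 2.4, which parses as a float.
    version = str(compose.get("version", "")).strip()
    if version in RUNTIME_FORMATS:
        return []
    services = compose.get("services") or {}
    return sorted(name for name, svc in services.items() if "runtime" in (svc or {}))
```

For example, a version '3.0' file with runtime: nvidia under nvidia-smi-test would be reported as ["nvidia-smi-test"].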

cheperuiz commented 5 years ago

Yeah, I know, and even though my docker-compose.yml file includes version: '2.3' (which has worked in the past), it seems to be ignored by the latest versions... For future projects, what would be the correct way to enable or disable access to the GPU? Just making it the default plus environment variables? Or will there be support for the --gpus flag?

Daniel451 commented 5 years ago

@johncolby what is the replacement for runtime in 3.X?

johncolby commented 5 years ago

@Daniel451 I've just been following along peripherally, but it looks like it will be under the generic_resources key, something like:

services:
  my_app:
    deploy:
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: 'gpu'
                value: 2

(from https://github.com/docker/cli/blob/9a39a1/cli/compose/loader/full-example.yml#L71-L74) Design document here: https://github.com/docker/swarmkit/blob/master/design/generic_resources.md

Here is the compose issue regarding compose 3.8 schema support, which is already merged in: https://github.com/docker/compose/issues/6530

On the daemon side the gpu capability can get registered by including it in the daemon.json or dockerd CLI (like the previous hard-coded runtime workaround), something like

/usr/bin/dockerd --node-generic-resource gpu=2

which then gets registered by hooking into the NVIDIA docker utility: https://github.com/moby/moby/blob/09d0f9/daemon/nvidia_linux.go

It looks like the machinery is basically in place, probably just needs to get documented...

chongyi-zheng commented 5 years ago

Any update?

statikkkkk commented 5 years ago

Also waiting on updates, using bash with docker run --gpus until the official fix...

celbirlik commented 5 years ago

Waiting for updates as well.

litanlitudan commented 5 years ago

Also waiting for updates :)

ruckc commented 5 years ago

Ok... I don't understand why this is still open. These 3 additional lines make it work with schema version 3.7. Glad to know docker is responsive to trivial community issues. So clone this repo, add these three lines, build and install it with python3 setup.py build && python3 setup.py install, and make sure your docker-compose.yml uses version 3.7.

[ruckc@omnilap compose]$ git diff
diff --git a/compose/config/config_schema_v3.7.json b/compose/config/config_schema_v3.7.json
index cd7882f5..d25d404c 100644
--- a/compose/config/config_schema_v3.7.json
+++ b/compose/config/config_schema_v3.7.json
@@ -151,6 +151,7 @@

         "external_links": {"type": "array", "items": {"type": "string"}, "uniqueItems": true},
         "extra_hosts": {"$ref": "#/definitions/list_or_dict"},
+        "gpus": {"type": ["number", "string"]},
         "healthcheck": {"$ref": "#/definitions/healthcheck"},
         "hostname": {"type": "string"},
         "image": {"type": "string"},
diff --git a/compose/service.py b/compose/service.py
index 55d2e9cd..71188b67 100644
--- a/compose/service.py
+++ b/compose/service.py
@@ -89,6 +89,7 @@ HOST_CONFIG_KEYS = [
     'dns_opt',
     'env_file',
     'extra_hosts',
+    'gpus',
     'group_add',
     'init',
     'ipc',
@@ -996,6 +997,7 @@ class Service(object):
             dns_opt=options.get('dns_opt'),
             dns_search=options.get('dns_search'),
             restart_policy=options.get('restart'),
+            gpus=options.get('gpus'),
             runtime=options.get('runtime'),
             cap_add=options.get('cap_add'),
             cap_drop=options.get('cap_drop'),

ulyssessouza commented 5 years ago

I just added an internal issue to track that. Remember that PRs are welcome :smiley:

DarioTurchi commented 5 years ago

@ruckc I tried your solution, but I get a lot of errors about that flag:

ERROR: for <SERVICE_NAME>  __init__() got an unexpected keyword argument 'gpus'
Traceback (most recent call last):
  File "/usr/local/bin/docker-compose", line 11, in <module>
    load_entry_point('docker-compose==1.25.0.dev0', 'console_scripts', 'docker-compose')()
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 71, in main
    command()
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 127, in perform_command
    handler(command, command_options)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 1106, in up
    to_attach = up(False)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 1102, in up
    cli=native_builder,
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/project.py", line 569, in up
    get_deps,
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 112, in parallel_execute
    raise error_to_reraise
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 210, in producer
    result = func(obj)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/project.py", line 555, in do
    renew_anonymous_volumes=renew_anonymous_volumes,
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 546, in execute_convergence_plan
    scale, detached, start
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 468, in _execute_convergence_create
    "Creating"
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 112, in parallel_execute
    raise error_to_reraise
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 210, in producer
    result = func(obj)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 466, in <lambda>
    lambda service_name: create_and_start(self, service_name.number),
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 454, in create_and_start
    container = service.create_container(number=n, quiet=True)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 337, in create_container
    previous_container=previous_container,
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 913, in _get_container_create_options
    one_off=one_off)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 1045, in _get_container_host_config
    cpu_rt_runtime=options.get('cpu_rt_runtime'),
  File "/usr/local/lib/python3.6/dist-packages/docker-4.0.2-py3.6.egg/docker/api/container.py", line 590, in create_host_config
    return HostConfig(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'gpus'

Do I need a specific Python docker package?

litanlitudan commented 5 years ago

@DarioTurchi Yeah, I hit the exact same issue. It seems HostConfig needs to be updated as well.

AndrewJDR commented 5 years ago

I don't believe the change described by @ruckc is sufficient, because docker-py will also need a change. And it looks like the necessary docker-py change is still being worked on. See here: https://github.com/docker/docker-py/pull/2419

Here is the branch with the changes: https://github.com/sigurdkb/docker-py/tree/gpus_parameter

So if you wish to patch this in now you'll have to build docker-compose against a modified docker-py from https://github.com/sigurdkb/docker-py/tree/gpus_parameter
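For context on what the docker-py change has to carry: as far as we understand the Engine API, docker run --gpus is not a HostConfig field of its own; the CLI translates it into a DeviceRequests entry on the container's HostConfig, which the daemon then matches against the NVIDIA hook linked earlier (daemon/nvidia_linux.go). The sketch below is an unofficial, simplified approximation for the two simplest forms only ("all" and a plain count). The keys Driver, Count, Capabilities and Options follow the Engine API's DeviceRequest object; gpus_to_device_request is hypothetical, does not handle the device=, driver= or capabilities= forms, and rejecting negative counts is this sketch's own choice:

```python
import json


def gpus_to_device_request(value: str) -> dict:
    """Approximate the DeviceRequests entry produced for `--gpus VALUE`.

    Hypothetical, simplified helper: only "all" and a plain integer count are
    handled. "all" is encoded as Count = -1, meaning every available GPU.
    """
    value = value.strip()
    if value == "all":
        count = -1
    else:
        count = int(value)  # raises ValueError for forms this sketch does not cover
        if count < 0:
            # Our own guard so that "-1" cannot masquerade as "all".
            raise ValueError("GPU count must be a non-negative integer or 'all'")
    return {
        "Driver": "",                # empty: the daemon picks a driver by capability
        "Count": count,
        "Capabilities": [["gpu"]],
        "Options": {},
    }


if __name__ == "__main__":
    host_config = {"DeviceRequests": [gpus_to_device_request("all")]}
    print(json.dumps(host_config, indent=2))
```

This is why adding gpus to Compose's schema alone is not enough: the Python client also has to know how to send DeviceRequests to the daemon.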

loretoparisi commented 4 years ago

I don't get what is going on here:

1) I have this in /etc/docker/daemon.json:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

but the runtime key cannot be used in v3.x, as per https://github.com/docker/compose/issues/6239

I have also tried:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

So I can no longer start my containers with GPU support via docker-compose:

bertserving_1    | I:VENTILATOR:[__i:_ge:222]:get devices
bertserving_1    | W:VENTILATOR:[__i:_ge:246]:no GPU available, fall back to CPU

Before those changes it worked, so what can I do now?

sld commented 4 years ago

+1, it would be very useful to have such a feature in docker-compose!

ysyyork commented 4 years ago

Any ETA?

ndeloof commented 4 years ago

internally tracked as https://docker.atlassian.net/browse/COMPOSE-82

jackproudfoot commented 4 years ago

+1, this would be a useful feature for docker-compose

cjcbusatto commented 4 years ago

This feature would be an awesome addition to docker-compose

arruda commented 4 years ago

Right now my solution is to use version 2.3 of the docker-compose file format, which supports runtime, and to manually install nvidia-container-runtime (since it is no longer installed with nvidia-docker). I'm also setting the runtime config in /etc/docker/daemon.json (not as the default, just as an available runtime). With this I can use a compose file such as:

version: '2.3'
services:
  test:
    image: nvidia/cuda:9.0-base
    runtime: nvidia

cheperuiz commented 4 years ago

@arruda Would you mind sharing your daemon.json please?

arruda commented 4 years ago

Yeah, no problem, here it is:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

rfsch commented 4 years ago

Hi

I have an application which requires NVIDIA drivers. I have built a docker image based on (FROM) nvidia/cudagl:10.1-runtime-ubuntu18.04

Using the approach recommended above, does that mean my image no longer needs to be derived from nvidia/cudagl:10.1-runtime-ubuntu18.04? I.e., could I simply derive FROM python:3.7.3-stretch and add runtime: nvidia to the service in docker-compose?

Thanks

muru commented 4 years ago

@rfsch No, that's a different thing. runtime: nvidia in docker-compose refers to the Docker runtime. This makes the GPU available to the container. But you still need some way to use them once they're made available. runtime in nvidia/cudagl:10.1-runtime-ubuntu18.04 refers to the CUDA runtime components. This lets you use the GPUs (made available in a container by Docker) using CUDA.

In this image:

[Image: Docker architecture]

runtime: nvidia replaces the runc/containerd part. nvidia/cudagl:10.1-runtime-ubuntu18.04 is completely outside the picture.

GeorgeFedoseev commented 4 years ago

We need this feature.

david-gwa commented 4 years ago

Hey, @johncolby, I tried this, but failed:

ERROR: The Compose file './docker-compose.yml' is invalid because:
services.nvidia-smi-test.deploy.resources.reservations value Additional properties are not allowed ('generic_resources' was unexpected)

any suggestions?

Thanks David

jdr-face commented 4 years ago

Installing nvidia-container-runtime 3.1.4.1 from https://github.com/NVIDIA/nvidia-container-runtime and putting

runtime: nvidia

works fine here with docker-compose 1.23.1 and 1.24.1 as installed from https://docs.docker.com/compose/install/ using this dodgy-looking command:

sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

and, e.g., the nvidia/cudagl:10.1-base image from Docker Hub. I've tried CUDA and OpenGL rendering, and both run at near-native performance.

ndeloof commented 4 years ago

Internally tracked as COMPOSE-82. Please note that such a change also needs to be implemented in docker stack (https://github.com/docker/cli/blob/master/cli/compose/types/types.go#L156) for consistency.

david-gwa commented 4 years ago

Can you share your docker-compose.yml?

Hey @jdr-face,

here is my test following your suggestion, after installing nvidia-container-runtime on the host machine:

version: '3.0'

services:
  nvidia-smi-test:
    runtime: nvidia
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix 
    environment:
     - NVIDIA_VISIBLE_DEVICES=0 
     - DISPLAY
    image: vkcube

It still gives the error:

       Unsupported config option for services.nvidia-smi-test: 'runtime'

muru commented 4 years ago

@david-gwa as noted by andyneff earlier:

runtime has never been a 3.x flag. It's only present in the 2.x track, (2.3 and 2.4).

jdr-face commented 4 years ago

@david-gwa

can you share your docker-compose.yml ?

version: '2.3'

services:
    container:
        image: "nvidia/cudagl:10.1-base"

        runtime: "nvidia" 

        security_opt:
            - seccomp:unconfined
        privileged: true

        volumes:
            - $HOME/.Xauthority:/root/.Xauthority:rw
            - /tmp/.X11-unix:/tmp/.X11-unix:rw

        environment:
          - NVIDIA_VISIBLE_DEVICES=all

Depending on your needs, some of those options may be unnecessary. As @muru pointed out, the trick is to specify an older file format version. At least for my use case this isn't a problem, but I only offer this config as a workaround; really, this should be possible with the latest version.

david-gwa commented 4 years ago

Thanks guys, @jdr-face, @muru: Compose v2 does work. I misunderstood and thought your solution was for Compose v3.

andyneff commented 4 years ago

For the record, traditionally speaking: compose v2 is not older than compose v3. They are different use cases. v3 is geared towards swarm while v2 is not. v1 is old.

pddg commented 4 years ago

Is there any discussion about Docker Compose supporting Docker's native GPU support?

Supporting the runtime option is not a long-term solution for GPU support. NVIDIA describes the future of nvidia-docker2 in https://github.com/NVIDIA/nvidia-docker as follows:

Note that with the release of Docker 19.03, usage of nvidia-docker2 packages are deprecated since NVIDIA GPUs are now natively supported as devices in the Docker runtime.

Currently, GPU support can be achieved by changing the runtime, but it is quite likely that this will stop working in the future.