Closed. collabnix closed this issue 3 years ago.
This is of increased importance now that the legacy 'nvidia' runtime appears broken with Docker 19.03.0 and nvidia-container-toolkit-1.0.0-2: https://github.com/NVIDIA/nvidia-docker/issues/1017
$ cat docker-compose.yml
version: '2.3'
services:
  nvidia-smi-test:
    runtime: nvidia
    image: nvidia/cuda:9.2-runtime-centos7

$ docker-compose run nvidia-smi-test
Cannot create container for service nvidia-smi-test: Unknown runtime specified nvidia
This works: docker run --gpus all nvidia/cudagl:9.2-runtime-centos7 nvidia-smi
This does not: docker run --runtime=nvidia nvidia/cudagl:9.2-runtime-centos7 nvidia-smi
Any work happening on this?
I got the new Docker CE 19.03.0 on a new Ubuntu 18.04 LTS machine and have the current, matching NVIDIA Container Toolkit (née nvidia-docker2) version, but I cannot use it because the docker-compose.yml 3.7 format doesn't support the --gpus flag.
Is there a workaround for this?
This works:
docker run --gpus all nvidia/cudagl:9.2-runtime-centos7 nvidia-smi
This does not:
docker run --runtime=nvidia nvidia/cudagl:9.2-runtime-centos7 nvidia-smi
You need to have

{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

in your /etc/docker/daemon.json for --runtime=nvidia to continue working. More info here.
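For anyone scripting that daemon.json step, here is a minimal sketch (purely illustrative, not an official tool; the file path and the merge behavior are assumptions) that registers the nvidia runtime without clobbering existing daemon settings:

```python
import json
import os

# Standard daemon config path on most Linux installs (assumption).
DAEMON_JSON = "/etc/docker/daemon.json"

NVIDIA_RUNTIME = {
    "path": "/usr/bin/nvidia-container-runtime",
    "runtimeArgs": [],
}

def add_nvidia_runtime(config: dict) -> dict:
    """Return a copy of the daemon config with the nvidia runtime registered."""
    merged = dict(config)
    runtimes = dict(merged.get("runtimes", {}))
    runtimes["nvidia"] = NVIDIA_RUNTIME
    merged["runtimes"] = runtimes
    return merged

if __name__ == "__main__":
    config = {}
    if os.path.exists(DAEMON_JSON):
        with open(DAEMON_JSON) as f:
            config = json.load(f)
    # Print the merged config; write it back manually after reviewing it.
    print(json.dumps(add_nvidia_runtime(config), indent=2))
```

After writing the merged config back, restart the daemon (e.g. systemctl restart docker) so --runtime=nvidia is recognized.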
ping @KlaasH @ulyssessouza @Goryudyuma @chris-crone . Any update on this?
It is an urgent need. Thank you for your effort!
Is it intended to have users manually populate /etc/docker/daemon.json after migrating to docker >= 19.03 and removing nvidia-docker2 in favor of nvidia-container-toolkit? It seems that this breaks a lot of installations, especially since --gpus is not available in compose.
No, this is a workaround until compose supports the --gpus flag.
Install nvidia-docker-runtime: https://github.com/NVIDIA/nvidia-container-runtime#docker-engine-setup

Add to /etc/docker/daemon.json:

{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

In docker-compose:

runtime: nvidia
environment:
  - NVIDIA_VISIBLE_DEVICES=all
There is no such thing as "/usr/bin/nvidia-container-runtime" anymore. The issue is still critical.
It will help to run an NVIDIA environment with docker-compose, until docker-compose itself is fixed: install nvidia-docker-runtime, add the runtimes entry to /etc/docker/daemon.json as described above, then set runtime: nvidia and NVIDIA_VISIBLE_DEVICES=all on your service.
This is not working for me; I'm still getting Unsupported config option for services.myservice: 'runtime' when trying to run docker-compose up. Any ideas?
After modifying /etc/docker/daemon.json, restart the docker service: systemctl restart docker. Use Compose format 2.3 and add runtime: nvidia to your GPU service. Docker Compose must be version 1.19.0 or higher. docker-compose file:

version: '2.3'
services:
  nvsmi:
    image: ubuntu:16.04
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
@cheperuiz, you can set nvidia as the default runtime in daemon.json and then not be dependent on docker-compose. But all your docker containers will use the nvidia runtime. I have had no issues so far.

{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
Ah! Thank you @Kwull, I missed that default-runtime part... Everything is working now :)
@uderik, runtime is no longer present in the current 3.7 compose file format schema, nor in the pending 3.8 version that should eventually align with Docker 19.03: https://github.com/docker/compose/blob/5e587d574a94e011b029c2fb491fb0f4bdeef71c/compose/config/config_schema_v3.8.json
@johncolby runtime has never been a 3.x flag. It's only present in the 2.x track (2.3 and 2.4).
Yeah, I know, and even though my docker-compose.yml file includes version: '2.3' (which has worked in the past), it seems to be ignored by the latest versions... For future projects, what would be the correct way to enable/disable access to the GPU? Just making it the default plus env variables? Or will there be support for the --gpus flag?
@johncolby what is the replacement for runtime in 3.x?
@Daniel451 I've just been following along peripherally, but it looks like it will be under the generic_resources key, something like:

services:
  my_app:
    deploy:
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: 'gpu'
                value: 2
(from https://github.com/docker/cli/blob/9a39a1/cli/compose/loader/full-example.yml#L71-L74) Design document here: https://github.com/docker/swarmkit/blob/master/design/generic_resources.md
Here is the compose issue regarding compose 3.8 schema support, which is already merged in: https://github.com/docker/compose/issues/6530
On the daemon side, the gpu capability can get registered by including it in the daemon.json or the dockerd CLI (like the previous hard-coded runtime workaround), something like

/usr/bin/dockerd --node-generic-resource gpu=2

which then gets registered by hooking into the NVIDIA docker utility: https://github.com/moby/moby/blob/09d0f9/daemon/nvidia_linux.go

It looks like the machinery is basically in place; it probably just needs to get documented...
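For reference, the daemon.json counterpart of that dockerd flag should look something like the fragment below (to the best of my understanding of the dockerd docs; the UUID values are placeholders, and GPU discovery still depends on the NVIDIA hook above):

```json
{
  "node-generic-resources": [
    "NVIDIA-GPU=UUID1",
    "NVIDIA-GPU=UUID2"
  ]
}
```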
Any update?
Also waiting on updates; using bash with docker run --gpus until the official fix...
Waiting for updates as well.
Also waiting for updates :)
Ok... I don't understand why this is still open. These three additional lines make it work with schema version 3.7. Glad to know docker is responsive to trivial community issues. So clone this repo, add these three lines, run python3 setup.py build && python3 setup.py install, and make sure your docker-compose.yml is version 3.7.
[ruckc@omnilap compose]$ git diff
diff --git a/compose/config/config_schema_v3.7.json b/compose/config/config_schema_v3.7.json
index cd7882f5..d25d404c 100644
--- a/compose/config/config_schema_v3.7.json
+++ b/compose/config/config_schema_v3.7.json
@@ -151,6 +151,7 @@
"external_links": {"type": "array", "items": {"type": "string"}, "uniqueItems": true},
"extra_hosts": {"$ref": "#/definitions/list_or_dict"},
+ "gpus": {"type": ["number", "string"]},
"healthcheck": {"$ref": "#/definitions/healthcheck"},
"hostname": {"type": "string"},
"image": {"type": "string"},
diff --git a/compose/service.py b/compose/service.py
index 55d2e9cd..71188b67 100644
--- a/compose/service.py
+++ b/compose/service.py
@@ -89,6 +89,7 @@ HOST_CONFIG_KEYS = [
'dns_opt',
'env_file',
'extra_hosts',
+ 'gpus',
'group_add',
'init',
'ipc',
@@ -996,6 +997,7 @@ class Service(object):
dns_opt=options.get('dns_opt'),
dns_search=options.get('dns_search'),
restart_policy=options.get('restart'),
+ gpus=options.get('gpus'),
runtime=options.get('runtime'),
cap_add=options.get('cap_add'),
cap_drop=options.get('cap_drop'),
I just added an internal issue to track that. Remember that PRs are welcome :smiley:
I tried your solution, but I get a lot of errors about that flag:
ERROR: for <SERVICE_NAME> __init__() got an unexpected keyword argument 'gpus'
Traceback (most recent call last):
File "/usr/local/bin/docker-compose", line 11, in <module>
load_entry_point('docker-compose==1.25.0.dev0', 'console_scripts', 'docker-compose')()
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 71, in main
command()
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 127, in perform_command
handler(command, command_options)
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 1106, in up
to_attach = up(False)
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 1102, in up
cli=native_builder,
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/project.py", line 569, in up
get_deps,
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 112, in parallel_execute
raise error_to_reraise
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 210, in producer
result = func(obj)
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/project.py", line 555, in do
renew_anonymous_volumes=renew_anonymous_volumes,
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 546, in execute_convergence_plan
scale, detached, start
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 468, in _execute_convergence_create
"Creating"
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 112, in parallel_execute
raise error_to_reraise
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 210, in producer
result = func(obj)
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 466, in <lambda>
lambda service_name: create_and_start(self, service_name.number),
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 454, in create_and_start
container = service.create_container(number=n, quiet=True)
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 337, in create_container
previous_container=previous_container,
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 913, in _get_container_create_options
one_off=one_off)
File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 1045, in _get_container_host_config
cpu_rt_runtime=options.get('cpu_rt_runtime'),
File "/usr/local/lib/python3.6/dist-packages/docker-4.0.2-py3.6.egg/docker/api/container.py", line 590, in create_host_config
return HostConfig(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'gpus'
Do I need a specific python docker package?
@DarioTurchi Yeah, I met the exact same issue. It seems the HostConfig type needs to be updated as well.
I don't believe the change described by @ruckc is sufficient, because docker-py will also need a change. And it looks like the necessary docker-py change is still being worked on. See here: https://github.com/docker/docker-py/pull/2419
Here is the branch with the changes: https://github.com/sigurdkb/docker-py/tree/gpus_parameter
So if you wish to patch this in now you'll have to build docker-compose against a modified docker-py from https://github.com/sigurdkb/docker-py/tree/gpus_parameter
I don't get what is going on here:

1) I have in /etc/docker/daemon.json:

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

but the runtime key cannot be used anymore in v3.x, as per https://github.com/docker/compose/issues/6239
I have also tried:

{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
So I cannot start my containers with GPU support via docker-compose anymore:
bertserving_1 | I:VENTILATOR:[__i:_ge:222]:get devices
bertserving_1 | W:VENTILATOR:[__i:_ge:246]:no GPU available, fall back to CPU
Before those changes it worked, so what can I do now?
+1 it will be very useful to have such feature in docker-compose!
Any ETA?
internally tracked as https://docker.atlassian.net/browse/COMPOSE-82
+1 would be useful feature for docker-compose
This feature would be an awesome addition to docker-compose
Right now my solution for this is to use the 2.3 version of the docker-compose file format, which supports runtime, and to manually install nvidia-container-runtime (since it is no longer installed with nvidia-docker). I'm also setting the runtime config in /etc/docker/daemon.json (not as the default, just as an available runtime). With this I can use a compose file such as:
version: '2.3'
services:
  test:
    image: nvidia/cuda:9.0-base
    runtime: nvidia
@arruda Would you mind sharing your daemon.json, please?
Yeah, no problem, here it is:
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
Hi,
I have an application which requires NVIDIA drivers. I have built a docker image based on (FROM) nvidia/cudagl:10.1-runtime-ubuntu18.04.
Using the approach recommended above, does it mean my image does not need to be derived from nvidia/cudagl:10.1-runtime-ubuntu18.04? I.e., could I simply derive from (FROM) python:3.7.3-stretch and add runtime: nvidia to the service in docker-compose?
Thanks
@rfsch No, that's a different thing. runtime: nvidia in docker-compose refers to the Docker runtime; this makes the GPUs available to the container. But you still need some way to use them once they're made available. runtime in nvidia/cudagl:10.1-runtime-ubuntu18.04 refers to the CUDA runtime components; this lets you use the GPUs (made available in a container by Docker) via CUDA.

In this image (Docker's architecture diagram): runtime: nvidia replaces the runc/containerd part. nvidia/cudagl:10.1-runtime-ubuntu18.04 is completely outside the picture.
we need this feature
Hey @johncolby, I tried this, but it failed:
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.nvidia-smi-test.deploy.resources.reservations value Additional properties are not allowed ('generic_resources' was unexpected)
any suggestions?
Thanks David
Installing nvidia-container-runtime 3.1.4.1 from https://github.com/NVIDIA/nvidia-container-runtime and putting runtime: nvidia in the compose file works fine here with docker-compose 1.23.1 and 1.24.1, as installed from https://docs.docker.com/compose/install/ using this dodgy-looking command:

sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

and, e.g., the nvidia/cudagl:10.1-base container from Docker Hub. I've tried CUDA and OpenGL rendering and it's all near-native performance.
Internally tracked as COMPOSE-82
Please note that such a change also needs to be implemented in docker stack (https://github.com/docker/cli/blob/master/cli/compose/types/types.go#L156) for consistency.
Can you share your docker-compose.yml?
Hey @jdr-face, here is my test following your suggestion, with nvidia-container-runtime installed on the host machine.

version: '3.0'
services:
  nvidia-smi-test:
    runtime: nvidia
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
      - DISPLAY
    image: vkcube

It still gives the error:

Unsupported config option for services.nvidia-smi-test: 'runtime'
@david-gwa, as noted by andyneff earlier: runtime has never been a 3.x flag. It's only present in the 2.x track (2.3 and 2.4).
@david-gwa, here's my docker-compose.yml:
version: '2.3'
services:
  container:
    image: "nvidia/cudagl:10.1-base"
    runtime: "nvidia"
    security_opt:
      - seccomp:unconfined
    privileged: true
    volumes:
      - $HOME/.Xauthority:/root/.Xauthority:rw
      - /tmp/.X11-unix:/tmp/.X11-unix:rw
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
Depending on your needs, some of those options may be unnecessary. As @muru predicted, the trick is to specify an old version. At least for my use case this isn't a problem, but I offer this config only as a workaround; really it should be made possible using the latest version.
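As a quick guard for this workaround, here is a tiny helper (purely illustrative, not part of compose; the function name is made up) encoding the rule discussed above: runtime: is only accepted by the 2.3 and 2.4 schema versions.

```python
def supports_runtime(compose_version: str) -> bool:
    """runtime: is only accepted by the 2.x schema track (2.3 and 2.4)."""
    major, _, minor = compose_version.partition(".")
    return major == "2" and minor in ("3", "4")

# A 3.x file is rejected with "Unsupported config option ... 'runtime'",
# so check the version string before relying on the workaround:
print(supports_runtime("2.3"))  # True
print(supports_runtime("3.7"))  # False
```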
Thanks guys, @jdr-face, @muru: compose v2 does work. I misunderstood and thought your solution was for v3 compose.
For the record, traditionally speaking: compose v2 is not older than compose v3. They are different use cases. v3 is geared towards swarm while v2 is not. v1 is old.
Is there any discussion about docker-compose supporting Docker's native GPU support? Supporting the runtime option is not the solution for GPU support going forward. NVIDIA describes the future of nvidia-docker2 in https://github.com/NVIDIA/nvidia-docker as follows:

Note that with the release of Docker 19.03, usage of nvidia-docker2 packages are deprecated since NVIDIA GPUs are now natively supported as devices in the Docker runtime.

Currently, GPU support can be realized by changing the runtime, but it is highly possible that this will not work in the future.
Under Docker 19.03.0 Beta 2, support for NVIDIA GPUs was introduced in the form of a new CLI option, --gpus. https://github.com/docker/cli/pull/1714 talks about this enablement.

Now one can simply pass the --gpus option for GPU-accelerated Docker-based applications.

As of today, Compose doesn't support this. This is a feature request for enabling Compose to support NVIDIA GPUs.
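Until Compose learns about --gpus, one interim option is to shell out to the CLI directly. A minimal sketch (the helper name, image, and device selection are just examples, not an official API):

```python
import subprocess  # used when actually executing the command, see below

def gpu_run_command(image: str, cmd: list, gpus: str = "all") -> list:
    """Build a `docker run` argv using the Docker 19.03+ --gpus flag
    instead of the legacy --runtime=nvidia."""
    return ["docker", "run", "--rm", "--gpus", gpus, image] + cmd

argv = gpu_run_command("nvidia/cuda:9.2-runtime-centos7", ["nvidia-smi"])
print(" ".join(argv))
# To actually execute (requires Docker 19.03+ and the NVIDIA container toolkit):
# subprocess.run(argv, check=True)
```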