To be frank, this may not be the best practice, but somehow we made it work.
The tricky part is that we have to stick with docker-compose v3.x since we use Docker Swarm, while we also want the NVIDIA runtime to support GPU/CUDA in the containers.
To avoid referencing the NVIDIA runtime explicitly inside the docker-compose file, we set NVIDIA as the default runtime in /etc/docker/daemon.json, which looks like:
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
That way, all containers running on the GPU machines enable the NVIDIA runtime by default.
Hope this helps someone facing a similar blocker.
This is indeed what we do as well. It works for now, but it feels a little hacky to me. Hoping for full compose-v3 support soon. :)
Is it intended that users manually populate /etc/docker/daemon.json after migrating to docker >= 19.03 and replacing nvidia-docker2 with nvidia-container-toolkit? It seems that this breaks a lot of installations, especially since --gpus is not available in compose.
Since --gpus is not available in compose, I cannot use PyCharm to link Docker to run tensorflow-gpu.
Any updates on this issue? Is there a chance that --gpus will be supported in docker-compose soon?
For those of you looking for a workaround, this is what we ended up doing:
- Install docker-py from this PR: docker/docker-py#2471
- Install docker-compose from this PR: #7124
Then run COMPOSE_API_VERSION=auto docker-compose run gpu
with the following file:
version: '3.7'
services:
  gpu:
    image: 'nvidia/cuda:9.0-base'
    command: 'nvidia-smi'
    device_requests:
      - capabilities:
          - "gpu"
Under Docker 19.03.0 Beta 2, support for NVIDIA GPUs was introduced in the form of a new CLI option, --gpus. docker/cli#1714 talks about this enablement.
Now one can simply pass the --gpus option for GPU-accelerated Docker-based applications.
$ docker run -it --rm --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
f476d66f5408: Pull complete
8882c27f669e: Pull complete
d9af21273955: Pull complete
f5029279ec12: Pull complete
Digest: sha256:d26d529daa4d8567167181d9d569f2a85da3c5ecaf539cace2c6223355d69981
Status: Downloaded newer image for ubuntu:latest
Tue May  7 15:52:15 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    22W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
As of today, Compose doesn't support this. This is a feature request for enabling Compose to support NVIDIA GPUs.
I have solved this problem; you can try the following. My CSDN blog post: https://blog.csdn.net/u010420283/article/details/104055046
~$ sudo apt-get install nvidia-container-runtime
~$ sudo vim /etc/docker/daemon.json
Then, in this daemon.json file, add this content:
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
~$ sudo systemctl daemon-reload
~$ sudo systemctl restart docker
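One pitfall with this approach: a single syntax error in daemon.json (a missing comma between top-level keys is the classic case) will prevent dockerd from starting at all. A minimal sketch of a pre-restart sanity check, using Python's stdlib JSON parser (the DAEMON_JSON override is only for illustration):

```shell
# Validate daemon.json before restarting docker; dockerd refuses to start
# if the file is not well-formed JSON.
DAEMON_JSON="${DAEMON_JSON:-/etc/docker/daemon.json}"
if python3 -m json.tool "$DAEMON_JSON" > /dev/null; then
    echo "daemon.json: valid JSON"
else
    echo "daemon.json: invalid JSON -- fix it before restarting docker" >&2
fi
```

After the restart, docker info should list nvidia among the runtimes (and as the default, if configured that way).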
For the Ansible users who want to set up the workaround described above, there is a role that installs nvidia-container-runtime and configures /etc/docker/daemon.json to use runtime: nvidia:
https://github.com/NVIDIA/ansible-role-nvidia-docker
(for some reason it runs only on Ubuntu and RHEL, but it's quite easy to modify; I run it on Debian)
Then in your docker-compose.yml:
version: "2.4"
services:
  test:
    image: "nvidia/cuda:10.2-runtime-ubuntu18.04"
    command: "nvidia-smi"
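As an aside: if you'd rather not make nvidia the global default runtime, the 2.3+ compose file format also accepts a per-service runtime key. A sketch of the same service written that way (assuming the "nvidia" runtime entry exists in /etc/docker/daemon.json):

```yaml
version: "2.4"
services:
  test:
    image: "nvidia/cuda:10.2-runtime-ubuntu18.04"
    command: "nvidia-smi"
    runtime: nvidia   # per-service runtime; needs the runtimes entry in daemon.json
```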
Any update on an official 3.x version with GPU support? We need it on swarm :)
Is there any plan to add this feature?
This feature depends on docker-py implementing the device_requests parameter, which is what --gpus translates to. There have been multiple pull requests to add this feature (https://github.com/docker/docker-py/pull/2419, https://github.com/docker/docker-py/pull/2465, https://github.com/docker/docker-py/pull/2471) but there is no reaction from any maintainer. #7124 uses https://github.com/docker/docker-py/pull/2471 to provide it in Compose, but still no reply from anyone.
As I mentioned in #7124 I'm more than happy to make the PR more compliant but since it's gotten very little attention I don't want to waste my time in something that's not going to be merged ...
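For context on what's being plumbed through: below is a rough sketch of the Engine API payload that --gpus all translates to, i.e. a DeviceRequests entry in the container's HostConfig. The exact field values here are illustrative (-1 for Count is the "all devices" convention):

```json
{
  "HostConfig": {
    "DeviceRequests": [
      {
        "Driver": "",
        "Count": -1,
        "DeviceIDs": [],
        "Capabilities": [["gpu"]],
        "Options": {}
      }
    ]
  }
}
```

This is the payload shape the docker-py pull requests above expose to Python callers, and what Compose would need to emit.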
Please add this feature, will be awesome!
Please, add this feature! I was more than happy with the old nvidia-docker2, which allowed me to change the runtime in daemon.json. It would be extremely nice to have this back.
Need it, please. Really need it :/
I'd like to pile on as well... we need this feature!
I need to run both CPU and GPU containers on the same machine so the default runtime hack doesn't work for me. Do we have any idea when this will work on compose? Given that that we don't have the runtime flag in compose this represents a serious functionality regression, does it not? I'm having to write scripts in order to make this work - yuck!
You can do it via the docker CLI (docker run --gpus ...); I use this kind of trick (adding a proxy to be able to communicate with other containers running on other nodes in the swarm). We are all waiting for the ability to run it on swarm, because it doesn't work via the docker service command (as far as I know) nor via compose.
@dottgonzo . Well, yes ;-). I am aware of this and hence the reference to scripts. But this is a pretty awful and non-portable way of doing it so I'd like to do it in a more dynamic way. As I said, I think that this represents a regression, not a feature ask.
COMPOSE_API_VERSION=auto docker-compose run gpu
@ggregoire where do we run: COMPOSE_API_VERSION=auto docker-compose run gpu ?
@joehoeller from your shell, just as you would for any other command.
Right now we are deciding for every project if we need 3.x features or if we can use docker-compose 2.x where the GPU option is still supported. Features like running multistage targets from a Dockerfile can sadly not be used if GPU is necessary. Please add this back in!
I'd like to recommend something like an "additional options" field for docker-compose where we can just add flags like --gpus=all to the docker start/run command that are not yet (or no longer) supported in docker-compose but are in the latest docker version. This way, compose users won't have to wait for docker-compose to catch up when they need a new, not-yet-supported docker feature.
It is still necessary to run this on Docker Swarm for production environments. Will this be useful for Docker Swarm?
@sebastianfelipe It's very useful if you want to deploy to your swarm using compose.
Compare:
docker service create --generic-resource "gpu=1" --replicas 10 \
    --name sparkWorker <image_name> \
    "service ssh start && \
    /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://<spark_master_ip>:7077"
to something like this
docker stack deploy --compose-file docker-compose.yml stackdemo
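For the swarm case specifically, the compose v3 format does already expose generic resources under deploy.resources.reservations, which maps to the --generic-resource CLI flag. A sketch (assuming the swarm nodes advertise a gpu resource via node-generic-resources in their daemon config; image name is a placeholder):

```yaml
version: "3.8"
services:
  sparkWorker:
    image: <image_name>
    deploy:
      replicas: 10
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: "gpu"
                value: 1
```

This reserves a generic "gpu" resource per replica; it does not by itself configure the NVIDIA runtime, which is why the default-runtime workaround is still paired with it.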
Sorry, so is it already working with Docker Swarm using the docker-compose yaml file? Just to be sure :O. Thanks!
only for docker compose 2.x
The entire point of this issue is to request nvidia-docker gpu support for docker-compose 3+
It's been almost a year since the original request!! Why the delay?? Can we move this forward ??
ping @KlaasH @ulyssessouza @Goryudyuma @chris-crone . Any update on this?
For those of you who are as impatient as I am, here's an easy pip install version of the above workaround:
pip install git+https://github.com/docker/docker-py.git@refs/pull/2471/merge
pip install git+https://github.com/docker/compose.git@refs/pull/7124/merge
pip install python-dotenv
Huge kudos to @yoanisgil ! Still anxiously waiting for an official patch. With all the PRs in place, it doesn't seem difficult by any standard.
ping @KlaasH @ulyssessouza @Goryudyuma @chris-crone . Any update on this?
No, I don't know why I was pinged. Could you tell me what you'd like me to do?
I hope there is an update on this.
Yeah, it's been more than a year now... why are they not merging in docker-py...
I'm not sure that the proposed implementations are the right ones for the Compose format. The good news is that we've opened up the Compose format specification with the intention of adding things like this. You can find the spec at https://github.com/compose-spec.
What I'd suggest we do is add an issue on the spec and then discuss it at one of the upcoming Compose community meetings (link to invite at the bottom of this page).
This works:
docker run --gpus all nvidia/cudagl:9.2-runtime-centos7 nvidia-smi
This does not:
docker run --runtime=nvidia nvidia/cudagl:9.2-runtime-centos7 nvidia-smi
You need to have
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
in your /etc/docker/daemon.json for --runtime=nvidia to continue working. More info here.
Dockerd doesn't start with this daemon.json
Christ, this is going to take years :@
This works:
docker run --gpus all nvidia/cudagl:9.2-runtime-centos7 nvidia-smi
@deniswal : Yes, we know this, but we are asking about compose functionality.
@chris-crone: I'm confused: This represents a regression from former behavior, why does it need a new feature specification? Isn't it reasonable to run containers, some of which use GPU and some of which use CPU on the same physical box?
Thanks for the consideration.
@vk1z AFAIK Docker Compose has never had GPU support so this is not a regression. The part that needs design is how to declare a service's need for a GPU (or other device) in the Compose format, specifically changes like this. After that, it should just be plumbing to the backend.
Hi guys, I've tried some of the solutions proposed here and nothing worked for me; for example, @miriaford's did not work in my case. Is there some way to use the GPU to run my existing docker containers? I have an i7 with 16GB of RAM, but the builds for some projects take too long to complete. My goal is to also use GPU power to speed up the process. Is that possible? Thanks!
@chris-crone: Again, I will be willing to be corrected, but wasn't that because the runtime: parameter disappeared from compose after the 2.4 config? That is why I felt that it was a regression. But no matter now, since we should all be on 3.x anyway.
I'd be glad to file an issue; we do that against the spec in the spec repo, correct?
but wasn't that because the runtime: parameter disappeared from compose after 2.4 config? That is why I felt that it was a regression.
Yes, exactly. I have a couple of projects where we rely on using runtime: nvidia in our docker-compose files, and this issue blocks us from upgrading to 3.x because we haven't found a way to use GPUs there.
Hi, please, please, please fix this. This should be marked mission critical priority -20
Again, I will be willing to be corrected, but wasn't that because the runtime: parameter disappeared from compose after 2.4 config? That is why I felt that it was a regression. But no, matter now since we all should be on 3.x anyway.
I wasn't here when the change was made so I'm not 100 % sure why it was dropped. I know that you do not need the NVIDIA runtime to use GPUs any more and that we are evolving the Compose v3 spec in the open here with the intention of making a single version of the spec. This may mean moving some v2 functionality into v3.
In terms of the runtime field, I don't think this is how it should be added to the Compose spec, as it is very specific to running on a single node. Ideally we'd want something that'd allow you to specify that your workload has a device need (e.g. GPU, TPU, whatever comes next) and then let the orchestrator assign the workload to a node that provides that capability.
This discussion should be had on the specification though as it's not Python Docker Compose specific.
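As a sketch of what such a declarative, orchestrator-friendly device request could look like in a compose file (illustrative syntax; this is the general shape the Compose specification later adopted under deploy.resources.reservations.devices):

```yaml
services:
  train:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia      # which device driver to request
              count: 1            # how many devices the workload needs
              capabilities: [gpu] # capability the node must provide
```

The service declares what it needs; whether the node satisfies it via the NVIDIA runtime, a TPU plugin, or something else is left to the backend.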
@chris-crone: I mostly concur with your statement. Adding short term hacks is probably the incorrect way to do this since we have a proliferation of edge devices each with their own runtimes. For example, as you point out, TPU (Google), VPU(Intel) and ARM GPU on the Pi. So we do need a more complete story.
I'll file an issue against the specification today and update this thread once I have done so. However, I do think that the orchestrator should be independent - such as if I want to use Kube, I should be able to do so. I'm assuming that will be in scope.
I do, however, disagree with the "using GPUs" statement, since that doesn't work with compose, which is what this is all about. But I think we all understand what problem we would like solved.
@chris-crone : Please see the docker-compose spec issue filed. I'll follow updates against that issue from now on.
Can we simply add an option (something like extra_docker_run_args) to pass arguments directly to the underlying docker run? This would not only solve the current problem, but also be future-proof: what if docker adds support for whatever "XPU", "YPU", or any other new features that might come in the future?
If we need a long back-and-forth discussion every time docker adds a new feature, it will be extremely inefficient and cause inevitable delay (and unnecessary confusion) between docker-compose and docker updates. Supporting argument delegation would provide temporary relief for this recurring issue for all future features.
@miriaford I'm not sure that passing an uninterpreted blob supports the compose notion of being declarative. The old runtime tag at least indicated that it was something to do with the runtime. Given the direction in which docker is trending (docker-apps), it seems to me that doing this would make declarative deployment harder since an orchestrator would have to parse arbitrary blobs.
But I agree that compose and docker should be synchronized and zapping working features that people depend on (even though it was a major release) isn't quite kosher.
@vk1z I agree: there should be a much better sync mechanism between compose and docker. However, I don't expect such a mechanism to be designed any time soon. Meanwhile, we also need a temporary way to do our own synchronization without hacking deep into the source code.
If the argument delegation proposal isn't an option, what do we suggest we do? I agree it isn't a pretty solution, but it's at least much better than this workaround, isn't it? https://github.com/docker/compose/issues/6691#issuecomment-616984053
@miriaford docker-compose does not call the docker executable with arguments; it actually uses docker_py, which talks to the docker daemon over the HTTP API. So there is no "underlying docker run" command. The docker CLI is not an API; the socket connection is the API point of contact. This is why it is not always that easy.
To oversimplify things: in the process of running a container there are two main calls, one that creates the container and one that starts it. Each ingests different pieces of information, and knowing which is which takes someone with API knowledge, which most of us don't have the way we tend to know the docker CLI. I do not think being able to add extra args to docker_py calls is going to be as useful as you think, except in select use cases.
To make things even more difficult, sometimes the docker_py library is behind the API and doesn't have everything you need right away, so you have to wait for it to be updated. All that being said, extra_docker_run_args isn't a simple solution.
@andyneff Thanks for your explanation. Indeed, I'm not too familiar with the inner workings of Docker. If I understand correctly, there are 4 APIs that need to be manually synced for any new feature update:
- the docker daemon's socket API
- the docker CLI
- docker_py, which provides the Python frontend to the socket API
- docker-compose, built on top of docker_py
This begs the question: why is there no automatic (or at least semi-automatic) syncing mechanism? Manually propagating new feature updates across 4 APIs seems doomed to be error-prone, delay-prone, and confusing ...
P.S. I'm not saying that it's a simple task to have automatic syncing, but I really think there should be one to make life easier in the future.
I'm kinda getting into pedantics now... but I would describe it as:
- the docker daemon serves a REST API over a socket (which you can even hit directly, e.g. with socat)
- the docker CLI uses that API to give us users an awesome tool
So yes, it goes: docker CLI -> daemon socket API, and docker_py/compose -> daemon socket API.
I can't speak for docker_py or compose, but I imagine they have limited man-hours contributed to them, so it's harder to keep up with ALL the crazy insane features that docker is CONSTANTLY adding. Docker is a Go library, and my understanding is that Python support is not (currently) a first-class citizen. Although it is nice that both projects are under the docker umbrella, at least from a GitHub organization standpoint.
So that all being said... I too am waiting for equivalent --gpus support, and have to use the old runtime: nvidia method instead, which at least gives me "a" path forward in docker-compose 2.x.