@andyneff FYI there is Docker CLI support in the latest docker-compose. It allows using buildkit for instance. https://www.docker.com/blog/faster-builds-in-compose-thanks-to-buildkit-support/
@andyneff this is a very helpful overview! Thanks again
@lig awesome! Thanks for the correction! I was actually thinking "How will buildkit fit into all this" as I was writing that up
What I am a bit surprised by is that docker-compose is a pretty intrinsic part of the new docker-app framework, so I'd imagine they'd want to sync up docker-compose and docker for that reason alone. I wonder what the blocker really is: not enough Python bandwidth? That seems a bit hard to believe.
So how does Docker Swarm fit into the structure that @andyneff just described? Swarm uses the compose file format version 3 (defined by the "compose" project?) but is developed as part of docker?
Apologies if that's off-topic for this particular issue. I've rather lost track of which issue is which, but I started following this because I'd like to be able to tell a service running on a swarm that it needs to use a particular runtime. We can only do that with v2 of the compose-file spec, which means we can't do it with Swarm, which requires v3. In other words, I'm not really interested in what the docker-compose CLI does, only in the spec defined for the docker-compose.yml files that are consumed by docker swarm.
Oh swarm, the one that got away... (from me). Unfortunately that is #6239, which got closed by a bot. :( Someone tried in #6240 but was told that...
@miriaford, it looks like there is a PR for syncing them! #6642?! (Is this just for v3???)
Because of the nature of swarm, there are certain things you do and don't do on swarm nodes. So the Docker API doesn't always allow the same options on a swarm run as on a normal run. I don't know offhand whether runtime is one of those things, but that is often why you can't do things in v3 (the swarm-compatible version) that you can in v2 (the non-swarm-compatible version).
No one reading this knows what you guys are talking about. We're all trying to deploy jellyfin with hardware acceleration. Until you fix this back to the way it's supposed to be: when it says services, 3.x is no good. Don't use it.
You need to put 2.4 as the version for your services. Then you can use hardware acceleration for jellyfin, ez.
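For anyone skimming, a minimal sketch of that version 2.4 approach (assumes nvidia-docker2/the nvidia runtime is installed; the image and device list are illustrative):
version: '2.4'
services:
  jellyfin:
    image: jellyfin/jellyfin
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all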
So come on guys, what's the ETA on this? 1 year, 2 years?
@KlaasH @ulyssessouza @Goryudyuma @chris-crone Hi, I'm working on this issue. I found that the support was missing in "docker-py" and have worked on that part. Now to get it working I need to pass the configs via the docker-compose.yml file. Can you help me with the schema? I.e. in order to add it, should I add a new schema, or is there a place where the configs could already be passed?
@fakabbir I would assume it is OK to just use COMPOSE_DOCKER_CLI_BUILD for this. Adding an ability to provide an arbitrary list of docker run arguments could even help to avoid similar issues in the future.
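For reference, a minimal sketch of enabling the CLI/BuildKit build path (environment variables as documented for docker-compose 1.25+, per the blog post linked above):
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose build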
@lig how do you deal when only one service requires access to a GPU?
@lig AFAICS compose uses docker-py instead of the docker run CLI. So adding arbitrary docker run arguments wouldn't work unless docker-py supports it as well.
ref: https://github.com/docker/compose/issues/6691#issuecomment-585199425
This single thing hugely reduces the usefulness of docker-compose for many people. That it hasn't seen much attention or desire to fix it, especially when it worked in older docker-compose, is quite astonishing. Wouldn't one way to go be to allow arbitrary docker run arguments to be given in a docker-compose file? Then --gpus all, for instance, could be passed to docker.
I understand there can be philosophical or technical reasons why one might want to do it in a particular way. But not getting hands-on and doing it in ANY way staggers the mind.
@lig how do you deal when only one service requires access to a GPU?
Well, the environment variable NVIDIA_VISIBLE_DEVICES will allow you to control that, no?
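A minimal sketch of that per-service control under the v2 runtime approach (service names and images are illustrative; assumes the nvidia runtime is installed):
version: '2.4'
services:
  gpu-worker:
    image: nvidia/cuda:10.0-base
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=0   # only this service sees GPU 0
  cpu-worker:
    image: ubuntu:18.04
    command: echo "no GPU needed here"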
This single thing hugely reduces the usefulness of docker-compose for many people. That it hasn't seen much attention or desire to fix it, especially when it worked in older docker-compose, is quite astonishing. Wouldn't one way to go be to allow arbitrary docker run arguments to be given in a docker-compose file? Then --gpus all, for instance, could be passed to docker.
I don't think allowing arbitrary docker run args to be passed is the way to go. compose does not really call docker by itself but instead uses docker-py.
I understand there can be philosophical or technical reasons why one might want to do it in a particular way. But not getting hands-on and doing it in ANY way staggers the mind.
A PR is open about it: https://github.com/docker/compose/pull/7124. Please feel free to "get your hands on it".
I believe that, as per the change in the docker compose spec, we should soon be back to the earlier compatibility of compose 2.4, and the nvidia runtime will work. It obviously won't work for TPUs or other accelerators, which is very unfortunate, but for those who want to run (expensive) nvidia GPUs, it will work.
So we're just waiting on a green PR in docker-py to be merged: https://github.com/docker/docker-py/pull/2471
YEAH! The PR over at docker-py has been approved! https://github.com/docker/docker-py/pull/2471 What's the next step here?
What's up here? It would be cool to be able to support the nvidia runtime in docker-compose.
https://github.com/docker/docker-py/pull/2471 has been merged.
Now that docker/docker-py#2471 has been merged, we can install docker-py from master. But since docker-compose has changed since @yoanisgil's cool PR (https://github.com/docker/compose/pull/7124) (Kudos!), it is unlikely to get merged. So at this point, docker-compose can be installed from that PR to save the day.
For those who ended up here without seeing the previous comments:
pip install git+https://github.com/docker/docker-py.git
pip install git+https://github.com/yoanisgil/compose.git@device-requests
Then use the following template in your compose file. (source: comment):
And then run
COMPOSE_API_VERSION=auto docker-compose run gpu
with the following file:
version: '3.7'
services:
  gpu:
    image: 'nvidia/cuda:9.0-base'
    command: 'nvidia-smi'
    device_requests:
      - capabilities:
          - "gpu"
I confirm that this worked on my local machine. I don't know if it works with Swarm.
Can't have a particular commit of docker-compose in production. Does #7124 need to be rebased, or is there another PR that's going to incorporate the new docker-py?
Hi there @bkakilli,
Thanks for the help! I just tried your suggestion, but I get an error running my docker-compose
ERROR: The Compose file './docker-compose.yml' is invalid because:
Unsupported config option for services.analysis: 'device_requests'
analysis being my container's name
I changed my docker-compose.yml from:
version: '2.3'
services:
  analysis:
    container_name: analysis
    image: analysis:${TAG}
    runtime: nvidia
    restart: always
    ports:
      - "8000:80"
to:
version: '3.7'
services:
  analysis:
    container_name: analysis
    image: analysis:${TAG}
    device_requests:
      - capabilities:
          - "gpu"
    restart: always
    ports:
      - "8000:80"
Is there anything else, apart from both pip install git+ commands, needed to correctly set this up? Or perhaps I edited the configuration file badly?
@frgfm make sure you're installing compose and docker-py from the correct links. You may have used docker-compose's own repo instead of yoanisgil's fork (and branch). See if you're using the following link:
pip install git+https://github.com/yoanisgil/compose.git@device-requests
You may try adding the --upgrade param to pip install. Otherwise I would suspect the virtual environment settings. Maybe you have another docker-compose installation which is being used by default? E.g. you may have installed it for the system with the "Linux" instructions here: https://docs.docker.com/compose/install/. I suggest you take a look at "Alternative Install Options" and install via pip in a virtual environment (but use the pip install commands above; don't install the default docker-compose from PyPI).
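A minimal sketch of doing that in a fresh virtual environment (the venv name is arbitrary):
python3 -m venv compose-venv
. compose-venv/bin/activate
pip install git+https://github.com/docker/docker-py.git
pip install git+https://github.com/yoanisgil/compose.git@device-requests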
Hi!
Thanks for all the info. I was trying your approach, @bkakilli, and docker-compose build worked, but when running docker-compose up I got the error:
docker.errors.InvalidVersion: device_requests param is not supported in API versions < 1.40
My docker-compose.yml looks like this:
version: '3.7'
networks:
  isolation-network:
    driver: bridge
services:
  li_t5_service:
    build: .
    ports:
      - "${GRAPH_QL_API_PORT}:5001"
    device_requests:
      - capabilities:
          - "gpu"
    environment:
      - SSH_PRIVATE_KEY=${SSH_PRIVATE_KEY}
      - PYTHONUNBUFFERED=${PYTHONUNBUFFERED}
    networks:
      - isolation-network
Thanks in advance!
@ugmSorcero Set the environment variable COMPOSE_API_VERSION=1.40, then re-run your commands.
@ugmSorcero did you manage to fix that error? @EpicWink @bkakilli I'm running the version stated from the pip install, but I still get the error device_requests param is not supported in API versions < 1.40, even if I export that variable set to 1.40.
For the given compose file:
version: "3.7"
services:
  spam:
    image: nvidia/cuda:10.1-cudnn7-runtime
    command: nvidia-smi
    device_requests:
      - capabilities:
          - gpu
Using the version of docker-compose installed as above, in Bash on Linux, the following command succeeds:
COMPOSE_API_VERSION=1.40 docker-compose up
The following command fails:
docker-compose up
This has error output:
ERROR: for tmp_spam_1 device_requests param is not supported in API versions < 1.40
...
docker.errors.InvalidVersion: device_requests param is not supported in API versions < 1.40
@EpicWink thank you very much. I didn't realize that docker-compose up had to be executed that way. I took it as two steps, where first I exported COMPOSE_API_VERSION separately. Running it together seems to work :)
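For the record, the two forms are equivalent in Bash as long as the variable is exported in the same shell (a generic shell sketch, nothing compose-specific):
export COMPOSE_API_VERSION=1.40
docker-compose up
versus the one-liner:
COMPOSE_API_VERSION=1.40 docker-compose up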
I have another issue, though. If I run COMPOSE_API_VERSION=1.40 docker-compose run nvidiatest, then nvidia-smi is not found in the path, while if I run directly from the image there is no issue.
Here's how I'm reproducing it.
My docker-compose.local.yml file contains:
nvidiatest:
  image: nvidia/cuda:10.0-base
  device_requests:
    - capabilities:
        - gpu
  command: nvidia-smi
If I run my current setup (both api version auto and 1.40) I get the following error:
COMPOSE_API_VERSION=auto docker-compose -f docker-compose.yml -f docker-compose.local.yml run nvidiatest
Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": unknown
Is it possible that it has to do with using override files? If I just run the cuda base image with Docker, there's no problem getting output from nvidia-smi:
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
Mon Aug 24 11:40:04 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 Off | 00000000:29:00.0 On | N/A |
| 0% 46C P8 19W / 175W | 427MiB / 7974MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
I installed docker-compose following the instructions above from git, after uninstalling the version installed from the official docs. Here's the info of the installed version:
pip3 show --verbose docker-compose
Name: docker-compose
Version: 1.26.0.dev0
Summary: Multi-container orchestration for Docker
Home-page: https://www.docker.com/
Author: Docker, Inc.
Author-email: None
License: Apache License 2.0
Location: /home/jurugu/.local/lib/python3.8/site-packages
Requires: docopt, docker, requests, PyYAML, texttable, websocket-client, six, dockerpty, jsonschema, cached-property
Required-by:
Metadata-Version: 2.1
Installer: pip
Classifiers:
Development Status :: 5 - Production/Stable
Environment :: Console
Intended Audience :: Developers
License :: OSI Approved :: Apache Software License
Programming Language :: Python :: 2
Programming Language :: Python :: 2.7
Programming Language :: Python :: 3
Programming Language :: Python :: 3.4
Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
Entry-points:
[console_scripts]
docker-compose = compose.cli.main:main
Am I missing anything? Thanks for the help!
@jjrugui this is becoming off-topic, and I'm not able to replicate your issue. Sorry for not being able to help
@EpicWink not a problem, and sorry for deviating from the topic :). If I figure out my particular issue I'll post it here if it's relevant.
Is someone working on another PR, or are we debugging the device-requests branch in order to get ready for a PR?
While the PR is stuck, I ported the changes from #7124 onto the latest master branch to match dependencies, etc.: https://github.com/beehiveai/compose. You can install it with pip install git+https://github.com/beehiveai/compose.git and change the version in docker-compose.yml to 3.8:
version: "3.8"
services:
gpu-test:
image: nvidia/cuda:10.2-runtime
command: nvidia-smi
device_requests:
- capabilities:
- gpu
In this setting, everything works as expected.
As discussed yesterday at the compose-spec governance meeting, we will start working on a proposal to adopt something comparable to #7124, which could be close to the generic_resources already available in the deploy section.
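For reference, a sketch of what generic_resources on the deploy section looks like today in the v3 format (the kind and value are illustrative and must match what the swarm nodes advertise):
version: "3.8"
services:
  gpu-test:
    image: nvidia/cuda:10.2-runtime
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: 'gpu'
                value: 1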
@ndeloof That is great! If it is possible, please post the link to the proposal here. I think many people would be happy to contribute to this since GPU support is critical for deep learning deployments.
@ndeloof historically, how long does it take the steering committee to make a decision, 6 months, a year?
+1
+1
@visheratin Any chance you can improve your fix so that it works when using multiple compose yml files? I have a base docker-compose.yml that uses a non-nvidia container, which I want to override with an nvidia container when there is a GPU. However, it seems that with your fix, if I specify multiple compose yml files with "-f", the device_requests field drops out of the config.
@proximous What do you mean by "drops out of the config"? Do all compose files have version 3.8? Can you share the example so it would be easier to reproduce?
I'm having a problem with the code in compose/service.py when trying to use the --scale option with docker-compose up. Is this not supported?
Traceback (most recent call last):
File "/usr/local/bin/docker-compose", line 11, in
After further debugging, I found that when using --scale, for some reason one instance has device_requests['capabilities'] as ['gpu'], but for all other containers to be started, device_requests['capabilities'] instead looks like [['gpu']].
I made a temporary local fix to get around this issue, just to get my containers up and running, starting at line 1010 in compose/service.py:
for device_request in device_requests:
    if 'capabilities' not in device_request:
        continue
    # --scale can leave capabilities double-nested ([['gpu']]); unwrap that case first
    if type(device_request['capabilities'][0]) == list:
        device_request['capabilities'] = [
            element.split('.') for element in device_request['capabilities'][0]]
    else:
        device_request['capabilities'] = [
            element.split('.') for element in device_request['capabilities']]
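A minimal standalone sketch of the normalization this works around (the function name is hypothetical and not part of compose):
def normalize_capabilities(device_request):
    # Flatten [['gpu']] (seen on scaled replicas) back to ['gpu'], the expected shape.
    caps = device_request.get('capabilities', [])
    if caps and isinstance(caps[0], list):
        device_request['capabilities'] = caps[0]
    return device_request

assert normalize_capabilities({'capabilities': [['gpu']]}) == {'capabilities': ['gpu']}
assert normalize_capabilities({'capabilities': ['gpu']}) == {'capabilities': ['gpu']}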
@proximous What do you mean by "drops out of the config"? Do all compose files have version 3.8? Can you share the example so it would be easier to reproduce?
@visheratin see this example, am I wrong to expect a different result?
docker-compose.nogpu.yml:
version: '3.8'
services:
  df:
    build: miniconda-image.Dockerfile
docker-compose.gpu.yml:
version: '3.8'
services:
  df:
    build: nvidia-image.Dockerfile
    device_requests:
      - capabilities:
          - gpu
use only the nogpu.yml:
$ docker-compose -f docker-compose.nogpu.yml config
services:
  df:
    build:
      context: /home/jerry/gpu-test/miniconda-image.Dockerfile
version: '3'
use only the gpu.yml:
$ docker-compose -f docker-compose.gpu.yml config
services:
  df:
    build:
      context: /home/jerry/gpu-test/nvidia-image.Dockerfile
    device_requests:
      - capabilities:
          - gpu
version: '3'
chain config ymls starting with a non-gpu yml (note: the device_requests entry is missing):
$ docker-compose -f docker-compose.nogpu.yml -f docker-compose.gpu.yml config
services:
  df:
    build:
      context: /home/jerry/gpu-test/nvidia-image.Dockerfile
version: '3'
expected output:
$ docker-compose -f docker-compose.nogpu.yml -f docker-compose.gpu.yml config
services:
  df:
    build:
      context: /home/jerry/gpu-test/nvidia-image.Dockerfile
    device_requests:
      - capabilities:
          - gpu
version: '3'
(Obviously I'm trying to do something more elaborate; this is just a simplified case to highlight the unexpected behavior.)
@jlaule @proximous In order to keep this thread on topic, please create issues in the forked repo, I will look into them when I have time.
For those who need something while waiting, I just set up K3s (the edge version of Kubernetes) with GPU support in 30 minutes, using docker as the container runtime (i.e. pass the --docker option to the install script). Follow https://github.com/NVIDIA/k8s-device-plugin to get the Nvidia device plugin working.
Hope that helps!
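Once the device plugin is running, workloads request GPUs through the standard extended resource; a minimal pod sketch (pod and container names are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:10.0-base
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1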
@EpicWink not a problem, and sorry for deviating from the topic :). If I figure out my particular issue I'll post it here if it's relevant.
Did you ever resolve this?
There is no such thing as "/usr/bin/nvidia-container-runtime" anymore. The issue is still critical.
Install nvidia-docker2 as instructed here.
I've been tackling this lately and thought I'd share my approach. My problem was that I needed docker stack deploy, and it didn't want to listen. I had docker compose working with the docker API version hack, but it didn't feel right, and stack deploy wouldn't work regardless.
So, without setting any runtime or device requests in my docker compose file, I added this to my daemon config:
{ "runtimes": { "nvidia": { "path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": [] } }, "default-runtime": "nvidia", "node-generic-resources": [ "NVIDIA-GPU=0" ] }
You can also use GPU-{first part of gpu guid}, but this was easier. I didn't have to install anything via pip or the like, apart from the NV container toolkit. It deploys and works like a charm.
Thanks a lot @haviduck, I just tried it on my own machine (Ubuntu 20.04, docker CE 19.03.8) and it worked like a charm. For others: don't forget to restart your docker daemon.
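For reference, on systemd-based distros the daemon restart is typically:
sudo systemctl restart docker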
@pommedeterresautee ah, I'm so glad it worked for others! I should have mentioned the reload.
Gotta say, after 3 weeks of non-stop dockering I'm pretty baffled at how nothing seems to work.
@haviduck: Thank you! Finally a simple solution that just works. I had spent so much time trying to add devices etc. that I gave up. Then this came along, I tried it, and after a couple of minutes I had hardware transcoding in Plex working.
Under Docker 19.03.0 Beta 2, support for NVIDIA GPUs was introduced in the form of the new CLI option --gpus. https://github.com/docker/cli/pull/1714 talks about this enablement.
Now one can simply pass the --gpus option for GPU-accelerated Docker-based applications.
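For example (the image tag is illustrative):
docker run --gpus all nvidia/cuda:9.0-base nvidia-smi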
As of today, Compose doesn't support this. This is a feature request for enabling Compose to support NVIDIA GPUs.