Closed kayush2O6 closed 6 years ago
Hi. I'm one of the maintainers responsible for Docker support in Eclipse Che. To be honest, I'm not familiar with nvidia-docker at the moment. Can you elaborate on whether it is an extension of the original docker? What is the difference between them? Does it use special images with the NVIDIA SDK installed in the docker image?
Thanks for your interest, @garagatyi. Good documentation on the motivation for nvidia-docker can be found here: nvidia-docker github. It does not use any special image. It uses the same image as native docker, but it helps us deploy GPU-based applications on specialized hardware, i.e. NVIDIA GPUs. So basically nvidia-docker helps with containerisation of GPU applications, while native docker is only capable of containerising CPU-based applications.
In Eclipse Che, I am trying to create a runtime stack in order to run CUDA C/C++ programs. Hopefully this feature can be added to Eclipse Che for other users in later versions.
And currently I am using the kayush206/cuda-devel image to create the custom CUDA stack.
If you are able to expose the TCP port of nvidia-docker on your host, you can use the environment variable CHE_DOCKER_DAEMON__URL in che.env to set the URL of nvidia-docker's TCP port. In that case, I think, it should work as you are expecting. But, sure, if nvidia-docker does not behave exactly the same way as regular docker does, then you might run into some challenges.
You can try this approach and comment if it goes well or not.
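For illustration, a minimal sketch of what such a che.env entry might look like. The host and port (localhost:2375) are assumptions used only as placeholders; the actual address depends on how the daemon's TCP socket is exposed on your host.

```shell
# che.env fragment (sketch only).
# tcp://localhost:2375 is a placeholder address, not a value taken from this thread;
# replace it with the address where the docker daemon's TCP socket is actually exposed.
CHE_DOCKER_DAEMON__URL=tcp://localhost:2375
```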
Hi @garagatyi, I tried your approach but I couldn't solve my problem, because nvidia-docker passes some additional (external hardware detecting) arguments to docker along with the image and starts the default docker daemon. So even if we expose the nvidia-docker daemon, it will eventually connect to the default docker daemon.
nvidia-docker run --rm -it kayush206/cuda-devel
is the same as:
docker run --rm -it \
--volume-driver=nvidia-docker \
--volume=nvidia_driver_375.82:/usr/local/nvidia:ro \
--device=/dev/nvidiactl \
--device=/dev/nvidia-uvm \
--device=/dev/nvidia-uvm-tools \
--device=/dev/nvidia0 \
--device=/dev/nvidia1 \
--device=/dev/nvidia2 \
--device=/dev/nvidia3 \
kayush206/cuda-devel
If I can somehow pass this information with the custom stack image, then hopefully the problem will be solved. I also read about the CHE_WORKSPACE_VOLUME argument in the che.env file, but with that I was able to pass only the volume, not the device information.
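The expansion described in this comment can be sketched as a small shell helper that turns a list of device nodes into `--device` flags. This is only an illustration of the flag layout quoted above: `build_device_flags` is a hypothetical helper name, and the final command is printed with `echo` rather than executed.

```shell
# Hypothetical helper: print one --device flag per device path given as an argument.
build_device_flags() {
    flags=""
    for dev in "$@"; do
        # Append with a separating space only when flags is already non-empty.
        flags="${flags:+$flags }--device=$dev"
    done
    printf '%s\n' "$flags"
}

# Print (do not run) a docker command using the volume and image names from this thread.
# The command substitution is left unquoted on purpose so each flag becomes its own word.
echo docker run --rm -it \
    --volume-driver=nvidia-docker \
    --volume=nvidia_driver_375.82:/usr/local/nvidia:ro \
    $(build_device_flags /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia0) \
    kayush206/cuda-devel
```

The driver volume name and the set of device nodes differ from host to host, so the values here are only the ones quoted in this thread.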
If you are willing to be a contributor, I believe you can implement this feature yourself.
Here are useful links on how to implement injection of devices: 1 2 3
Here is a link that shows how to implement a volume driver: link
How to add the volume driver field to the docker container: the Docker API where the field is described, and the Che class that needs to be updated.
@AK-ayush you can use your custom image to start a workspace.
Thanks @garagatyi @eivantsov I have found a workaround to resolve my issue. I will try to implement the feature as suggested by @garagatyi .
@AK-ayush I am closing the issue due to inactivity. Feel free to reopen if you have questions or just want to share some findings.
Hi, I have a docker image with the CUDA library preinstalled on Ubuntu 14.04. When I run that image using docker, I am not able to run the nvidia-smi command, but when I run it using nvidia-docker, I am able to run nvidia-smi. The problem is that I have created a custom runtime stack in Eclipse Che with that image, and Eclipse Che runs the image using docker (not nvidia-docker), so I am not able to run any NVIDIA CUDA commands.
Reproduction Steps:
sudo nvidia-docker run -p 8080:8080 \
-e CHE_DOCKER_MACHINE_HOST_EXTERNAL= \
--name che \
--rm \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /home/user/data:/data \
eclipse/che-server:5.0.0-latest
OS and version:
Host OS: Ubuntu 16.04
Image OS: Ubuntu 14.04
CUDA: 8.0.61
Docker: 1.13.1
Eclipse Che: 5.5.0
Diagnostics: When I run the nvidia-smi command in Eclipse Che, it produces the following output.
command: nvidia-smi
[dev-machine] /bin/bash: nvidia-smi: command not found
[STDERR] /bin/bash: nvidia-smi: command not found
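To narrow down a "command not found" result like the one above, a small check run inside the workspace can show which pieces are missing. This is a rough sketch: `have_command` and `have_path` are hypothetical helper names, and the paths checked (/usr/local/nvidia and /dev/nvidiactl) are the ones that appear in the docker command quoted in this thread, so they may differ on other setups.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: succeed if the path exists (file, directory, or device node).
have_path() {
    [ -e "$1" ]
}

if have_command nvidia-smi; then
    echo "nvidia-smi: found on PATH"
else
    echo "nvidia-smi: not found on PATH"
fi

# Mount point of the driver volume in the docker command quoted in this thread.
if have_path /usr/local/nvidia; then
    echo "/usr/local/nvidia: present"
else
    echo "/usr/local/nvidia: missing (driver volume probably not mounted)"
fi

# One of the device nodes passed with --device in the same command.
if have_path /dev/nvidiactl; then
    echo "/dev/nvidiactl: present"
else
    echo "/dev/nvidiactl: missing (GPU devices probably not passed to the container)"
fi
```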