Want to use nvidia-docker plugin to run the custom stack image in eclipse-che

eclipse-che / che

Kubernetes based Cloud Development Environments for Enterprise Teams

http://eclipse.org/che

Eclipse Public License 2.0


Want to use nvidia-docker plugin to run the custom stack image in eclipse-che #6263

Closed kayush2O6 closed 6 years ago

kayush2O6 commented 7 years ago

Hi, I have a Docker image with the CUDA libraries preinstalled on Ubuntu 14.04. When I run that image with docker, I cannot run the nvidia-smi command, but when I run it with nvidia-docker, nvidia-smi works. The problem is that I have created a custom runtime stack in Eclipse Che from that image, and Eclipse Che runs the image with docker (not nvidia-docker), so I cannot run any NVIDIA CUDA commands.

Reproduction steps:

sudo nvidia-docker run -p 8080:8080 \
    -e CHE_DOCKER_MACHINE_HOST_EXTERNAL= \
    --name che \
    --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /home/user/data:/data \
    eclipse/che-server:5.0.0-latest

OS and version:
Host OS: Ubuntu 16.04
Image OS: Ubuntu 14.04
CUDA: 8.0.61
Docker: 1.13.1
Eclipse Che: 5.5.0

Diagnostics: When I run the nvidia-smi command in Eclipse Che, it produces the following output:

command: nvidia-smi
[dev-machine] /bin/bash: nvidia-smi: command not found
[STDERR] /bin/bash: nvidia-smi: command not found
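A "command not found" error, rather than a driver or device error, suggests the binary itself is absent from the container. When nvidia-docker expands the docker command, the driver files come from a volume mounted at /usr/local/nvidia, and the GPUs appear as /dev/nvidia* device nodes. The sketch below checks both from inside the workspace container. It is a hypothetical helper (check_gpu_setup is not part of Che or nvidia-docker) and assumes the image, like the official nvidia/cuda images, expects nvidia-smi under /usr/local/nvidia/bin.

```shell
#!/bin/sh
# Hypothetical diagnostic, not part of Che or nvidia-docker.
# $1: filesystem root to inspect; empty by default, meaning "/".
# Prints one status line per check and returns 1 if anything is missing.
check_gpu_setup() {
    root=${1:-}
    status=0

    # Assumption: nvidia-smi ships in the driver volume mounted at /usr/local/nvidia.
    if [ -x "$root/usr/local/nvidia/bin/nvidia-smi" ]; then
        echo "driver volume: mounted"
    else
        echo "driver volume: missing"
        status=1
    fi

    # Per-GPU device nodes are named nvidia0, nvidia1, ...
    found=no
    for path in "$root"/dev/nvidia[0-9]*; do
        if [ -e "$path" ]; then
            found=yes
        fi
    done
    echo "gpu devices: $found"
    if [ "$found" = no ]; then
        status=1
    fi

    return "$status"
}
```

Running check_gpu_setup with no argument in the dev-machine terminal inspects the real root filesystem; a non-zero exit status means at least one of the two pieces is missing.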

garagatyi commented 7 years ago

Hi. I'm one of the maintainers responsible for Docker support in Eclipse Che. To be honest, I'm not familiar with nvidia-docker at the moment. Can you elaborate: is it an extension of the original Docker? What is the difference between them? Does it use special images with the NVIDIA SDK installed?

kayush2O6 commented 7 years ago

Thanks for your interest, @garagatyi. A good explanation of the motivation for nvidia-docker can be found here: nvidia-docker github. It does not use any special image. It uses the same images as native Docker, but it lets us deploy GPU-based applications on specialized hardware, i.e. NVIDIA GPUs. So basically, nvidia-docker makes it possible to containerize GPU applications, while native Docker on its own only gives containers access to the CPU, not the GPUs.

In Eclipse Che, I am trying to create a runtime stack for running CUDA C/C++ programs. Hopefully this feature can be added to Eclipse Che for other users in later versions.

Currently I am using the kayush206/cuda-devel image to create the custom CUDA stack.

garagatyi commented 7 years ago

If you are able to expose a TCP port of nvidia-docker on your host, you can use the environment variable CHE_DOCKER_DAEMON__URL in che.env to set the URL of that TCP port. In that case, I think, it should work as you expect. But if nvidia-docker does not behave exactly like regular Docker, you might run into some challenges. You can try this approach and comment on whether it works.
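A minimal sketch of that configuration, assuming a Docker-compatible daemon is reachable over TCP. The address 192.168.0.10:2375 is a placeholder (2375 is Docker's conventional unencrypted port). The dockerd line shows one standard way to make a daemon listen on TCP in addition to the usual Unix socket; an unauthenticated TCP socket gives full control of the host, so it should only be bound to an interface that trusted clients can reach.

```shell
# On the Docker host: listen on the usual Unix socket and on TCP.
# 192.168.0.10:2375 is a placeholder address.
dockerd -H unix:///var/run/docker.sock -H tcp://192.168.0.10:2375

# In che.env (a property line, not a shell command): point Che at that daemon.
CHE_DOCKER_DAEMON__URL=tcp://192.168.0.10:2375
```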

kayush2O6 commented 7 years ago

Hi @garagatyi, I tried your approach but couldn't solve my problem. nvidia-docker passes some additional (hardware-detection) arguments to docker along with the image and then hands the command to the default Docker daemon. So even if we expose the nvidia-docker daemon, it will eventually connect to the default Docker daemon.

nvidia-docker run --rm -it kayush206/cuda-devel

is equivalent to:

docker run --rm -it \
    --volume-driver=nvidia-docker \
    --volume=nvidia_driver_375.82:/usr/local/nvidia:ro \
    --device=/dev/nvidiactl \
    --device=/dev/nvidia-uvm \
    --device=/dev/nvidia-uvm-tools \
    --device=/dev/nvidia0 \
    --device=/dev/nvidia1 \
    --device=/dev/nvidia2 \
    --device=/dev/nvidia3 \
    kayush206/cuda-devel
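Until Che can pass these flags itself, that expansion can be reproduced by hand. The sketch below is a hypothetical helper (nvidia_docker_args is not an nvidia-docker command) that prints the extra docker run flags for whichever NVIDIA device nodes exist on the host. It assumes the nvidia-docker plugin is still running, since that plugin provides the nvidia-docker volume driver and the nvidia_driver_<version> volume; the caller has to supply the driver version (375.82 on this host).

```shell
#!/bin/sh
# Hypothetical helper, not part of nvidia-docker: print the extra `docker run`
# flags that nvidia-docker 1.x adds, one per line.
# $1: directory holding the device nodes (normally /dev)
# $2: driver version used in the volume name (e.g. 375.82)
nvidia_docker_args() {
    dev_dir=${1:-/dev}
    driver_version=$2

    printf '%s\n' "--volume-driver=nvidia-docker"
    printf '%s\n' "--volume=nvidia_driver_${driver_version}:/usr/local/nvidia:ro"

    # Control devices shared by all GPUs; emit only the ones that exist.
    for name in nvidiactl nvidia-uvm nvidia-uvm-tools; do
        if [ -e "$dev_dir/$name" ]; then
            printf '%s\n' "--device=/dev/$name"
        fi
    done

    # One device node per GPU: nvidia0, nvidia1, ...
    for path in "$dev_dir"/nvidia[0-9]*; do
        [ -e "$path" ] || continue   # the glob matched nothing
        printf '%s\n' "--device=/dev/${path##*/}"
    done
}

# Example (the unquoted substitution is split on purpose; no flag contains spaces):
#   docker run --rm -it $(nvidia_docker_args /dev 375.82) kayush206/cuda-devel
```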

If I can somehow pass this information along with the custom stack image, the problem should hopefully be solved. I also read about the CHE_WORKSPACE_VOLUME property in the che.env file, but with it I was able to pass only volumes, not device information.

garagatyi commented 7 years ago

If you are willing to become a contributor, I believe you can implement this feature yourself. Here are some useful links on how to implement device injection: 1, 2, 3. Here is a link that shows how to implement the volume driver: link. For adding the volume driver field to the Docker container, see the Docker API documentation where the field is described and the Che class that needs to be updated.
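For whoever implements this in Che, it may help to see the same settings as Docker Engine API fields rather than CLI flags: in the container-create request they sit under HostConfig as VolumeDriver, Binds and Devices (each device given as PathOnHost, PathInContainer and CgroupPermissions). The hypothetical helper below (nvidia_create_body is illustrative only) prints such a request body. It does no JSON escaping, so it assumes plain image names and paths.

```shell
#!/bin/sh
# Hypothetical helper: print a Docker Engine API container-create body that
# carries the same GPU settings nvidia-docker adds on the command line.
# $1: image name, $2: driver volume name, remaining args: device paths.
# No JSON escaping is done, so arguments must not contain quotes or backslashes.
nvidia_create_body() {
    image=$1
    volume=$2
    shift 2

    # Comma-separated device mappings, same path on the host and in the container.
    devices=""
    for dev in "$@"; do
        entry=$(printf '{"PathOnHost":"%s","PathInContainer":"%s","CgroupPermissions":"rwm"}' "$dev" "$dev")
        devices="${devices:+$devices,}$entry"
    done

    # VolumeDriver/Binds correspond to --volume-driver/--volume, Devices to --device.
    printf '{"Image":"%s","HostConfig":{"VolumeDriver":"nvidia-docker","Binds":["%s:/usr/local/nvidia:ro"],"Devices":[%s]}}\n' \
        "$image" "$volume" "$devices"
}

# Example: create (but not start) the container through the Docker socket:
#   nvidia_create_body kayush206/cuda-devel nvidia_driver_375.82 \
#       /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia0 |
#   curl --unix-socket /var/run/docker.sock \
#       -H 'Content-Type: application/json' -d @- http://localhost/containers/create
```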

ghost commented 7 years ago

@AK-ayush you can use your custom image to start a workspace.

kayush2O6 commented 7 years ago

Thanks @garagatyi and @eivantsov, I have found a workaround for my issue. I will try to implement the feature as suggested by @garagatyi.

ghost commented 6 years ago

@AK-ayush I am closing the issue due to inactivity. Feel free to reopen if you have questions or just want to share some findings.