galliot-us / neuralet

Neuralet is an open-source platform for edge deep learning models on edge TPU, Jetson Nano, and more.
https://neuralet.com
Apache License 2.0
238 stars 71 forks

Unable to run x86 docker #71

Closed archanabwk closed 4 years ago

archanabwk commented 4 years ago

Hi,

I have built the docker image for x86 as described at https://github.com/neuralet/neuralet/tree/master/applications/smart-distancing, but after building the image, docker run does not work.

I'm getting error:

Unable to find image 'neuralnet/x86_64:applications-smart-distancing' locally
docker: Error response from daemon: pull access denied for neuralnet/x86_64, repository does not exist or may require 'docker login': denied: requested access to the resource is denied. 

I have cloned the neuralnet repo and tried the x86 docker commands on Ubuntu 18.04. Are there any other steps needed to make this work, or any prerequisites for x86?

Here is my machine configuration:

OS: Ubuntu 18.04
GPU: Nvidia GeForce GTX 1050 Ti

Please help.

Thanks, Archana

mhejrati commented 4 years ago

@archanabwk

What do you see when you run docker images in the terminal?

If the image was built successfully, you should see something like this: [screenshot of docker images output]
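For illustration, a successful build would show the image in the listing, roughly like this (the ID, age, and size below are hypothetical):

REPOSITORY          TAG                              IMAGE ID       CREATED         SIZE
neuralet/x86_64     applications-smart-distancing    0123456789ab   2 minutes ago   1.2GB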

archanabwk commented 4 years ago

@mhejrati

Yes, I can see the neuralnet/x86_64 image listed under docker images.

archanabwk commented 4 years ago

@mhejrati I rebuilt the docker image and the Docker-related errors are gone now. But now I'm getting the errors below:

2020-05-12 09:22:04.843514: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-05-12 09:22:04.843589: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-05-12 09:22:04.843598: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Downloading data from http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz
187932672/187925923 [==============================] - 354s 2us/step
2020-05-12 09:28:03.342859: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-05-12 09:28:03.342881: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
2020-05-12 09:28:03.342902: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2020-05-12 09:28:03.343110: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-05-12 09:28:03.364595: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2020-05-12 09:28:03.364791: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xad97600 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-12 09:28:03.364804: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Device is:  x86
Detector is:  mobilenet_ssd_v2
image size:  [300, 300, 3]
 * Serving Flask app "ui.web_gui" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://0.0.0.0:8000/ (Press CTRL+C to quit)
opened video  /repo/applications/smart-distancing/data/TownCentreXVID.avi

Also, the link above, http://0.0.0.0:8000/, never works. I tried specifying the local IP and other ports, but the printed address is always the same: http://0.0.0.0:8000/.

I have an Nvidia GPU (GeForce GTX 1050 Ti) and Ubuntu 18.04, yet the error above still says:

no NVIDIA GPU device is present: /dev/nvidia0 does not exist

I searched your website and GitHub docs for any specific version requirements for CUDA, TensorRT, cuDNN, and TensorFlow, or any minimum GPU compute capability, but couldn't find any. I also checked the 'Dockerfile-x86' file for package installations, but couldn't find any libraries related to TensorRT, CUDA, Nvidia drivers, etc.

Can you please help me understand how to make this app work on x86?

alpha-carinae29 commented 4 years ago

Hi @archanabwk, could you please tell us what command you ran to start this docker image? Also, this container works with CPU only, so you don't need CUDA or cuDNN to run it.

archanabwk commented 4 years ago

@alpha-carinae29 If this application works with CPU only, then we can ignore the GPU-related errors. I tried a few commands, listed below:

docker run -it -p 8000:80 -v /home/archana/neuralet/:/repo neuralet/x86_64:applications-smart-distancing
docker run -it -v /home/archana/neuralet/:/repo neuralet/x86_64:applications-smart-distancing
alpha-carinae29 commented 4 years ago

OK, it seems you set the wrong port mapping. Please try this: docker run -it -p 8000:8000 -v /PATH_TO_CLONED_REPO_ROOT/:/repo neuralet/x86_64:applications-smart-distancing and then open http://localhost:8000/ in your browser.

JsonSadler commented 4 years ago

-p 8000:80

If you are using this option in your docker run, you are mapping your host's port 8000 to the container's port 80, which does not make sense here. The output you are seeing in the terminal comes from inside the container, meaning the app is running on the container's 0.0.0.0:8000. This is why, in the instructions, docker's -p option is used as -p HOST_PORT:8000: if you want to see the output on 127.0.0.1:80 of your host, you have to run the docker command with -p 80:8000.
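For example (illustrative host ports; the container side must stay 8000 because that is where the Flask app listens):

# reach the app at http://localhost:8000/ on the host
docker run -it -p 8000:8000 -v /PATH_TO_CLONED_REPO_ROOT/:/repo neuralet/x86_64:applications-smart-distancing
# or reach it at http://localhost/ (port 80) on the host instead
docker run -it -p 80:8000 -v /PATH_TO_CLONED_REPO_ROOT/:/repo neuralet/x86_64:applications-smart-distancing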

Let me know if you could get it to work.

JsonSadler commented 4 years ago

OK, it seems you set the wrong port mapping. Please try this: docker run -it -p 8000:8000 -v /PATH_TO_CLONED_REPO_ROOT/:/repo neuralet/x86_64:applications-smart-distancing and then open http://localhost:8000/ in your browser.

lol I was writing when you were sending your answer :D

mdegans commented 4 years ago
/dev/nvidia0

You will need --gpus all to pass the GPUs through to docker. You'll also need the nvidia-docker2 runtime installed for that option to work. More details on the --gpus option here.
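A rough sketch of the usual setup on Ubuntu, assuming the Nvidia driver is already installed and the nvidia-docker2 apt repository has been added per Nvidia's install guide:

# install the NVIDIA container runtime and restart the docker daemon
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
# quick check that containers can see the GPU (the CUDA image tag is just an example)
docker run --rm --gpus all nvidia/cuda:10.2-base nvidia-smi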

mhejrati commented 4 years ago

@archanabwk As @mdegans mentioned above, it seems like Docker can't see the GPU device. Depending on which Docker version you are using, make sure the GPU and the Nvidia libraries are mounted properly, with either the --runtime nvidia or the --gpus all flag.
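For example, something along these lines (untested here; use whichever flag your Docker version supports):

# Docker 19.03+ with the NVIDIA container toolkit
docker run -it --gpus all -p 8000:8000 -v /PATH_TO_CLONED_REPO_ROOT/:/repo neuralet/x86_64:applications-smart-distancing
# older nvidia-docker2 setups
docker run -it --runtime nvidia -p 8000:8000 -v /PATH_TO_CLONED_REPO_ROOT/:/repo neuralet/x86_64:applications-smart-distancing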

archanabwk commented 4 years ago

Thank you @alpha-carinae29, @JsonSadler, @mdegans and @mhejrati. After applying all your suggestions, I used the command below and it finally worked :+1:

docker run -it --runtime nvidia --privileged -p 8000:8000 -v /home/archana/neuralet/:/repo neuralet/x86_64:applications-smart-distancing

I appreciate your quick responses and appropriate solutions :+1:

mdegans commented 4 years ago

Thank you @alpha-carinae29, @JsonSadler, @mdegans and @mhejrati. After applying all your suggestions, I used the command below and it finally worked

docker run -it --runtime nvidia --privileged -p 8000:8000 -v /home/archana/neuralet/:/repo neuralet/x86_64:applications-smart-distancing

Glad you got it working.

You should not need the --privileged option if you pass the GPUs through. You can even use the --user flag to run as a limited uid/gid if you prefer.

https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
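For example, building on the working command above (a sketch, untested; adjust the paths to your setup):

# same as the working command, but without --privileged and running as your own uid/gid
docker run -it --runtime nvidia --user $(id -u):$(id -g) -p 8000:8000 -v /home/archana/neuralet/:/repo neuralet/x86_64:applications-smart-distancing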

--gpus all is for x86 with newer Docker versions, while --runtime nvidia is for Tegra and older nvidia-docker setups; however, it might still work on x86. I haven't tested --runtime nvidia on x86 in a while.

mohammad7t commented 4 years ago

@mhejrati Please note that the latest-py3 TensorFlow image doesn't contain CUDA and is CPU-only. I suggest replacing it with latest-gpu-py3.

https://github.com/neuralet/neuralet/blob/8b1d6fb811225dcdf4485074c1897741445e6d5c/applications/smart-distancing/Dockerfile-x86#L1
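For reference, a minimal sketch of the suggested change: only the first line of Dockerfile-x86 would need to change, roughly like this (the GPU base image still requires the Nvidia runtime on the host):

FROM tensorflow/tensorflow:latest-gpu-py3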