MIDS-scaling-up / v2

W251 2018 reload

w251/digits:tx2-4.2_b158 Error on Jetson Nano #20

Open DennisFaucher opened 5 years ago

DennisFaucher commented 5 years ago

(I could not find a repository for your container on GitHub, so I am posting here.) Thank you for this container image. I was able to start the DIGITS server on my NVIDIA Jetson Nano. Image load works fine, but attempting to create an Image Classification model fails with the errors below. It looks like a CUDA issue. TIA.

2019-05-25 20:36:31 [20190525-203625-5f68] [INFO ] Create DB (train) task started.
2019-05-25 20:36:31 [20190525-203625-5f68] [INFO ] Task subprocess args: "/usr/bin/python2 /DIGITS/digits/tools/create_db.py /DIGITS/digits/jobs/20190525-203625-5f68/train.txt /DIGITS/digits/jobs/20190525-203625-5f68/train_db 768 1272 --backend=lmdb --channels=3 --resize_mode=crop --mean_file=/DIGITS/digits/jobs/20190525-203625-5f68/mean.binaryproto --mean_file=/DIGITS/digits/jobs/20190525-203625-5f68/mean.jpg --shuffle --encoding=jpg"
2019-05-25 20:36:31 [20190525-203625-5f68] [INFO ] Create DB (val) task started.
2019-05-25 20:36:31 [20190525-203625-5f68] [INFO ] Task subprocess args: "/usr/bin/python2 /DIGITS/digits/tools/create_db.py /DIGITS/digits/jobs/20190525-203625-5f68/val.txt /DIGITS/digits/jobs/20190525-203625-5f68/val_db 768 1272 --backend=lmdb --channels=3 --resize_mode=crop --shuffle --encoding=jpg"
2019-05-25 20:36:34 [20190525-203625-5f68] [WARNING] Create DB (train) unrecognized output: cudaRuntimeGetVersion() failed with error #38
2019-05-25 20:36:34 [20190525-203625-5f68] [WARNING] Create DB (train) unrecognized output: Tensorflow support disabled.
2019-05-25 20:36:34 [20190525-203625-5f68] [WARNING] Create DB (val) unrecognized output: cudaRuntimeGetVersion() failed with error #38
2019-05-25 20:36:34 [20190525-203625-5f68] [WARNING] Create DB (val) unrecognized output: Tensorflow support disabled.
2019-05-25 20:36:39 [20190525-203625-5f68] [DEBUG] 81 images written to database
2019-05-25 20:36:39 [20190525-203625-5f68] [INFO ] Create DB (val) task completed.
2019-05-25 20:36:46 [20190525-203625-5f68] [DEBUG] 246 images written to database
2019-05-25 20:37:06 [20190525-203625-5f68] [INFO ] Create DB (train) task completed.
2019-05-25 20:37:06 [20190525-203625-5f68] [INFO ] Job complete.
2019-05-25 20:38:57 [20190525-203823-f70d] [DEBUG] Network sanity check - train
2019-05-25 20:38:57 [20190525-203823-f70d] [DEBUG] Network sanity check - val
2019-05-25 20:38:57 [20190525-203823-f70d] [DEBUG] Network sanity check - deploy
2019-05-25 20:38:57 [20190525-203823-f70d] [INFO ] Train Caffe Model task started.
2019-05-25 20:38:57 [20190525-203823-f70d] [INFO ] Task subprocess args: "/caffe/build/tools/caffe train --solver=/DIGITS/digits/jobs/20190525-203823-f70d/solver.prototxt"
2019-05-25 20:38:58 [20190525-203823-f70d] [ERROR] Train Caffe Model: Cannot create Cublas handle. Cublas won't be available.
2019-05-25 20:38:58 [20190525-203823-f70d] [ERROR] Train Caffe Model: Cannot create Curand generator. Curand won't be available.
2019-05-25 20:38:58 [20190525-203823-f70d] [ERROR] Train Caffe Model: Cannot create cuDNN handle. cuDNN won't be available.
2019-05-25 20:38:58 [20190525-203823-f70d] [ERROR] Train Caffe Model: Check failed: error == cudaSuccess (38 vs. 0) no CUDA-capable device is detected
2019-05-25 20:38:59 [20190525-203823-f70d] [ERROR] Train Caffe Model task failed with error code -6
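For anyone reading a long DIGITS log like the one above, a small filter can pull out just the WARNING and ERROR entries. This is only a sketch: the entry format is inferred from the log lines in this issue, and the names `ENTRY`, `split_entries`, and `problems` are made up for illustration, not part of DIGITS. It splits on the timestamp/job/level prefix, so it should work whether the entries sit on one line or on separate lines.

```python
import re

# One entry prefix as seen in this issue's log (format inferred, not documented):
# "<date> <time> [<job id>] [<LEVEL>] "
ENTRY = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "  # timestamp
    r"\[([0-9a-f\-]+)\] "                       # job id, e.g. 20190525-203823-f70d
    r"\[\s*([A-Z]+)\s*\] "                      # level, e.g. "INFO " or "ERROR"
)


def split_entries(text):
    """Split raw log text into (timestamp, job, level, message) tuples."""
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # The message runs until the next entry prefix (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((m.group(1), m.group(2), m.group(3), text[m.end():end].strip()))
    return entries


def problems(text):
    """Keep only entries whose level is WARNING or ERROR."""
    return [e for e in split_entries(text) if e[2] in ("WARNING", "ERROR")]


if __name__ == "__main__":
    sample = (
        "2019-05-25 20:38:57 [20190525-203823-f70d] [INFO ] Train Caffe Model task started. "
        "2019-05-25 20:38:58 [20190525-203823-f70d] [ERROR] Train Caffe Model: Check failed: "
        "error == cudaSuccess (38 vs. 0) no CUDA-capable device is detected "
        "2019-05-25 20:38:59 [20190525-203823-f70d] [ERROR] Train Caffe Model task failed "
        "with error code -6"
    )
    for timestamp, job, level, message in problems(sample):
        print(level, timestamp, message)
```

Run on the log above, the first entries this returns are the `cudaRuntimeGetVersion() failed with error #38` warnings from the Create DB job, so the CUDA problem is visible before training even starts.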

DennisFaucher commented 5 years ago

[Edit] You can close this issue. nvidia-docker is required for CUDA inside the container, and it is not supported on the Nano.
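A quick way to confirm this kind of diagnosis from inside a container is to ask the CUDA runtime how many devices it can see. The sketch below is hypothetical and was not run as part of this issue; it assumes `libcudart` can be found on the library search path inside the container, and the helper name `cuda_device_count` is made up. It uses only `ctypes` and the runtime's `cudaGetDeviceCount` call.

```python
import ctypes
import ctypes.util


def cuda_device_count():
    """Return (count, note); count is None if the runtime is missing or reports an error."""
    # Assumption: the runtime is installed under its usual name, libcudart.
    name = ctypes.util.find_library("cudart") or "libcudart.so"
    try:
        cudart = ctypes.CDLL(name)
    except OSError as exc:
        return None, "CUDA runtime library not loadable: %s" % exc

    count = ctypes.c_int(0)
    # cudaGetDeviceCount(int *count) returns a cudaError_t; 0 means success.
    status = cudart.cudaGetDeviceCount(ctypes.byref(count))
    if status != 0:
        # In the log above, the runtime reported error 38 (no CUDA-capable device).
        return None, "cudaGetDeviceCount returned error %d" % status
    return count.value, "ok"


if __name__ == "__main__":
    count, note = cuda_device_count()
    print("devices:", count, "|", note)
```

If this reports an error or no devices inside the container but works on the host, the GPU is not being exposed to the container, which matches the conclusion above.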