OxInsky opened 7 years ago
You've got some CUDA toolkit and/or driver issues. Can you build and run a standard CUDA sample?
What do these commands tell you?
nvidia-smi
./digits/device_query.py
dpkg -l | egrep 'nvidia|cudart|libcudnn|libnccl|caffe|torch|digits'
Sun Dec 11 10:54:07 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27 Driver Version: 367.27 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 0000:02:00.0 On | N/A |
| 22% 32C P8 16W / 250W | 289MiB / 12204MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX TIT... Off | 0000:03:00.0 Off | N/A |
| 22% 33C P8 14W / 250W | 3MiB / 12206MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX TIT... Off | 0000:82:00.0 Off | N/A |
| 22% 32C P8 14W / 250W | 3MiB / 12206MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX TIT... Off | 0000:83:00.0 Off | N/A |
| 22% 33C P8 14W / 250W | 3MiB / 12206MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1235 C /usr/bin/python 106MiB |
| 0 1682 G /usr/bin/X 94MiB |
| 0 3019 G compiz 85MiB |
+-----------------------------------------------------------------------------+
Device #0: GeForce GTX TITAN X
totalGlobalMem 12797476864
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
clockRate 1076000
totalConstMem 65536
major 5
minor 2
textureAlignment 512
texturePitchAlignment 32
deviceOverlap 1
multiProcessorCount 24
kernelExecTimeoutEnabled 1
integrated 0
canMapHostMemory 1
computeMode 0
maxTexture1D 65536
maxTexture1DMipmap 16384
maxTexture1DLinear 134217728
maxTextureCubemap 16384
maxSurface1D 16384
maxSurfaceCubemap 16384
surfaceAlignment 512
concurrentKernels 1
ECCEnabled 0
pciBusID 2
pciDeviceID 0
pciDomainID 0
tccDriver 0
asyncEngineCount 2
unifiedAddressing 1
memoryClockRate 3505000
memoryBusWidth 384
l2CacheSize 3145728
maxThreadsPerMultiProcessor 2048
streamPrioritiesSupported 1
globalL1CacheSupported 1
localL1CacheSupported 1
sharedMemPerMultiprocessor 98304
regsPerMultiprocessor 65536
managedMemSupported 1
isMultiGpuBoard 0
multiGpuBoardGroupID 0
Total memory (NVML) 12204 MB
Used memory (NVML) 396 MB
Free memory (NVML) 11808 MB
GPU utilization (NVML) 0%
Memory utilization (NVML) 0%
Temperature (NVML) 32 C
Device #1: GeForce GTX TITAN X
totalGlobalMem 12799574016
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
clockRate 1076000
totalConstMem 65536
major 5
minor 2
textureAlignment 512
texturePitchAlignment 32
deviceOverlap 1
multiProcessorCount 24
kernelExecTimeoutEnabled 0
integrated 0
canMapHostMemory 1
computeMode 0
maxTexture1D 65536
maxTexture1DMipmap 16384
maxTexture1DLinear 134217728
maxTextureCubemap 16384
maxSurface1D 16384
maxSurfaceCubemap 16384
surfaceAlignment 512
concurrentKernels 1
ECCEnabled 0
pciBusID 3
pciDeviceID 0
pciDomainID 0
tccDriver 0
asyncEngineCount 2
unifiedAddressing 1
memoryClockRate 3505000
memoryBusWidth 384
l2CacheSize 3145728
maxThreadsPerMultiProcessor 2048
streamPrioritiesSupported 1
globalL1CacheSupported 1
localL1CacheSupported 1
sharedMemPerMultiprocessor 98304
regsPerMultiprocessor 65536
managedMemSupported 1
isMultiGpuBoard 0
multiGpuBoardGroupID 1
Total memory (NVML) 12206 MB
Used memory (NVML) 3 MB
Free memory (NVML) 12203 MB
GPU utilization (NVML) 0%
Memory utilization (NVML) 0%
Temperature (NVML) 33 C
Device #2: GeForce GTX TITAN X
totalGlobalMem 12799574016
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
clockRate 1076000
totalConstMem 65536
major 5
minor 2
textureAlignment 512
texturePitchAlignment 32
deviceOverlap 1
multiProcessorCount 24
kernelExecTimeoutEnabled 0
integrated 0
canMapHostMemory 1
computeMode 0
maxTexture1D 65536
maxTexture1DMipmap 16384
maxTexture1DLinear 134217728
maxTextureCubemap 16384
maxSurface1D 16384
maxSurfaceCubemap 16384
surfaceAlignment 512
concurrentKernels 1
ECCEnabled 0
pciBusID 130
pciDeviceID 0
pciDomainID 0
tccDriver 0
asyncEngineCount 2
unifiedAddressing 1
memoryClockRate 3505000
memoryBusWidth 384
l2CacheSize 3145728
maxThreadsPerMultiProcessor 2048
streamPrioritiesSupported 1
globalL1CacheSupported 1
localL1CacheSupported 1
sharedMemPerMultiprocessor 98304
regsPerMultiprocessor 65536
managedMemSupported 1
isMultiGpuBoard 0
multiGpuBoardGroupID 2
Total memory (NVML) 12206 MB
Used memory (NVML) 3 MB
Free memory (NVML) 12203 MB
GPU utilization (NVML) 0%
Memory utilization (NVML) 0%
Temperature (NVML) 33 C
Device #3: GeForce GTX TITAN X
totalGlobalMem 12799574016
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
clockRate 1076000
totalConstMem 65536
major 5
minor 2
textureAlignment 512
texturePitchAlignment 32
deviceOverlap 1
multiProcessorCount 24
kernelExecTimeoutEnabled 0
integrated 0
canMapHostMemory 1
computeMode 0
maxTexture1D 65536
maxTexture1DMipmap 16384
maxTexture1DLinear 134217728
maxTextureCubemap 16384
maxSurface1D 16384
maxSurfaceCubemap 16384
surfaceAlignment 512
concurrentKernels 1
ECCEnabled 0
pciBusID 131
pciDeviceID 0
pciDomainID 0
tccDriver 0
asyncEngineCount 2
unifiedAddressing 1
memoryClockRate 3505000
memoryBusWidth 384
l2CacheSize 3145728
maxThreadsPerMultiProcessor 2048
streamPrioritiesSupported 1
globalL1CacheSupported 1
localL1CacheSupported 1
sharedMemPerMultiprocessor 98304
regsPerMultiprocessor 65536
managedMemSupported 1
isMultiGpuBoard 0
multiGpuBoardGroupID 3
Total memory (NVML) 12206 MB
Used memory (NVML) 3 MB
Free memory (NVML) 12203 MB
GPU utilization (NVML) 0%
Memory utilization (NVML) 0%
Temperature (NVML) 33 C
ii caffe-nv 0.14.5-2+cuda7.5 amd64 Fast open framework for Deep Learning
ii caffe-nv-tools 0.14.5-2+cuda7.5 amd64 Fast open framework for Deep Learning (Tools)
ii cuda-cudart-7-5 7.5-18 amd64 CUDA Runtime native Libraries
ii digits 3.0.0-1 amd64 NVIDIA DIGITS webserver
ii libcaffe-nv0 0.14.5-2+cuda7.5 amd64 Fast open framework for Deep Learning (Libs)
ii libcudnn4 4.0.7 amd64 cuDNN runtime libraries
ii libcudnn4-dev 4.0.7 amd64 cuDNN development libraries and headers
ii libcudnn5 5.0.6-1+cuda7.5 amd64 cuDNN runtime libraries
ii libnccl1 1.2.1-1+cuda7.5 amd64 NVIDIA Communication Collectives Library (NCCL) Runtime
ii nvidia-machine-learning-repo 4.0-2 amd64 NVIDIA Deep Learning Packages
ii python-caffe-nv 0.14.5-2+cuda7.5 amd64 Fast open framework for Deep Learning (Python)
ii torch7-nv 0.9.98-1+cuda7.5 amd64 NVidia Torch Bundle (with CUDA). Made for DIGITS.
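The dpkg listing above is easier to interpret when summarized. The sketch below groups installed packages by the "+cudaX.Y" build tag in their version strings, reads the CUDA runtime version from the cuda-cudart-X-Y package name, and lists which cuDNN major versions are installed. The helper name summarize_dpkg is hypothetical. The sketch assumes the naming conventions visible in this listing. Applied to this listing, it reports that every CUDA-tagged package targets CUDA 7.5 and that cuDNN 4 and cuDNN 5 are installed side by side.

```python
import re
from collections import defaultdict

# "+cudaX.Y" build tag appended to versions, e.g. "0.14.5-2+cuda7.5" -> "7.5"
CUDA_TAG = re.compile(r"\+cuda(\d+\.\d+)")


def summarize_dpkg(listing: str) -> dict:
    """Summarize `dpkg -l` output (hypothetical helper, illustration only)."""
    cuda_builds = defaultdict(list)  # CUDA version -> packages built against it
    cudart = set()                   # runtime versions from cuda-cudart-X-Y names
    cudnn_majors = set()             # cuDNN major versions, e.g. {4, 5}
    for line in listing.splitlines():
        fields = line.split()
        # Installed packages start with status "ii", then name and version.
        if len(fields) < 3 or fields[0] != "ii":
            continue
        name, version = fields[1], fields[2]
        tag = CUDA_TAG.search(version)
        if tag:
            cuda_builds[tag.group(1)].append(name)
        rt = re.fullmatch(r"cuda-cudart-(\d+)-(\d+)", name)
        if rt:
            cudart.add(f"{rt.group(1)}.{rt.group(2)}")
        dnn = re.fullmatch(r"libcudnn(\d+)(-dev)?", name)
        if dnn:
            cudnn_majors.add(int(dnn.group(1)))
    return {
        "cuda_builds": dict(cuda_builds),
        "cudart": sorted(cudart),
        "cudnn_majors": sorted(cudnn_majors),
    }
```

To use it, pass in the text captured from the dpkg command. The summary shows at a glance which CUDA version the NVIDIA packages expect. It also shows whether more than one cuDNN major version is present.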
I think the CUDA toolkit is fine, because we regularly use these GPUs to accelerate network training and that works well. Sorry for the late reply: GitHub did not send me a notification about your answer, although I expected it would. Looking forward to your reply, thanks!
You've got a CUDA 8.0 RC driver - you might want to try updating to a proper release driver? https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements
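A quick way to compare the driver with the installed runtime is to call cudaDriverGetVersion and cudaRuntimeGetVersion from libcudart. The sketch below does this through ctypes. It assumes libcudart can be found on the loader path. The helper names decode_cuda_version and query_versions are hypothetical. Note that a driver reporting a newer CUDA version than the runtime is normally a supported combination, as with the CUDA 8.0 RC driver and cuda-cudart-7-5 here. So the check cannot tell an RC driver from a release one. It only rules out a driver that is too old for the runtime.

```python
import ctypes
import ctypes.util


def decode_cuda_version(v: int) -> tuple:
    """CUDA encodes versions as 1000*major + 10*minor, e.g. 7050 -> (7, 5)."""
    return v // 1000, (v % 1000) // 10


def query_versions(libname: str = "cudart"):
    """Return ((driver), (runtime)) CUDA versions, or None if unavailable."""
    path = ctypes.util.find_library(libname)  # assumption: libcudart is on the loader path
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    driver, runtime = ctypes.c_int(0), ctypes.c_int(0)
    # Both calls return cudaError_t; 0 is cudaSuccess.
    # A driver version of 0 means no driver is installed.
    if lib.cudaDriverGetVersion(ctypes.byref(driver)) != 0:
        return None
    if lib.cudaRuntimeGetVersion(ctypes.byref(runtime)) != 0:
        return None
    return decode_cuda_version(driver.value), decode_cuda_version(runtime.value)


if __name__ == "__main__":
    versions = query_versions()
    if versions is None:
        print("libcudart not found or CUDA call failed")
    else:
        drv, rt = versions
        print(f"driver supports CUDA {drv[0]}.{drv[1]}, runtime is CUDA {rt[0]}.{rt[1]}")
        # The driver must support at least the runtime's CUDA version.
        if drv < rt:
            print("driver is too old for this runtime")
```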
Also, why not upgrade to DIGITS 4 at least? DIGITS 3 is pretty old. https://github.com/NVIDIA/DIGITS/releases/tag/v3.0.0
Will updating to DIGITS 4 solve my problem? Will it affect network training on the GPUs? And how do I upgrade? Thanks a lot! PS: I cannot update the driver, because many people on my team use this machine.
You already have access to the 3.0 debs, so I expect you also have access to the 4.0 debs.
sudo apt-get update
sudo apt-get upgrade
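After the upgrade, you can check whether the digits package actually moved from 3.x to 4.x by asking dpkg for the installed version. The sketch below assumes a Debian/Ubuntu system with dpkg-query on the PATH. The helper names installed_version and major_version are hypothetical. It does the same thing as running dpkg-query -W digits by hand.

```python
import subprocess


def installed_version(package: str):
    """Installed version string of a Debian package, or None (hypothetical helper)."""
    try:
        out = subprocess.run(
            ["dpkg-query", "-W", "-f=${Version}", package],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # dpkg-query missing, or package unknown to dpkg
    return out.stdout.strip() or None  # empty when known but not installed


def major_version(version: str) -> int:
    """Leading numeric component of a Debian version, e.g. '3.0.0-1' -> 3."""
    upstream = version.split(":", 1)[-1]  # drop an optional epoch such as '1:'
    return int(upstream.split(".", 1)[0].split("-", 1)[0])


if __name__ == "__main__":
    v = installed_version("digits")
    if v is None:
        print("digits package not installed")
    else:
        print(f"digits {v} installed (major version {major_version(v)})")
```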
I restarted nvidia-digits-server, but it did not work. I ran cat on digits.log, and it shows the following:
I am a newbie! Can you help me solve this problem? Thanks.