HasnainRaz / FC-DenseNet-TensorFlow

Fully Convolutional DenseNet (A.K.A 100 layer tiramisu) for semantic segmentation of images implemented in TensorFlow.
MIT License

Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED and Possibly insufficient driver version: 384.81.0 Segmentation fault (core dumped) #11

Closed sanersbug closed 5 years ago

sanersbug commented 5 years ago

When I run this command:

python main.py --mode=train --train_data=/mnt/saners-extend/FC-DenseNet-TensorFlow/data/train --val_data=/mnt/saners-extend/FC-DenseNet-TensorFlow/data/val --layers_per_block=4,5,7,10,12,15 --batch_size=2 --epochs=10 --growth_k=16 --num_classes=2 --learning_rate=0.001

it prints the following:

/home/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
First Convolution Out: (?, 256, 256, 48)
Downsample Out: (?, 128, 128, 112)
Downsample Out: (?, 64, 64, 192)
Downsample Out: (?, 32, 32, 304)
Downsample Out: (?, 16, 16, 464)
Downsample Out: (?, 8, 8, 656)
Bottleneck Block: (?, 8, 8, 240)
Upsample after concat: (?, 16, 16, 896)
Upsample after concat: (?, 32, 32, 704)
Upsample after concat: (?, 64, 64, 496)
Upsample after concat: (?, 128, 128, 352)
Upsample after concat: (?, 256, 256, 224)
Mask Prediction: (?, 256, 256, 2)
2018-10-25 15:54:14.641467: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-10-25 15:54:14.781306: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-10-25 15:54:14.781714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:05:01.0
totalMemory: 11.91GiB freeMemory: 2.58GiB
2018-10-25 15:54:14.781753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-10-25 15:54:15.190540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-25 15:54:15.190615: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-10-25 15:54:15.190628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-10-25 15:54:15.190885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2277 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:05:01.0, compute capability: 6.0)
2018-10-25 15:55:52.442988: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-10-25 15:55:52.443130: E tensorflow/stream_executor/cuda/cuda_dnn.cc:360] Possibly insufficient driver version: 384.81.0
Segmentation fault (core dumped)

My environment is CUDA 9.0, cuDNN 7.0, and TensorFlow 1.10.1. Can anyone give some advice? Thanks very much.

HasnainRaz commented 5 years ago

You already have a process running on the GPU:

totalMemory: 11.91GiB freeMemory: 2.58GiB

^ The line above shows that most of your GPU memory is occupied by another process: only 2.58 GiB of 11.91 GiB is free. Please use nvidia-smi to find and terminate that process, then try again. Closing because this is unrelated to the repository.
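
For anyone hitting the same error, here is a rough sketch of that check in Python. It is not part of this repository, and the function names (`occupied_gib`, `parse_compute_apps`, `list_gpu_processes`) are made up for illustration. The first helper reads the `totalMemory` / `freeMemory` pair from the TensorFlow log line quoted above; the others wrap `nvidia-smi --query-compute-apps`, assuming `nvidia-smi` is on the PATH.

```python
import re
import subprocess


def occupied_gib(log_line):
    """Return the GiB already in use, given a TensorFlow log line that
    contains 'totalMemory: <x>GiB freeMemory: <y>GiB'."""
    match = re.search(
        r"totalMemory:\s*([\d.]+)GiB\s+freeMemory:\s*([\d.]+)GiB", log_line
    )
    if match is None:
        raise ValueError("no totalMemory/freeMemory pair found")
    total, free = float(match.group(1)), float(match.group(2))
    return round(total - free, 2)


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows (CSV without header or
    units) into a list of dicts. Hypothetical helper, not an NVIDIA API."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # pid is the first field and used_memory the last; the process
        # name sits in between.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        used = used.strip()
        rows.append({
            "pid": int(pid.strip()),
            "name": name.strip(),
            # nvidia-smi may report a non-numeric value such as "[N/A]"
            "used_mib": int(used) if used.isdigit() else None,
        })
    return rows


def list_gpu_processes():
    """Ask nvidia-smi which compute processes currently hold GPU memory."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader,nounits",
        ],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_compute_apps(result.stdout)
```

Calling `occupied_gib("totalMemory: 11.91GiB freeMemory: 2.58GiB")` gives the amount held by other processes, and `list_gpu_processes()` shows which PIDs to look at before starting training again.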