NifTK / NiftyNet

[unmaintained] An open-source convolutional neural networks platform for research in medical image analysis and image-guided therapy
http://niftynet.io
Apache License 2.0

GPU->CPU Memcpy failed #436

Closed talmazov closed 4 years ago

talmazov commented 5 years ago

Hey everyone, I am trying to train on 2 segmented CBCT volumes but I am running into a "GPU->CPU Memcpy failed" error. From reading around in the TensorFlow repo, this appears to be an issue of insufficient memory, so I reduced the size of the volumes drastically, as well as the batch_size and queue_length, but I still get the error. I have included the configuration file below. My PC has 32 GB of RAM and a GeForce RTX 2060 with 6 GB of VRAM. NiftyNet and TensorFlow recognize the device. I have CUDA 10 installed on NVIDIA driver 418 with cuDNN 7.6.3. When I run sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)) in the Python console, the GPU appears and everything is fine.
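For reference, that console check amounts to the following sketch (TF 1.x API; the list_devices() call is an extra confirmation I am adding here, not part of the original one-liner):

import tensorflow as tf

# Creating the session with log_device_placement=True prints the device
# mapping at session creation; the RTX 2060 shows up as /device:GPU:0.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.list_devices())  # optional extra check of the visible devices
sess.close()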

Most notably I see "could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR"; then, once NiftyNet reaches "Parameters from random initialisations" and the shuffle buffer has filled, it gives the GPU->CPU Memcpy failed error:

2019-08-24 17:21:52.622248: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:162] Shuffle buffer filled.
2019-08-24 17:22:00.194936: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-08-24 17:22:00.370394: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-08-24 17:22:00.383907: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-08-24 17:22:00.394481: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-08-24 17:22:00.394513: W ./tensorflow/stream_executor/stream.h:1995] attempting to perform DNN operation using StreamExecutor without DNN support
2019-08-24 17:22:00.394571: I tensorflow/stream_executor/stream.cc:1865] [stream=0x461c2b0,impl=0x4908880] did not wait for [stream=0x474a8d0,impl=0x48faf40]
2019-08-24 17:22:00.394578: I tensorflow/stream_executor/stream.cc:4800] [stream=0x461c2b0,impl=0x4908880] did not memcpy device-to-host; source: 0x7effad60b300
2019-08-24 17:22:00.394652: F tensorflow/core/common_runtime/gpu/gpu_util.cc:293] GPU->CPU Memcpy failed

I do not get this issue when performing training on the CPU.

I switched the network from dense_vnet to highres3dnet, and at that point I only get the "Could not create cudnn handle" error. I modified the tf_config() method in util_common.py to include config.gpu_options.allow_growth = True, as described in https://github.com/tensorflow/tensorflow/issues/24496, but that does not seem to address the issue.
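For completeness, the tf_config() change amounts to roughly the following (sketch from memory; the pre-existing fields may not match util_common.py exactly, only the allow_growth line was added):

import tensorflow as tf

def tf_config():
    # Session configuration sketch (TF 1.x). With allow_growth the GPU
    # allocator grows its memory pool on demand instead of reserving
    # nearly all VRAM at start-up.
    config = tf.ConfigProto()
    config.log_device_placement = False
    config.allow_soft_placement = True
    config.gpu_options.allow_growth = True  # the added line
    return config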

Any thoughts? Is this an issue of not enough VRAM?

The command I use to run it is python3 net_segment.py train -c ~/mandible_segmentation/config.ini

My configuration is:

############################ input configuration sections
[cbct]
path_to_search = /home/mayotic/mandible_segmentation/CBCT_TRAINING/
filename_contains = _cbct
interp_order = 1
spatial_window_size = (120,120,200)
axcodes=(A, R, S)

[label]
path_to_search = /home/mayotic/mandible_segmentation/CBCT_TRAINING/
filename_contains = _label
interp_order = 0
spatial_window_size = (120,120,200)
axcodes=(A, R, S)

############################## system configuration sections
[SYSTEM]
cuda_devices = 0
num_threads = 1
num_gpus = 1
model_dir = /home/mayotic/mandible_segmentation/
queue_length = 36

[NETWORK]
name = dense_vnet
batch_size = 6

# volume level preprocessing
volume_padding_size = 0
window_sampling = uniform

[TRAINING]
sample_per_volume = 1
lr = 0.001
starting_iter = 0
save_every_n = 1000
max_iter = 3001
tensorboard_every_n = 1

[INFERENCE]
border = (0, 0, 0)
inference_iter = 3000
output_interp_order = 0
spatial_window_size = (120,120,200)
save_seg_dir = /home/mayotic/mandible_segmentation/segmentation_output/

############################ custom configuration sections
[SEGMENTATION]
image = cbct
label = label
label_normalisation = False
output_prob = False
num_classes = 2

Running the following code from within the python3 CLI works just fine:

import tensorflow as tf

# Simple GPU smoke test: place two constant matrices on GPU:0 and multiply them.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))
IsaacLord commented 4 years ago

Did you find any solution for this issue?

talmazov commented 4 years ago

Did you find any solution for this issue?

No solution. I suspect this is an issue with the underlying code not being able to copy data between system RAM and VRAM, but I'm not sure why; I see the bug was assigned to someone. For now, either drastically decrease the DICOM resolution or increase your graphics card's available VRAM. Also, to be fair, even when processing CT/CBCT DICOM on the CPU (which does not produce this error), my PC easily maxes out 32 GB of RAM.
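If it helps anyone, downsampling the volumes ahead of training can be done with something along these lines (hypothetical pre-processing script using nibabel and scipy, which are not part of NiftyNet; the file names and zoom factor are made up):

import nibabel as nib
from scipy.ndimage import zoom

# Resample a CBCT volume to half resolution to cut GPU memory use.
img = nib.load('case01_cbct.nii.gz')
data = img.get_fdata()
factor = 0.5                              # halve each spatial dimension
low_res = zoom(data, factor, order=1)     # order=1: linear; use order=0 for the _label volumes

# Scale the affine so the voxel spacing stays consistent with the new grid.
new_affine = img.affine.copy()
new_affine[:3, :3] /= factor
nib.save(nib.Nifti1Image(low_res, new_affine), 'case01_cbct_lowres.nii.gz')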

IsaacLord commented 4 years ago

Just change the TensorFlow version! It does work!

IsaacLord commented 4 years ago

You can follow this link to solve the issue: https://github.com/NifTK/NiftyNet/issues/447

talmazov commented 4 years ago

Hey, so I tried again. When I run TensorFlow GPU for object detection, everything runs fine. I have installed numpy 1.16.0 and tensorflow-gpu 1.13.2, but I still get the GPU->CPU Memcpy failed error.
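(For reference, the versions were pinned with something along the lines of the command below; the exact pip invocation is an assumption, the version numbers are the ones stated above.)

pip3 install numpy==1.16.0 tensorflow-gpu==1.13.2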

I am not sure why the cuDNN handle could not be created.

2019-12-06 22:01:17.891564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-12-06 22:01:17.891610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-06 22:01:17.891614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-12-06 22:01:17.891617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-12-06 22:01:17.891694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4853 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:niftynet: Parameters from random initialisations ...
2019-12-06 22:01:24.491935: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-12-06 22:01:24.939564: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x4ae44b0
2019-12-06 22:01:35.242771: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:101] Filling up shuffle buffer (this may take a while): 12 of 30
2019-12-06 22:01:45.241902: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:101] Filling up shuffle buffer (this may take a while): 24 of 30
2019-12-06 22:01:50.185805: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:140] Shuffle buffer filled.
2019-12-06 22:01:55.576205: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-06 22:01:55.588900: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-06 22:01:55.600046: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-06 22:01:55.600087: W ./tensorflow/stream_executor/stream.h:2099] attempting to perform DNN operation using StreamExecutor without DNN support
INFO:niftynet: cleaning up...
INFO:niftynet: stopping sampling threads
2019-12-06 22:01:56.241818: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4aeaa00,impl=0x4aeaaa0] did not wait for [stream=0x4aea520,impl=0x4ae44d0]
2019-12-06 22:01:56.241845: I tensorflow/stream_executor/stream.cc:5014] [stream=0x4aeaa00,impl=0x4aeaaa0] did not memcpy device-to-host; source: 0x7fc710cc5b00
2019-12-06 22:01:56.241879: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4aeaa00,impl=0x4aeaaa0] did not wait for [stream=0x4aea520,impl=0x4ae44d0]
2019-12-06 22:01:56.241921: I tensorflow/stream_executor/stream.cc:5014] [stream=0x4aeaa00,impl=0x4aeaaa0] did not memcpy device-to-host; source: 0x7fc710cc5c00
2019-12-06 22:01:56.241933: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4aeaa00,impl=0x4aeaaa0] did not wait for [stream=0x4aea520,impl=0x4ae44d0]
2019-12-06 22:01:56.241938: I tensorflow/stream_executor/stream.cc:5014] [stream=0x4aeaa00,impl=0x4aeaaa0] did not memcpy device-to-host; source: 0x7fc710cc5a00
2019-12-06 22:01:56.241949: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4aeaa00,impl=0x4aeaaa0] did not wait for [stream=0x4aea520,impl=0x4ae44d0]
2019-12-06 22:01:56.241948: F tensorflow/core/common_runtime/gpu/gpu_util.cc:292] GPU->CPU Memcpy failed
2019-12-06 22:01:56.241964: I tensorflow/stream_executor/stream.cc:5014] [stream=0x4aeaa00,impl=0x4aeaaa0] did not memcpy device-to-host; source: 0x7fc710cc5e00
Aborted
talmazov commented 4 years ago

I tried:

sudo python3 net_download.py dense_vnet_abdominal_ct_model_zoo
sudo python3 net_segment.py inference -c ~/niftynet/extensions/dense_vnet_abdominal_ct/config.ini

and I get:

2019-12-07 11:37:02.795645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4904 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:niftynet: Restoring parameters from /home/mayotic/niftynet/models/dense_vnet_abdominal_ct/models/model.ckpt-3000
2019-12-07 11:37:06.140918: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-07 11:37:06.143425: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-07 11:37:06.148064: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-12-07 11:37:06.148081: W ./tensorflow/stream_executor/stream.h:2099] attempting to perform DNN operation using StreamExecutor without DNN support
INFO:niftynet: cleaning up...
INFO:niftynet: stopping sampling threads

What version of cuDNN is everybody else running? I have NiftyNet 0.6, CUDA 10.0, tensorflow-gpu 1.13.2, and numpy 1.16, using a GeForce RTX 2060 with 6 GB of VRAM on NVIDIA driver 440.33.01. TensorFlow tries to allocate 5 GB for spatial_window_size = (64, 64, 512) with the dense_vnet network.
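As a rough sanity check on those numbers (back-of-envelope only; it counts just the raw input window and ignores the dense_vnet feature maps and TensorFlow allocator overhead, which dominate in practice):

# Size of a single float32 window of spatial_window_size = (64, 64, 512).
voxels = 64 * 64 * 512
bytes_per_voxel = 4                      # float32
window_mb = voxels * bytes_per_voxel / (1024 ** 2)
print(window_mb)                         # 8.0 MB per window, per channel/modality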

Is this a common error thrown when the GPU does not have enough physical memory to run training?