RuntimeError: Error while calling cudnnCreate(&handles[new_device_id]) ... code: 4, reason: A call to cuDNN failed

nwatab commented 4 years ago

Hi, everyone. I get the error below when training my own TensorFlow graph, which uses face_alignment with dlib+CUDA to predict landmarks for the input and output images. Does anybody know anything about this issue, which seems to be related to dlib?

When training the TensorFlow graph:

dlib+cuda: the error below (nvidia-smi shows up to 5% GPU usage)
sfd+cuda: memory error
dlib+cpu: fine
sfd+cpu: fine

When predicting a single image:

dlib+cuda: fine
sfd+cuda: fine

I found a similar issue on dlib, but it was closed without being resolved: https://github.com/davisking/dlib/issues/1772. Furthermore, the error message "A call to cuDNN failed" is only the default case of a switch statement, so it says nothing specific about the cause.

My environment:

tensorflow-gpu==1.12.3 (built from source against cuda9.2)
Ubuntu 16
Python 3.5
dlib (+Cuda built from source against cuda9.2)
Cuda 9.2.148
cuDNN 7_7.6.4.38
GTX 1080

...
    preds = fa.get_landmarks_from_image(img)
  File "/home/unsupervised_reconstruct/.venv/lib/python3.5/site-packages/face_alignment/api.py", line 171, in get_landmarks_from_image
    detected_faces = self.face_detector.detect_from_image(image[..., ::-1].copy())
  File "/home/unsupervised_reconstruct/.venv/lib/python3.5/site-packages/face_alignment/detection/dlib/dlib_detector.py", line 48, in detect_from_image
    detected_faces = self.face_detector(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))
RuntimeError: Error while calling cudnnCreate(&handles[new_device_id]) in file /home/temp/dlib/dlib/cuda/cudnn_dlibapi.cpp:104. code: 4, reason: A call to cuDNN failed

Thank you for reading, and sorry that I can't share all of the training code.

nwatab commented 4 years ago

After some investigation, I found that variable initialization is what causes the problem. I am still looking for a fix.

import tensorflow as tf

init = tf.global_variables_initializer()
with tf.Session() as sess:
    # The dlib+CUDA detector starts failing once this initializer has run.
    sess.run(init)
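One way to check whether initialization is what takes the GPU memory is to compare memory use (rather than the utilization percentage) before and after sess.run(init). This is only a rough sketch, assuming nvidia-smi is on PATH. The helper names parse_gpu_memory and query_gpu_memory are hypothetical and not part of TensorFlow, dlib, or face_alignment.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Turn nvidia-smi CSV rows of 'used, total' (MiB) into a list of (used, total) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory per GPU (assumes nvidia-smi is installed)."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    return parse_gpu_memory(out)
```

Calling query_gpu_memory() just before and just after sess.run(init) should show whether almost all of the GPU memory is already reserved at that point.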

nwatab commented 4 years ago

Solved by setting allow_growth = True:

import tensorflow as tf

config = tf.ConfigProto()
# Let TensorFlow allocate GPU memory on demand instead of all at once.
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

My guess is that TensorFlow takes nearly all of the GPU memory by default, which prevents face_alignment from using the GPU. https://github.com/tensorflow/tensorflow/issues/24828#issuecomment-464957482
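If that guess is right, another option is to cap how much memory TensorFlow may take. Below is a minimal sketch, assuming the TF 1.x ConfigProto API. The helper name make_gpu_friendly_config and the 0.7 fraction in the usage comment are made up for illustration, and the tf module is passed in as an argument so the helper does not import TensorFlow itself.

```python
def make_gpu_friendly_config(tf, memory_fraction=None):
    """Build a TF 1.x session config that leaves GPU memory for other libraries.

    `tf` is the imported tensorflow module. `memory_fraction` is an optional
    cap (0 to 1) on the share of GPU memory TensorFlow may allocate.
    """
    config = tf.ConfigProto()
    # Allocate GPU memory on demand instead of reserving it up front.
    config.gpu_options.allow_growth = True
    if memory_fraction is not None:
        # Hypothetical cap; tune it so dlib/cuDNN still has room.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config


# Usage (not executed here, requires tensorflow 1.x):
#   import tensorflow as tf
#   config = make_gpu_friendly_config(tf, memory_fraction=0.7)
#   with tf.Session(config=config) as sess:
#       sess.run(tf.global_variables_initializer())
```

Whether a fixed cap or allow_growth works better probably depends on how much memory the detector needs. I have only confirmed allow_growth above.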

1adrianb / face-alignment, issue #184