adrianb / face-alignment

:fire: 2D and 3D Face alignment library built using PyTorch
https://www.adrianbulat.com
BSD 3-Clause "New" or "Revised" License

RuntimeError: Error while calling cudnnCreate(&handles[new_device_id]) ... code: 4, reason: A call to cuDNN failed #184

Closed: nwatab closed this issue 4 years ago

nwatab commented 4 years ago

Hi everyone. I get the error below when training my own TensorFlow graph, which uses face_alignment (with the dlib face detector built with CUDA) to predict the landmarks of the input and output images. Does anyone know anything about this issue? It seems to be related to dlib.

When training the TensorFlow graph:

When predicting a single image:

I found a similar issue in the dlib repository, but it was closed without being resolved. Furthermore, the error message "A call to cuDNN failed" is just the default branch of a switch statement, so it does not say what actually went wrong. https://github.com/davisking/dlib/issues/1772
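Because the text of the message is generic, the numeric `code` in it carries the real information: it is the `cudnnStatus_t` value that `cudnnCreate` returned. The sketch below decodes it. It assumes the status numbering used in cuDNN 7.x/8.x headers (later cuDNN releases renumbered these codes), and the helper `cudnn_status_from_message` is hypothetical, not part of dlib or face_alignment.

```python
import re

# cudnnStatus_t values as numbered in cuDNN 7.x/8.x cudnn.h (assumption:
# newer cuDNN releases use a different numbering scheme).
CUDNN_STATUS_NAMES = {
    0: "CUDNN_STATUS_SUCCESS",
    1: "CUDNN_STATUS_NOT_INITIALIZED",
    2: "CUDNN_STATUS_ALLOC_FAILED",
    3: "CUDNN_STATUS_BAD_PARAM",
    4: "CUDNN_STATUS_INTERNAL_ERROR",
    5: "CUDNN_STATUS_INVALID_VALUE",
    6: "CUDNN_STATUS_ARCH_MISMATCH",
    7: "CUDNN_STATUS_MAPPING_ERROR",
    8: "CUDNN_STATUS_EXECUTION_FAILED",
    9: "CUDNN_STATUS_NOT_SUPPORTED",
    10: "CUDNN_STATUS_LICENSE_ERROR",
}


def cudnn_status_from_message(message):
    """Return the cuDNN status name for the 'code: N' part of a dlib error.

    Returns None if the message contains no 'code: N' field.
    """
    match = re.search(r"code:\s*(\d+)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return CUDNN_STATUS_NAMES.get(code, "unknown cuDNN status {}".format(code))
```

Applied to the message in the traceback below, `cudnn_status_from_message(str(err))` returns `"CUDNN_STATUS_INTERNAL_ERROR"`, which narrows the search more than "A call to cuDNN failed" does.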

My environment:

...
    preds = fa.get_landmarks_from_image(img)
  File "/home/unsupervised_reconstruct/.venv/lib/python3.5/site-packages/face_alignment/api.py", line 171, in get_landmarks_from_image
    detected_faces = self.face_detector.detect_from_image(image[..., ::-1].copy())
  File "/home/unsupervised_reconstruct/.venv/lib/python3.5/site-packages/face_alignment/detection/dlib/dlib_detector.py", line 48, in detect_from_image
    detected_faces = self.face_detector(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))
RuntimeError: Error while calling cudnnCreate(&handles[new_device_id]) in file /home/temp/dlib/dlib/cuda/cudnn_dlibapi.cpp:104. code: 4, reason: A call to cuDNN failed

Thank you for reading, and sorry that I can't share all of the training code.

nwatab commented 4 years ago

After some investigation, I found that the variable initialization below causes the problem. I'm still looking for a fix.

import tensorflow as tf  # TensorFlow 1.x API

init = tf.global_variables_initializer()  # the cuDNN error appeared after this step
with tf.Session() as sess:
    sess.run(init)

nwatab commented 4 years ago

Solved by setting allow_growth = True:

import tensorflow as tf  # TensorFlow 1.x API

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand, not all up front
sess = tf.Session(config=config)
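The fix above uses the TensorFlow 1.x API (`tf.ConfigProto`, `tf.Session`), which TensorFlow 2.x only keeps under `tf.compat.v1`. For readers on TensorFlow 2.x, a minimal sketch of the equivalent setting follows. The helper name `enable_gpu_memory_growth` is hypothetical, and it assumes the call happens before TensorFlow first touches the GPU (TensorFlow refuses to change this setting afterwards).

```python
def enable_gpu_memory_growth():
    """Turn on on-demand GPU memory allocation for every visible GPU.

    Hypothetical helper for TensorFlow 2.x. Returns the number of GPUs
    configured, or 0 if TensorFlow is not installed or no GPU is visible.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0  # TensorFlow not installed: nothing to configure
    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # Same effect as gpu_options.allow_growth = True in TensorFlow 1.x
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


enable_gpu_memory_growth()  # call once at startup, before building the model
```

Memory growth is configured per physical device, which is why the sketch loops over every GPU TensorFlow can see.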

My guess is that TensorFlow reserves almost all of the GPU memory by default, which prevents face_alignment (through dlib) from using the GPU. https://github.com/tensorflow/tensorflow/issues/24828#issuecomment-464957482
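If changing the session setup is inconvenient, newer TensorFlow releases also read the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable, which has the same effect as `allow_growth = True`. A minimal sketch, assuming your TensorFlow version supports this variable (check before relying on it) and that it is set before TensorFlow initializes the GPU; the helper `force_tf_gpu_allow_growth` is hypothetical.

```python
import os


def force_tf_gpu_allow_growth():
    """Ask TensorFlow to allocate GPU memory on demand (hypothetical helper).

    Assumption: the installed TensorFlow reads TF_FORCE_GPU_ALLOW_GROWTH.
    The variable must be set before TensorFlow initializes the GPU, so
    call this before importing tensorflow.
    """
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return os.environ["TF_FORCE_GPU_ALLOW_GROWTH"]


force_tf_gpu_allow_growth()
# import tensorflow as tf  # import TensorFlow only after the variable is set
```

Setting the variable in the shell that launches training works the same way and keeps the training code unchanged.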