akanazawa / hmr

Project page for End-to-end Recovery of Human Shape and Pose

Demo Run Failed With Cudnn Error #107

Closed alon1samuel closed 5 years ago

alon1samuel commented 5 years ago

Hi,

I tried to run the demo and got this error:

2019-08-11 18:09:17.244335: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-08-11 18:09:17.464974: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-08-11 18:09:18.180194: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-08-11 18:09:18.213845: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/media/alon/hdd/Guardian/od/hmr/demo.py", line 151, in <module>
    main(config.img_path, config.json_path)
  File "/media/alon/hdd/Guardian/od/hmr/demo.py", line 136, in main
    input_img, get_theta=True)
  File "/media/alon/hdd/Guardian/od/hmr/src/RunModel.py", line 140, in predict
    results = self.predict_dict(images)
  File "/media/alon/hdd/Guardian/od/hmr/src/RunModel.py", line 166, in predict_dict
    results = self.sess.run(fetch_dict, feed_dict)
  File "/media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node Encoder_resnet/resnet_v2_50/conv1/Conv2D (defined at tmp/tmpvc6uDQ.py:12) ]]
     [[add_2/_573]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node Encoder_resnet/resnet_v2_50/conv1/Conv2D (defined at tmp/tmpvc6uDQ.py:12) ]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation. Input Source operations connected to node Encoder_resnet/resnet_v2_50/conv1/Conv2D: Encoder_resnet/resnet_v2_50/Pad (defined at media/alon/hdd/Guardian/od/hmr/src/models.py:48)

Input Source operations connected to node Encoder_resnet/resnet_v2_50/conv1/Conv2D: Encoder_resnet/resnet_v2_50/Pad (defined at media/alon/hdd/Guardian/od/hmr/src/models.py:48)

Original stack trace for u'Encoder_resnet/resnet_v2_50/conv1/Conv2D':
  File "media/alon/hdd/Guardian/od/hmr/demo.py", line 151, in <module>
    main(config.img_path, config.json_path)
  File "media/alon/hdd/Guardian/od/hmr/demo.py", line 125, in main
    model = RunModel(config, sess=sess)
  File "media/alon/hdd/Guardian/od/hmr/src/RunModel.py", line 62, in __init__
    self.build_test_model_ief()
  File "media/alon/hdd/Guardian/od/hmr/src/RunModel.py", line 82, in build_test_model_ief
    reuse=False)
  File "media/alon/hdd/Guardian/od/hmr/src/models.py", line 48, in Encoder_resnet
    scope='resnet_v2_50')
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/nets/resnet_v2.py", line 287, in resnet_v2_50
    scope=scope)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/nets/resnet_v2.py", line 214, in resnet_v2
    net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/nets/resnet_utils.py", line 146, in conv2d_same
    scope=scope)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1159, in convolution2d
    conv_dims=2)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1057, in convolution
    outputs = layer.apply(inputs)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1479, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 537, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
    ), args, kwargs)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "tmp/tmpvc6uDQ.py", line 12, in tf__call
    outputs = ag__.converted_call('_convolution_op', self, ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (inputs, self.kernel), None)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
    return _call_unconverted(f, args, kwargs)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/autograph/impl/api.py", line 255, in _call_unconverted
    return f(*args)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1079, in __call__
    return self.conv_op(inp, filter)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 635, in __call__
    return self.call(inp, filter)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 234, in __call__
    name=self.name)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1953, in conv2d
    name=name)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1071, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "media/alon/hdd/Guardian/od/hmr/venv_hmr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

I got it working by inserting the GPU memory allocation lines suggested in this issue.
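The exact lines are not quoted in this thread, so the following is only a minimal sketch of the usual TensorFlow 1.x way of limiting GPU memory allocation. It assumes the session is created by the caller and passed to the model, as the traceback suggests demo.py does with RunModel(config, sess=sess). The helper name build_gpu_config and its parameters are hypothetical and not part of hmr.

```python
import tensorflow as tf


def build_gpu_config(allow_growth=True, memory_fraction=None):
    """Return a session config that limits how TensorFlow reserves GPU memory.

    Hypothetical helper for illustration; not part of the hmr code base.
    """
    # tf.compat.v1 exists in recent TF 1.x releases (including 1.14) and TF 2.x.
    config = tf.compat.v1.ConfigProto()
    # Allocate GPU memory on demand instead of reserving most of it up front.
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        # Optionally cap this process at a fraction of each GPU's memory.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config


if __name__ == "__main__":
    sess = tf.compat.v1.Session(config=build_gpu_config())
    # Assumption: this session would then be handed to the model,
    # e.g. RunModel(config, sess=sess) as seen in the traceback.
    print(sess)
    sess.close()
```

Whether this removes the cuDNN error depends on the machine; it tends to help when another process or TensorFlow's default pre-allocation leaves too little free GPU memory for cuDNN to create its handle.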

Does anyone have a better solution, or is that it?

Regards, Alon

vigneshrk29 commented 3 years ago

Hello @alon1samuel

Did you manage to find a fix for the above error, by any chance? I have been trying to get it working for the past week without much luck.

Thanks, Vignesh

alon1samuel commented 3 years ago

Hi, I don't remember whether I fixed it, or how. Reading the error message again, my guess is that the key part is "cuDNN failed to initialize". I suspect it's not a problem with this repo but with any TF model you run that uses conv2d layers (or similar). So I would suggest re-checking the installation with a simple model, like the one in this tutorial, to confirm that it works first. Hope it helps! Alonsh
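The tutorial referred to above is not preserved here, so as a stand-in, this is a minimal sketch of such a check: a single conv2d on a tiny constant input, independent of hmr. The function name conv2d_smoke_test is hypothetical. If cuDNN cannot initialize on the machine, this small graph would be expected to fail with the same "Failed to get convolution algorithm" message when run on the GPU.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # graph-mode API, present in TF 1.14 and TF 2.x


def conv2d_smoke_test():
    """Run one 3x3 convolution over an 8x8 image of ones and return the result."""
    graph = tf.Graph()
    with graph.as_default():
        images = tf1.placeholder(tf.float32, shape=[1, 8, 8, 3])
        # 3x3 kernel, 3 input channels, 4 output channels, all weights 1.
        kernel = tf.constant(np.ones([3, 3, 3, 4], dtype=np.float32))
        out = tf.nn.conv2d(images, kernel, strides=[1, 1, 1, 1], padding="SAME")
        # Default session settings on purpose: the point is to test the
        # installation as-is, without any memory workaround.
        with tf1.Session(graph=graph) as sess:
            feed = {images: np.ones([1, 8, 8, 3], dtype=np.float32)}
            return sess.run(out, feed_dict=feed)


if __name__ == "__main__":
    result = conv2d_smoke_test()
    print(result.shape)
```

If this runs cleanly but the demo still fails, the problem is more likely GPU memory pressure from the larger model than a broken CUDA/cuDNN installation.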

vigneshrk29 commented 3 years ago

Hi, thanks for your reply. I got it working in a similar way to what you did initially, by limiting GPU usage. I'll try the simple model and try to figure out the underlying problem.

Thanks