Ifeng96 closed 5 years ago
I didn't encounter this problem.
The default software versions I used are tf==1.3 and python3.5.
Yep, I have hit the same error. Could you please tell me which versions of CUDA and cuDNN you used?
The problem may be with the GPU. The error appears on a 2080Ti, and I often run into confusing problems with that card. It works correctly on a 1080Ti (cuda 8.0 + cudnn 6.0).
Thanks for the feedback. Actually, this code has only been tested on a 1080Ti. When I wrote the code, the 2080Ti hadn't been released yet.
For a 2080Ti GPU card, the recommendation is to install tf1.13+ with CUDA10 and cudnn7.
OK. I will have a try. Thank you.
I have run it successfully on a 2080G GPU. Just make sure your environment is cuda8.0 + cudnn6.0.
@LeslieChen233 Thanks. The problem occurs because the 2080Ti only supports CUDA10.0+, and tf1.13+ is compiled against CUDA10 and cudnn7.4.
Using pip, the install command looks like this (it needs CUDA10 and cudnn7.4):
pip3 install tensorflow-gpu==1.13.0rc1 --user
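Before retraining, it may help to confirm which TensorFlow build is actually being imported. The sketch below is only an illustration of the recommendation above: the helper names `parse_major_minor` and `tf_ok_for_2080ti` and the constant `MIN_TF_FOR_2080TI` are hypothetical and not part of this repository, and the 1.13 threshold is taken from the comments in this thread rather than from an official compatibility table.

```python
import re

# Assumption from this thread: tf1.13+ (built against CUDA10 / cudnn7.4)
# is the minimum release recommended for a 2080Ti.
MIN_TF_FOR_2080TI = (1, 13)


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '1.13.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def tf_ok_for_2080ti(version):
    """True if `version` is at least the release recommended in this thread."""
    return parse_major_minor(version) >= MIN_TF_FOR_2080TI


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("tensorflow is not installed")
    else:
        print("installed:", tf.__version__)
        # tf.test.is_built_with_cuda() reports whether this build has CUDA support.
        print("built with CUDA:", tf.test.is_built_with_cuda())
        print("meets the thread's 2080Ti recommendation:",
              tf_ok_for_2080ti(tf.__version__))
```

This only checks the TensorFlow side. The installed CUDA and cuDNN libraries still have to match what that TensorFlow release was compiled against.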
Have you ever seen this error? It appears when I run the code.
train epoc:0: 0%| | 0/100 [00:00<?, ?it/s]2019-04-08 19:45:32.987229: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x1564c380
2019-04-08 19:45:33.117242: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at matrix_inverse_op.cc:223 : Internal: tensorflow/core/kernels/cuda_solvers.cc:408: cuSolverDN call failed with status =6
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: tensorflow/core/kernels/cuda_solvers.cc:408: cuSolverDN call failed with status =6
[[Node: MatrixInverse = MatrixInverseT=DT_FLOAT, adjoint=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
[[Node: Mean_2/_51 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2112_Mean_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/media/hlf/51aa617b-dbbc-4ad1-af81-45cf8dfce172/hlf/code/TPN-master/train.py", line 213, in
_, summaries, step, ls, ac = sess.run([train_op, train_summary_op, global_step, ce_loss, acc], feed_dict={m.x: support, m.ys:s_labels, m.q: query, m.y:q_labels, m.phase:1})
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: tensorflow/core/kernels/cuda_solvers.cc:408: cuSolverDN call failed with status =6
[[Node: MatrixInverse = MatrixInverseT=DT_FLOAT, adjoint=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
[[Node: Mean_2/_51 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2112_Mean_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'MatrixInverse', defined at:
File "/media/hlf/51aa617b-dbbc-4ad1-af81-45cf8dfce172/hlf/code/TPN-master/train.py", line 148, in
ce_loss,acc,sigma_value = m.construct()
File "/media/hlf/51aa617b-dbbc-4ad1-af81-45cf8dfce172/hlf/code/TPN-master/models.py", line 88, in construct
ce_loss, acc, sigma_value = self.label_prop(emb_x, emb_q, ys_one_hot)
File "/media/hlf/51aa617b-dbbc-4ad1-af81-45cf8dfce172/hlf/code/TPN-master/models.py", line 139, in label_prop
F = tf.matrix_inverse(tf.cast(F, tf.float32))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_linalg_ops.py", line 1049, in matrix_inverse
"MatrixInverse", input=input, adjoint=adjoint, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): tensorflow/core/kernels/cuda_solvers.cc:408: cuSolverDN call failed with status =6
[[Node: MatrixInverse = MatrixInverseT=DT_FLOAT, adjoint=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
[[Node: Mean_2/_51 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2112_Mean_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]