pioneerRick opened 8 months ago
When I run with batch_size set above 64, I hit that error.
But when I set batch_size down to 16, what I get instead is a GPU out-of-memory error:

Traceback (most recent call last):
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[134487,32,32,20] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[{{node Convolutional_1/h_conv1/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Input/Reshape_Data/Reshape, Convolutional_1/W_conv1/Variable/read)]]
  [[{{node loss/Euclidean_Distance/Mean/_69}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_372_loss/Euclidean_Distance/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "MI_Proposed_CNNs_Architecture.py", line 584, in <module>
  [[{{node loss/Euclidean_Distance/Mean/_69}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_372_loss/Euclidean_Distance/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'Convolutional_1/h_conv1/Conv2D', defined at:
  File "MI_Proposed_CNNs_Architecture.py", line 101, in <module>

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[134487,32,32,20] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[node Convolutional_1/h_conv1/Conv2D (defined at MI_Proposed_CNNs_Architecture.py:101) = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Input/Reshape_Data/Reshape, Convolutional_1/W_conv1/Variable/read)]]
  [[{{node loss/Euclidean_Distance/Mean/_69}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_372_loss/Euclidean_Distance/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
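Following that hint, this is roughly how the allocation report can be enabled (a sketch against the TF 1.x RunOptions API; `train_step`, `x`, `y`, `keep_prob` are the names used elsewhere in this thread):

```python
import tensorflow as tf

# Ask TF to dump the list of live tensor allocations when an OOM occurs,
# which helps identify which op is holding the memory.
run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Inside the training loop, pass the options to sess.run, e.g.:
# sess.run(train_step,
#          feed_dict={x: batch_xs, y: batch_ys, keep_prob: 0.5},
#          options=run_opts)
```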
This might also be a useful clue: since I only generated Excel .xlsx files from MATLAB, I replaced all of your original .csv files with .xlsx files when reading the data. I don't think this can be the cause of the error, but if it suggests anything to you, please let me know.
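For what it's worth, a sketch of how the two file types can be read interchangeably with pandas ("data.xlsx" / "data.csv" are placeholder file names, not the repo's actual files):

```python
import pandas as pd

# Both readers return a DataFrame, so swapping .csv for .xlsx should not
# change the downstream array shapes as long as the sheet layout matches
# the original CSV layout.
df = pd.read_excel("data.xlsx", header=None)  # uses xlrd/openpyxl under the hood
# df = pd.read_csv("data.csv", header=None)   # what the original code expects
print(df.shape)
```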
There is actually one more possible cause: you used a Windows environment, while I am on Ubuntu 16.04. Could that also explain the error?
Hello, after switching to a larger GPU, a Tesla V100 32GB (environment: CUDA 12.2, Python 3.6.3, tensorflow-gpu 1.13.1), the two errors above went away, so they can be attributed to insufficient GPU memory. Once those two problems were solved, however, I ran into another one. From what I have found, the cause is that the input is too small: the feature map keeps shrinking as it passes through the convolution layers, and once it becomes too small there is nothing left to convolve, so an error is raised. The likely fix is to change the size of each layer of the network. Does this mean that the original number of layers, or the input/output size of each layer, is wrong?
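To check that, I traced the spatial size of the feature map layer by layer; a minimal sketch of the arithmetic is below. The kernel sizes, strides, and paddings here are assumed placeholders, not the paper's actual architecture:

```python
# Compute the output spatial size after each conv/pool layer to spot where
# the feature map shrinks below the kernel size ("cannot be convolved").
def out_size(n, k, s, padding):
    if padding == "SAME":
        return -(-n // s)           # ceil(n / s); size depends only on stride
    return (n - k) // s + 1         # VALID padding

size = 32                           # 32x32 input, per the OOM tensor shape
for name, k, s, pad in [("conv1", 5, 1, "SAME"), ("pool1", 2, 2, "VALID"),
                        ("conv2", 5, 1, "SAME"), ("pool2", 2, 2, "VALID")]:
    size = out_size(size, k, s, pad)
    print(name, "->", "%dx%d" % (size, size))
```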
The GPU out-of-memory problem is likely because the original code feeds the entire training set and the entire test set in a single run when computing accuracy and loss:

train_acc, train_loss = sess.run([Global_Average_Accuracy, loss], feed_dict={x: train_data, y: train_labels, keep_prob: 1.0})
test_summary, test_acc, test_loss = sess.run([merged, Global_Average_Accuracy, loss], feed_dict={x: test_data, y: test_labels, keep_prob: 1.0})

If accuracy and loss are computed batch by batch instead, the error does not occur.
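A minimal sketch of that batched evaluation (`x`, `y`, `keep_prob`, the fetches, and the data arrays are the names from the snippet above; the plain mean below is exact only when every batch has the same size):

```python
import numpy as np

def eval_in_batches(sess, fetches, data, labels, batch_size=128):
    # Run the fetches on fixed-size chunks and average the per-batch values,
    # instead of feeding the whole split to sess.run in one go.
    results = []
    for i in range(0, len(data), batch_size):
        results.append(sess.run(fetches,
                                feed_dict={x: data[i:i + batch_size],
                                           y: labels[i:i + batch_size],
                                           keep_prob: 1.0}))
    return [np.mean(vals) for vals in zip(*results)]

# train_acc, train_loss = eval_in_batches(
#     sess, [Global_Average_Accuracy, loss], train_data, train_labels)
```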
Traceback (most recent call last):
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(128, 4), b.shape=(512, 4), m=128, n=512, k=4
  [[{{node Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul}} = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Train_Optimizer/gradients/Output_Layer/prediction/add_grad/tuple/control_dependency, Output_Layer/W_fc2/Variable/read)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "MI_Proposed_CNNs_Architecture.py", line 582, in <module>
    sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 0.50})
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(128, 4), b.shape=(512, 4), m=128, n=512, k=4
  [[node Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul (defined at MI_Proposed_CNNs_Architecture.py:301) = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Train_Optimizer/gradients/Output_Layer/prediction/add_grad/tuple/control_dependency, Output_Layer/W_fc2/Variable/read)]]

Caused by op 'Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul', defined at:
  File "MI_Proposed_CNNs_Architecture.py", line 301, in <module>
    train_step = tf.train.AdamOptimizer(1e-5).minimize(loss)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 400, in minimize
    grad_loss=grad_loss)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 519, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 814, in _GradientsHelper
    lambda: grad_fn(op, out_grads))
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 408, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 814, in <lambda>
    lambda: grad_fn(op, out_grads))
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py", line 1130, in _MatMulGrad
    grad_a = gen_math_ops.mat_mul(grad, b, transpose_b=True)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4560, in mat_mul
    name=name)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'Output_Layer/prediction/MatMul', defined at:
  File "MI_Proposed_CNNs_Architecture.py", line 290, in <module>
    prediction = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2057, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4560, in mat_mul
    name=name)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(128, 4), b.shape=(512, 4), m=128, n=512, k=4
  [[node Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul (defined at MI_Proposed_CNNs_Architecture.py:301) = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Train_Optimizer/gradients/Output_Layer/prediction/add_grad/tuple/control_dependency, Output_Layer/W_fc2/Variable/read)]]
Hello author, I am a third-year undergraduate student, and I have recently been reproducing your paper to look for inspiration. When I ran (under a Python 3.6 environment) $ python MI_Proposed_CNNs_Architecture.py, I hit the errors above. I have searched a lot of material without finding an answer; the most likely cause seems to be a mismatch between the TensorFlow version and the CUDA version, but I am not sure whether that is correct.
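To narrow that down, here is a quick sanity check I can run (a sketch against the TF 1.x API; note that tensorflow-gpu 1.12 and 1.13 were built against CUDA 9.0 and 10.0 respectively, so whether they work under a CUDA 12.2 driver stack is exactly the open question here):

```python
import tensorflow as tf

print(tf.__version__)
print(tf.test.is_gpu_available())  # False would point to a CUDA/TF mismatch

# allow_growth claims GPU memory on demand instead of all at once; it is a
# commonly suggested workaround when cuBLAS fails to initialise
# ("Blas GEMM launch failed"), though whether it applies here is an assumption.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```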
My machine configuration is as follows: NVIDIA-SMI 535.146.02, Driver Version: 535.146.02, CUDA Version: 12.2, NVIDIA GeForce RTX 4090 with 24 GB of VRAM.
My conda environment is as follows (running under Python 3.6.13):
absl-py 0.15.0
astor 0.8.1
certifi 2021.5.30
coverage 5.5
Cython 0.29.24
dataclasses 0.8
et-xmlfile 1.1.0
gast 0.5.3
grpcio 1.36.1
h5py 2.10.0
importlib-metadata 4.8.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
Markdown 3.3.4
mkl-fft 1.3.0
mkl-random 1.1.1
mkl-service 2.3.0
numpy 1.19.2
openpyxl 3.1.2
pandas 1.1.5
pip 20.0.2
protobuf 3.17.2
python-dateutil 2.9.0.post0
pytz 2024.1
scipy 1.5.2
setuptools 36.4.0
six 1.16.0
tensorboard 1.12.2
tensorflow 1.12.0
termcolor 1.1.0
typing-extensions 4.1.1
Werkzeug 2.0.3
wheel 0.37.1
xlrd 1.2.0
zipp 3.6.0