AbnerHqC / GaitSet

A flexible, effective and fast cross-view gait recognition network

train.py hangs after launch, then fails with RuntimeError: CuDNN error: CUDNN_STATUS_EXECUTION_FAILED #170

Open miomiora opened 11 months ago

miomiora commented 11 months ago
Initialzing...
Initializing data source...
Data initialization complete.
Initializing model...
Model initialization complete.
Training START
Traceback (most recent call last):
  File "train.py", line 21, in <module>
    m.fit()
  File "/home/czk/code/GaitSet/model/model.py", line 159, in fit
    feature, label_prob = self.encoder(*seq, batch_frame)
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/czk/code/GaitSet/model/network/gaitset.py", line 90, in forward
    x = self.set_layer1(x)
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/czk/code/GaitSet/model/network/basic_blocks.py", line 24, in forward
    x = self.forward_block(x.view(-1,c,h,w))
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/czk/code/GaitSet/model/network/basic_blocks.py", line 11, in forward
    x = self.conv(x)
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/czk/anaconda3/envs/GaitSet/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CuDNN error: CUDNN_STATUS_EXECUTION_FAILED

After running python train.py there is no response at all; after a long wait it fails with the error above.
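
Since the traceback ends inside a convolution forward call, a tiny standalone convolution may help tell whether the failure is specific to GaitSet or comes from the PyTorch/cuDNN/GPU combination. The following is only a sketch: `try_conv` is a hypothetical helper, and the layer and tensor sizes are arbitrary placeholders, not values taken from this report.

```python
def try_conv(use_cudnn=True):
    """Run one small Conv2d forward pass on the GPU and report the outcome."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device"

    # Toggle cuDNN so the same convolution can be tried with and without it.
    torch.backends.cudnn.enabled = use_cudnn

    # Placeholder layer and input; sizes are arbitrary (assumption).
    conv = torch.nn.Conv2d(1, 32, kernel_size=5, padding=2).cuda()
    x = torch.randn(8, 1, 64, 44).cuda()
    try:
        y = conv(x)
        torch.cuda.synchronize()  # force the kernel to finish so errors surface here
    except RuntimeError as e:
        return "failed: {}".format(e)
    return "ok, output shape {}".format(tuple(y.shape))


if __name__ == "__main__":
    print("cuDNN on :", try_conv(use_cudnn=True))
    print("cuDNN off:", try_conv(use_cudnn=False))
```

If the call with cuDNN enabled fails in the same way while the one with cuDNN disabled succeeds, that would point toward the cuDNN build rather than the GaitSet code, though this is not conclusive.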

pip list
Package         Version
--------------- ------------
certifi         2021.5.30
cffi            1.14.6
imageio         2.15.0
mkl-fft         1.0.6
mkl-random      1.0.1
numpy           1.15.4
opencv-python   4.1.2.30
pandas          1.1.5
Pillow          8.4.0
pip             21.2.2
pycparser       2.21
python-dateutil 2.8.2
pytz            2023.3.post1
scipy           1.5.4
setuptools      58.0.4
six             1.16.0
TBB             0.2
torch           0.4.1
wheel           0.37.1
xarray          0.16.2

While python train.py is running, GPU utilization stays very low.
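
The pip list above shows torch 0.4.1 but not which CUDA/cuDNN build it uses or which GPU is installed. A short script along these lines could collect that information for the report. It is a sketch: `collect_env` is a hypothetical helper name, and it assumes the first GPU (index 0) is the one used for training.

```python
import platform


def collect_env():
    """Return a dict describing the Python / PyTorch / CUDA / cuDNN setup."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not importable in this environment
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda          # CUDA version torch was built with
    info["cudnn"] = torch.backends.cudnn.version()   # None if cuDNN is unavailable
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        # Assumption: device 0 is the GPU used for training.
        info["gpu_name"] = torch.cuda.get_device_name(0)
        info["gpu_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print("{}: {}".format(key, value))
```

Comparing the reported GPU compute capability with what the installed PyTorch/CUDA build supports is one thing worth checking, since a mismatch there is a possible (unconfirmed) cause of cuDNN execution failures.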