Closed wangwwwwwwv closed 3 years ago
I know. `num_class: 400` is correct, because the Kinetics dataset had 400 classes several years ago; it has 600 classes now.
And the channel count relates to `double_channel`. That was a silly question, LOL.
Help!!! Here is the new bug:
Dear Wang, thank you for your question.
Please set `num_class: 400` and `window_size: 150`. The reason it throws the error in your first comment could be that you set `channels: 6` but then pass an input with 3 channels (as I can infer from the shape). Please note that you should set `channels: 6` only if you use joint+bone information (3 channels for the joint information, 3 channels for the bone information).
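For reference, a minimal sketch of that mismatch, assuming `data_bn` is built as `nn.BatchNorm1d(channels * num_point * num_person)` with the usual ST-GCN reshape before it (both are assumptions about the repo's code, but they match the numbers in the error): with `channels: 6` the layer expects 6·18·2 = 216 features, while 3-channel data provides only 3·18·2 = 108.

```python
import torch
import torch.nn as nn

# config says channels: 6, so data_bn is created with 6*18*2 = 216 features
channels, num_point, num_person = 6, 18, 2
data_bn = nn.BatchNorm1d(channels * num_point * num_person)

# but the .npy data actually has 3 channels: (N, C, T, V, M)
N, C, T, V, M = 4, 3, 150, 18, 2
x = torch.randn(N, C, T, V, M)
# the usual ST-GCN reshape before data_bn: 2*18*3 = 108 features
x = x.permute(0, 4, 3, 1, 2).contiguous().view(N, M * V * C, T)

try:
    data_bn(x)
except RuntimeError as e:
    print(e)  # running_mean should contain 108 elements not 216
```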
Thank you very much!! I figured it out.
Hi, I set `config/stgcn/kinetics-skeleton/train.yaml` like this: `window_size: 150`, `num_class: 400`, `channels: 6`, `num_point: 18`, `num_person: 2`
and ran main.py. Here is the bug:
File "main.py", line 959, in <module>
    processor.start()
File "main.py", line 870, in start
    self.train(epoch, save_model=save_model)
File "main.py", line 453, in train
    output = self.model(data, label, name)
File "/home/lijianwei/.pyenv/versions/TTNet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
File "/home/lijianwei/.pyenv/versions/TTNet/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/lijianwei/.pyenv/versions/TTNet/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/lijianwei/.pyenv/versions/TTNet/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
File "/home/lijianwei/.pyenv/versions/TTNet/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/lijianwei/.pyenv/versions/TTNet/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
File "/home/lijianwei/.pyenv/versions/TTNet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
File "/home/lijianwei/st_tr_wsq/ST-TR-master/code/st_gcn/net/st_gcn.py", line 255, in forward
    x = self.data_bn(x)
File "/home/lijianwei/.pyenv/versions/TTNet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
File "/home/lijianwei/.pyenv/versions/TTNet/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 103, in forward
    return F.batch_norm(
File "/home/lijianwei/.pyenv/versions/TTNet/lib/python3.8/site-packages/torch/nn/functional.py", line 1921, in batch_norm
    return torch.batch_norm(
RuntimeError: running_mean should contain 108 elements not 216
Then I checked kinetics_train_joint.npy. Its shape is (240436, 3, 300, 18, 2).
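Checking this shape directly is a quick way to see which `channels` (and maximum `window_size`) the data dictates. A small self-contained sketch (the file below is a dummy stand-in; in practice point `np.load` at kinetics_train_joint.npy):

```python
import numpy as np

# dummy file just for this sketch, with the same layout as the real data
np.save("joint_data.npy", np.zeros((10, 3, 300, 18, 2), dtype=np.float32))

# mmap_mode="r" avoids loading the whole array into memory
data = np.load("joint_data.npy", mmap_mode="r")
N, C, T, V, M = data.shape  # (samples, channels, frames, joints, persons)
print(C, T)  # C must match `channels` in train.yaml; T bounds `window_size`
```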
Next I changed train.yaml like this: `window_size: 300`, `num_class: 600`, `channels: 3`, `num_point: 18`, `num_person: 2`
and ran main.py. The training step now runs with no bugs, but it still errors in the validation step. I thought it might be a num_class error, so I set `num_class: 400`, and then this CUDA error appears:
A1 = self.soft(torch.matmul(A1, A2) / A1.size(-1))  # N V V
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
Can you help me?
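Not a definitive diagnosis, but CUDA errors like `CUBLAS_STATUS_ALLOC_FAILED` are reported asynchronously and often mask an earlier device-side failure, e.g. a label index >= `num_class` reaching the loss. A useful debugging trick is to run one batch on CPU, where the same mistake fails with a readable message. A sketch of that failure mode (all values illustrative):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 400)              # model built with num_class: 400
labels = torch.tensor([0, 37, 399, 421])  # 421 is invalid for 400 classes

try:
    criterion(logits, labels)             # on CPU this fails with a clear message
except (RuntimeError, IndexError) as e:
    print(type(e).__name__, e)
```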