Chiaraplizz / ST-TR

Spatial Temporal Transformer Network for Skeleton-Based Activity Recognition
MIT License
299 stars · 58 forks

Code can't run #4

Closed wangzeyu135798 closed 3 years ago

wangzeyu135798 commented 3 years ago

Hi: When I run `python main.py`, there is a problem:

```
Traceback (most recent call last):
  File "/data/wangzeyu/code/ST-TR-master/code/main.py", line 956, in <module>
    processor.start()
  File "/data/wangzeyu/code/ST-TR-master/code/main.py", line 867, in start
    self.train(epoch, save_model=save_model)
  File "/data/wangzeyu/code/ST-TR-master/code/main.py", line 538, in train
    output = self.model(data, label, name)
  File "/data/wangzeyu/wangzeyu_torch_clone/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/wangzeyu/wangzeyu_torch_clone/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/data/wangzeyu/wangzeyu_torch_clone/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/wangzeyu/code/ST-TR-master/code/st_gcn/net/st_gcn.py", line 267, in forward
    x = self.data_bn(x)
  File "/data/wangzeyu/wangzeyu_torch_clone/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/wangzeyu/wangzeyu_torch_clone/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 136, in forward
    self.weight, self.bias, bn_training, exponential_average_factor, self.eps)
  File "/data/wangzeyu/wangzeyu_torch_clone/lib/python3.7/site-packages/torch/nn/functional.py", line 2058, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: running_mean should contain 108 elements not 216
```

How to solve it? Thanks

Chiaraplizz commented 3 years ago

Dear Wang,

Are you running it with or without bone information? Which dataset are you using?

Chiara

wangzeyu135798 commented 3 years ago

Hi: I use the Kinetics-skeleton data, generated directly by running kinetics_gendata.py.

wangzeyu135798 commented 3 years ago

I use the Kinetics-skeleton data, probably without bone information.


Chiaraplizz commented 3 years ago

Can you send me your train.yaml, either here or via e-mail?

Chiara

wangzeyu135798 commented 3 years ago

In main.py line 548, when I run this code, another problem occurs: RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time. I'm confused: in your whole training process there is only one forward and the corresponding backward(). Why does this problem occur?

kkk241-q commented 3 years ago

How can I add the hand and body key points to the dataset and generate bone data?

wangzeyu135798 commented 3 years ago

I ran kinetics_gendata.py directly to generate the data.

Chiaraplizz commented 3 years ago

> Hi: When I run python main.py, there is a problem: [...] RuntimeError: running_mean should contain 108 elements not 216
>
> How to solve it? Thanks

This error occurs when the model expects a number of input channels different from the ones fed to it, so there is probably a mismatch between the number of channels the model was instantiated with and the data you are using. For example, if you set `channels: 6`, be sure to pass data with joint+bone information.
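To see where the 108 and 216 in the traceback come from, here is a quick sanity check. It is only a sketch, assuming the standard Kinetics-skeleton layout of 18 joints and 2 bodies; `data_bn_features` is an illustrative helper, not a function from this repo:

```python
# Illustrative helper (NOT from the ST-TR code base): the first BatchNorm1d
# layer normalizes the flattened channels * joints * bodies dimension,
# so its running_mean has that many elements.
def data_bn_features(channels, joints=18, bodies=2):
    return channels * joints * bodies

# Model instantiated for joint-only input (channels=3):
print(data_bn_features(3))  # 108 -> size of running_mean the model expects

# Data generated with joint+bone concatenated (channels=6):
print(data_bn_features(6))  # 216 -> size of the features actually fed in
```

If the two numbers disagree, either regenerate the data or change the `channels` setting in the config so that they match.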

Chiara

Chiaraplizz commented 3 years ago

> In main.py line 548, when I run this code, another problem occurs: RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time. I'm confused: in the whole training process there is only one forward and the corresponding backward(). Why does this problem occur?

Are you using the current version of the code? I have the accuracy computation at that line. Can you copy here the line of code at which the error happens?

Chiara

Chiaraplizz commented 3 years ago

> How can I add the hand and body key points to the dataset and generate bone data?

Please use ntu_gen_bones.py and ntu_merge_joint_bones.py to generate bone data from the joint information, and to merge the two by concatenation along the channel dimension.
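For intuition, the bone computation amounts to subtracting each joint's parent joint along the skeleton edges, then concatenating the result with the joints on the channel axis (3 channels become 6). A minimal sketch follows; the edge list and function names here are illustrative, not the exact ones from ntu_gen_bones.py:

```python
# Hypothetical (child, parent) pairs -- a tiny subset, NOT the full NTU graph.
EDGES = [(1, 0), (2, 1), (3, 2)]

def gen_bones(joints, edges=EDGES):
    """joints: dict joint_id -> (x, y, z). Returns dict child_id -> bone vector."""
    bones = {}
    for child, parent in edges:
        cx, cy, cz = joints[child]
        px, py, pz = joints[parent]
        bones[child] = (cx - px, cy - py, cz - pz)  # bone = child - parent
    return bones

def merge_joint_bone(joint_xyz, bone_xyz):
    """Concatenate joint and bone coordinates on the channel axis: 3 -> 6 channels."""
    return joint_xyz + bone_xyz

joints = {0: (0.0, 0.0, 0.0), 1: (0.0, 1.0, 0.0),
          2: (1.0, 1.0, 0.0), 3: (1.0, 2.0, 0.0)}
bones = gen_bones(joints)
print(bones[2])                               # (1.0, 0.0, 0.0)
print(merge_joint_bone(joints[2], bones[2]))  # 6-channel feature for joint 2
```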

Chiara

NeonLix commented 3 years ago

> In main.py line 548, when I run this code, another problem occurs: RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time. I'm confused: in the whole training process there is only one forward and the corresponding backward(). Why does this problem occur?

> Are you using the current version of the code? I have the accuracy computation at that line. Can you copy here the line of code at which the error happens?
>
> Chiara

I had the same problem when I used single GPU to run main.py. But no problem when running main.py with multiple GPUs.

imj2185 commented 3 years ago

```
Traceback (most recent call last):
  File "main.py", line 965, in <module>
    processor.start()
  File "main.py", line 876, in start
    self.train(epoch, save_model=save_model)
  File "main.py", line 547, in train
    loss_norm.backward(retain_graph=True)
  File "/home/dusko/anaconda3/envs/apbgcn/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/dusko/anaconda3/envs/apbgcn/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 25, 25]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```

I also had the same problem. Is there any way to run it on a single GPU?

kkk241-q commented 3 years ago

```
Traceback (most recent call last):
  File "main.py", line 975, in <module>
    processor.start()
  File "main.py", line 885, in start
    self.train(epoch, save_model=save_model)
  File "main.py", line 552, in train
    loss_norm.backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.
```

wangpitao commented 3 years ago

> In main.py line 548, when I run this code, another problem occurs: RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time. I'm confused: in the whole training process there is only one forward and the corresponding backward(). Why does this problem occur?

I have the same question.

wangzeyu135798 commented 3 years ago

You can avoid this problem by rewriting every in-place `+=` or `*=` in the code as an explicit assignment, i.e. `a += b` becomes `a = a + b` and `a *= b` becomes `a = a * b`. That is how I solved it before; I hope it helps.


wangpitao commented 3 years ago

> You can avoid this problem by rewriting every in-place `+=` or `*=` in the code as an explicit assignment, i.e. `a = a + b` and `a = a * b`. That is how I solved it before; I hope it helps.

It works! Thank you. How can I get in touch with you?

Chiaraplizz commented 3 years ago

> You can avoid this problem by rewriting every in-place `+=` or `*=` in the code as an explicit assignment, i.e. `a = a + b` and `a = a * b`.

Thanks for this workaround!
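For anyone hitting the same error, here is a minimal, self-contained reproduction of the in-place problem and the out-of-place fix described above. This is a generic PyTorch sketch, not the actual ST-TR training code:

```python
import torch

def loss_inplace():
    w = torch.ones(3, requires_grad=True)
    y = torch.exp(w)    # exp() saves its output y for the backward pass
    y *= 3              # in-place: mutates the tensor autograd saved
    y.sum().backward()  # -> RuntimeError: ... modified by an inplace operation

def loss_out_of_place():
    w = torch.ones(3, requires_grad=True)
    y = torch.exp(w)
    y = y * 3           # out-of-place: builds a new tensor, saved y is untouched
    y.sum().backward()
    return w.grad       # 3 * exp(1) per element

try:
    loss_inplace()
except RuntimeError as e:
    print("in-place version failed:", e)

print(loss_out_of_place())
```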

sankadivandya commented 2 years ago

> Traceback (most recent call last): [...] RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 25, 25]] is at version 1; expected version 0 instead.
>
> I also had the same problem. Is there any way to run it on a single GPU?

Did you solve this issue? Sorry about the late question :)