Closed: wangzeyu135798 closed this issue 3 years ago
Dear Wang,
Are you running it with or without bone information? Which dataset are you using?
Chiara
Hi: I use the Kinetics-skeleton data. I ran kinetics_gendata.py directly to generate the data we need.
I use the Kinetics-skeleton data, probably without bone information.
Can you send me your train.yaml, either here or via e-mail?
Chiara
In main.py, line 548, when I run this code, another problem occurs: RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time. I'm confused. In your whole training process there is only one forward pass and the corresponding backward(). Why does this problem occur?
How can I add the hand and body keypoints to the dataset and generate bone data?
I directly run kinetics_gendata.py to generate data.
Hi: When I run python main.py, there is a problem:
Traceback (most recent call last):
  File "/data/wangzeyu/code/ST-TR-master/code/main.py", line 956, in <module>
    processor.start()
  File "/data/wangzeyu/code/ST-TR-master/code/main.py", line 867, in start
    self.train(epoch, save_model=save_model)
  File "/data/wangzeyu/code/ST-TR-master/code/main.py", line 538, in train
    output = self.model(data, label, name)
  File "/data/wangzeyu/wangzeyu_torch_clone/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/wangzeyu/wangzeyu_torch_clone/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/data/wangzeyu/wangzeyu_torch_clone/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/wangzeyu/code/ST-TR-master/code/st_gcn/net/st_gcn.py", line 267, in forward
    x = self.data_bn(x)
  File "/data/wangzeyu/wangzeyu_torch_clone/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/wangzeyu/wangzeyu_torch_clone/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 136, in forward
    self.weight, self.bias, bn_training, exponential_average_factor, self.eps)
  File "/data/wangzeyu/wangzeyu_torch_clone/lib/python3.7/site-packages/torch/nn/functional.py", line 2058, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: running_mean should contain 108 elements not 216
How can I solve it? Thanks
This problem occurs when the model expects a different number of channels from the one in the data fed to it, so there is probably a mismatch between the number of channels the model was instantiated with and the data you are using. For example, if you set channels: 6, be sure to pass data with joint+bone information.
Chiara
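The two numbers in the error message can be reproduced with a little arithmetic. The sketch below assumes that the input batch-norm layer is sized as channels × joints × persons, and that Kinetics-skeleton has 18 joints and up to 2 persons per clip; the helper name `expected_bn_features` is made up for illustration and is not part of the repository.

```python
def expected_bn_features(channels: int, num_joints: int, num_persons: int) -> int:
    """Feature count of an input batch-norm layer, ASSUMING it is sized as
    channels * joints * persons (hypothetical helper, not from the repo)."""
    return channels * num_joints * num_persons


# Assumed Kinetics-skeleton layout: 18 joints, 2 persons.
KINETICS_JOINTS = 18
KINETICS_PERSONS = 2

# Joint-only data has 3 channels; joint+bone data has 6.
joint_only = expected_bn_features(3, KINETICS_JOINTS, KINETICS_PERSONS)
joint_plus_bone = expected_bn_features(6, KINETICS_JOINTS, KINETICS_PERSONS)

print(joint_only, joint_plus_bone)
```

Under these assumptions the results are 108 and 216, the same pair that appears in the error. That would be consistent with a model configured with channels: 6 receiving joint-only (3-channel) data, so either the config or the data needs to change so that the two agree.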
In main.py, line 548, when I run this code, another problem occurs: RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time. I'm confused. In your whole training process there is only one forward pass and the corresponding backward(). Why does this problem occur?
Are you using the current version of the code? In my copy, that line contains the accuracy computation. Can you paste here the line of code where the error occurs?
Chiara
How can I add the hand and body keypoints to the dataset and generate bone data?
Please use ntu_gen_bones.py and ntu_merge_joint_bones.py to generate bone data from the joint information, and to merge the two by concatenation along the channel dimension.
Chiara
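As a rough illustration of the idea behind those two steps, and not the contents of the scripts themselves, a bone can be taken as the vector from a parent joint to its child, and merging appends that vector to the joint coordinates so each joint carries 6 values instead of 3. The function names, the 3-joint skeleton, and the pair list below are all made up for the example; a real dataset has its own skeleton layout.

```python
from typing import List, Tuple

Vec = Tuple[float, ...]


def make_bones(joints: List[Vec], pairs: List[Tuple[int, int]]) -> List[Vec]:
    """Bone vector per joint: child coordinates minus parent coordinates.
    Joints that never appear as a child keep a zero bone."""
    bones: List[Vec] = [(0.0, 0.0, 0.0)] * len(joints)
    for child, parent in pairs:
        bones[child] = tuple(c - p for c, p in zip(joints[child], joints[parent]))
    return bones


def merge_joint_bones(joints: List[Vec], bones: List[Vec]) -> List[Vec]:
    """Concatenate along the channel axis: (x, y, z) + (dx, dy, dz)."""
    return [j + b for j, b in zip(joints, bones)]


# Hypothetical single-frame skeleton with 3 joints and (child, parent) pairs.
joints = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (1.5, 3.0, 1.0)]
pairs = [(1, 0), (2, 1)]

bones = make_bones(joints, pairs)
merged = merge_joint_bones(joints, bones)
print(merged[2])
```

Data merged this way has 6 channels per joint, which is the case where a channels: 6 setting applies.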
I had the same problem when I used a single GPU to run main.py, but there was no problem when running main.py with multiple GPUs.
Traceback (most recent call last):
File "main.py", line 965, in <module>
I also had the same problem. Is there any way to run it on a single GPU?
Traceback (most recent call last):
File "main.py", line 975, in <module>
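In main.py, line 548, when I run this code, another problem occurs: RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time. I'm confused. In your whole training process there is only one forward pass and the corresponding backward(). Why does this problem occur?
I ran into the same problem.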
Just rewrite every += or *= in the code as a = a + b or a = a * b respectively, and this problem goes away. That is how I solved it before; I hope it helps.
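Why replacing an in-place update with `a = a + b` can help is easier to see with a toy model. The sketch below is plain Python, not PyTorch: the `Tracked` class and `backward_ok` function are invented stand-ins that imitate, in a very simplified way, a value carrying a version counter that is checked when gradients are computed. It only illustrates the difference between mutating an object and creating a new one.

```python
class Tracked:
    """Toy stand-in for a tensor with a version counter (NOT a PyTorch class)."""

    def __init__(self, values):
        self.values = list(values)
        self.version = 0

    def __iadd__(self, other):
        # In-place: mutates this object and bumps its version.
        for i, v in enumerate(other.values):
            self.values[i] += v
        self.version += 1
        return self

    def __add__(self, other):
        # Out-of-place: returns a new object and leaves self untouched.
        return Tracked(x + y for x, y in zip(self.values, other.values))


def backward_ok(saved, saved_version):
    """Imitates a check that a saved value was not modified after being saved."""
    return saved.version == saved_version


b = Tracked([0.5, 0.5])

# Case 1: in-place update after the value was "saved for backward".
a = Tracked([1.0, 2.0])
saved, saved_version = a, a.version
a += b
in_place_ok = backward_ok(saved, saved_version)

# Case 2: out-of-place update; the saved object is never touched.
a = Tracked([1.0, 2.0])
saved, saved_version = a, a.version
a = a + b
out_of_place_ok = backward_ok(saved, saved_version)

print(in_place_ok, out_of_place_ok)
```

In the toy, the in-place case fails the check because the saved object moved from version 0 to version 1, while the out-of-place case passes. For locating the offending operation in real code, the hint in the PyTorch error message, torch.autograd.set_detect_anomaly(True), is the tool to reach for.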
It works! Thank you. How can I get in touch with you?
Thanks for this workaround!
Traceback (most recent call last):
  File "main.py", line 965, in <module>
    processor.start()
  File "main.py", line 876, in start
    self.train(epoch, save_model=save_model)
  File "main.py", line 547, in train
    loss_norm.backward(retain_graph=True)
  File "/home/dusko/anaconda3/envs/apbgcn/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/dusko/anaconda3/envs/apbgcn/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 25, 25]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
I also had the same problem. Is there any way to run it on a single GPU?
Did you solve this issue? Sorry about the late question :)