A problem about error: "inplace operation"

Chiaraplizz / ST-TR

Spatial Temporal Transformer Network for Skeleton-Based Activity Recognition

MIT License


A problem about error: "inplace operation" #36

Open Goldfish0106 opened 1 year ago

Goldfish0106 commented 1 year ago

Hi Chiaraplizz, I'd like to ask about a problem I ran into when running the code. When I start the training process, the following error occurs:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 512, 75, 25]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

My device config is set to [0, 1, 2, 3] and I use DataParallel for multi-GPU computation, which is the same as the configuration in your repository. Did you meet the same issue? I would really appreciate it if you could help me solve this problem.
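For readers unfamiliar with this class of error, here is a minimal sketch of how it can arise in general. It does not claim to show where the ST-TR code triggers it; the function names are hypothetical and the tensor shapes are arbitrary. The idea is that autograd keeps the ReLU output for the backward pass, so editing that output in place (for example with `+=`) invalidates it, while the out-of-place form leaves it intact.

```python
import torch


def backward_with_inplace_bug():
    """Reproduce the error: modify a ReLU output in place before backward."""
    x = torch.randn(4, 8, requires_grad=True)
    y = torch.relu(x)    # autograd saves this output to compute ReLU's gradient
    y += 1               # in-place edit bumps y's version counter from 0 to 1
    y.sum().backward()   # raises RuntimeError mentioning "inplace operation"


def backward_without_inplace():
    """Same computation, but the addition allocates a new tensor."""
    x = torch.randn(4, 8, requires_grad=True)
    y = torch.relu(x)
    y = y + 1            # out-of-place: the saved ReLU output is untouched
    y.sum().backward()
    return x.grad


if __name__ == "__main__":
    try:
        backward_with_inplace_bug()
    except RuntimeError as err:
        print("in-place version failed:", err)
    print("out-of-place gradient shape:", backward_without_inplace().shape)
```

To locate the offending line in a real model, the hint in the error message applies: calling `torch.autograd.set_detect_anomaly(True)` before training makes the failing backward step also report the forward operation that produced the tensor.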

lywinaaa commented 1 year ago

Me too. Is there a way to fix it?

Kuroshika commented 1 year ago

Hi lywinaaa, I faced the same problem before. I fixed it by downgrading PyTorch to a lower version. Hope this helps.

lywinaaa commented 1 year ago

OK, thank you very much!

Kuroshika wrote on Friday, April 28, 2023 at 10:23:

Hi lywinaaa, I faced the same problem before. I fixed it by downgrading PyTorch to a lower version. Hope this helps.


xiegedaimazhenfeijin commented 1 year ago

Hi lywinaaa, I faced the same problem before. I fixed it by downgrading PyTorch to a lower version. Hope this helps.

Hello, may I ask which version of PyTorch you downgraded to that runs without errors?

Kuroshika commented 1 year ago

My PyTorch version is 1.5.1, the cudatoolkit version is 10.1, and my torchvision version is 0.6.1. I'll be glad if this helps you! I would not really recommend running on 30-series GPUs, because I could not reproduce the results on a 3090 :(, but I did reproduce them on a machine with a Titan V.
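To compare a local environment against the versions reported above, a small check like the following may help. It is only a sketch: the pip package names `torch` and `torchvision` and the helper names are assumptions, and cudatoolkit is left out because it is a conda package that pip metadata does not cover.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as working in this thread (pip package names are assumed).
REPORTED = {"torch": "1.5.1", "torchvision": "0.6.1"}


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_with_reported(reported=REPORTED):
    """Map each package name to a tuple (installed, reported, matches)."""
    result = {}
    for name, wanted in reported.items():
        found = installed_version(name)
        # Local build tags such as "+cu101" are ignored for the comparison.
        matches = found is not None and found.split("+")[0] == wanted
        result[name] = (found, wanted, matches)
    return result


if __name__ == "__main__":
    for name, (found, wanted, ok) in compare_with_reported().items():
        print(f"{name}: installed={found} reported={wanted} match={ok}")
```

Once `torch` is importable, the CUDA version it was built against can also be read from `torch.version.cuda`.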

AnainaM commented 1 year ago

Hello, can you tell me how you solved this issue?

olayinkaajayi commented 1 year ago

Hello @AnainaM, have you been able to resolve this issue? I recently ran into it as well, and my CUDA version is 11.6.

A response would be helpful.

AnainaM commented 1 year ago

@olayinkaajayi Actually, I have not been able to fix this so far. If you find a solution, please share it with me as well. Thank you.

olayinkaajayi commented 1 year ago

All right then, I'll be happy to share if I figure it out. By the way, are you doing PhD research related to skeleton-based action recognition?