Open 223d opened 4 months ago
Hi,
While running the evaluation with ./script_action_recognition.sh ntu60_xsub ntu60 cross_subject, an error occurred:
loading './checkpoints/ntu60_xsub/checkpoint_0450.pth.tar' for sanity check
Traceback (most recent call last):
  File "/home/inspur/ZLQ/UmURL-main/action_recognition.py", line 470, in <module>
    main()
  File "/home/inspur/ZLQ/UmURL-main/action_recognition.py", line 128, in main
    main_worker(0, ngpus_per_node, args)
  File "/home/inspur/ZLQ/UmURL-main/action_recognition.py", line 260, in main_worker
    sanity_check_encoder(model.state_dict(), args.pretrained)
  File "/home/inspur/ZLQ/UmURL-main/action_recognition.py", line 396, in sanity_check_encoder
    assert ((state_dict[k].cpu() == state_dict_pre[k_pre]).all()), \
KeyError: 'module.backbone.j_emb.t_embedding.0.weight'
Thank you!
May I ask whether you trained on a single GPU? If so, the saved model parameters will not have the 'module.' prefix, whereas the code assumes multi-GPU training by default and looks for that prefix. I suspect this is the cause of the error. If the problem persists, please let me know. @223d
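One quick way to confirm this is to look at the key names stored in the checkpoint. The following is a minimal sketch on plain dicts of parameter names; the helper name count_ddp_prefixed and the sample dicts are illustrative, and in practice the dict would come from loading the checkpoint file (how the weights are nested inside it is an assumption to verify against the saving code).

```python
def count_ddp_prefixed(state_dict, prefix="module."):
    """Return (number of keys starting with prefix, total number of keys)."""
    keys = list(state_dict)
    return sum(k.startswith(prefix) for k in keys), len(keys)


# Illustrative key names only; a real checkpoint holds tensors as values.
single_gpu = {"backbone.j_emb.t_embedding.0.weight": None}
multi_gpu = {"module.backbone.j_emb.t_embedding.0.weight": None}

print(count_ddp_prefixed(single_gpu))  # (0, 1)
print(count_ddp_prefixed(multi_gpu))   # (1, 1)
```

If the first number is 0, the checkpoint was saved from an unwrapped model and a lookup with the 'module.' prefix will raise a KeyError like the one above.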
Yes, my machine can only run on a single GPU. How should I solve this problem? Thank you!
If the model was not trained using DistributedDataParallel, the 'module.' prefix is not required during the sanity check. Please change
k_pre = 'module.' + k
to
k_pre = k
accordingly. @223d
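Instead of editing that line by hand for each setup, the lookup could be made tolerant of both cases. This is only a sketch, not the repository's code: strip_prefix and find_pretrained_key are hypothetical helpers, and plain dicts stand in for the real state dicts.

```python
def strip_prefix(key, prefix="module."):
    """Drop a leading DDP 'module.' prefix if present."""
    return key[len(prefix):] if key.startswith(prefix) else key


def find_pretrained_key(k, state_dict_pre, prefix="module."):
    """Return the key in state_dict_pre matching k, with or without the prefix."""
    bare = strip_prefix(k, prefix)
    for candidate in (bare, prefix + bare):
        if candidate in state_dict_pre:
            return candidate
    raise KeyError(f"no match for {k!r} in pretrained checkpoint")


# Stand-ins for checkpoints saved without and with DistributedDataParallel.
pre_single = {"backbone.j_emb.t_embedding.0.weight": 1}
pre_ddp = {"module.backbone.j_emb.t_embedding.0.weight": 1}

k = "backbone.j_emb.t_embedding.0.weight"
print(find_pretrained_key(k, pre_single))
print(find_pretrained_key(k, pre_ddp))
```

In the sanity check, k_pre would then be set from find_pretrained_key(k, state_dict_pre) rather than from a hard-coded prefix.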
Hi, may I ask whether this issue is caused by bad data in the checkpoint file from running on a single GPU?
A new problem has arisen:
=> loading './checkpoints/ntu60_xsub/checkpoint_0450.pth.tar' for sanity check
Traceback (most recent call last):
File "/home/inspur/ZLQ/UmURL-main/action_recognition.py", line 470, in
May I ask whether you have modified any part of the code? The issue may be due to incorrect "requires_grad" attributes on some parameters. If you want to run on a single GPU, I recommend leaving the code unchanged and keeping DDP training. You only need to change the command in the pretrain script to
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 pretrain.py
Okay, thank you. I did make the code changes, so I'll retrain it.