Closed SleepEarlyLiveLong closed 1 year ago
You can use "model['trans'] = nn.DataParallel(model['trans'])"
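Since the model in this project is stored as a dict of sub-modules, each entry has to be wrapped individually; wrapping the dict itself will not work. A minimal sketch of the pattern, using made-up layer sizes in place of the real sub-models:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the project's sub-models.
model = {
    'trans': nn.Linear(3, 4),
    'refine': nn.Linear(4, 4),
}

# Wrap each sub-module separately; nn.DataParallel expects a single
# nn.Module, not a dict of modules.
for name in model:
    model[name] = nn.DataParallel(model[name])

# On a CPU-only machine DataParallel simply falls back to the
# wrapped module, so this forward pass works either way.
x = torch.randn(2, 3)
out = model['trans'](x)
```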
Thank you! It works when I run "python main.py". However, when I run refine it fails with the following error:
run: python main.py --refine --lr 1e-5 --reload --previous_dir checkpoint/1003_1041_53_351_no/
errors:
INFO: Training on 3119616 frames
INFO: Testing on 543360 frames
checkpoint/1003_1041_53_351_no/no_refine_4_4668.pth
0%| | 0/24372 [00:05<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 198, in
I tried adding code like this:
model['trans'] = nn.DataParallel(model['trans'])
model['refine'] = nn.DataParallel(model['refine'])
It still doesn't work. Can you please tell me how to use multiple GPUs when adding the refine module? Thank you!
Maybe you can try torch==1.7.1, or you can change https://github.com/Vegetebird/StridedTransformer-Pose3D/blob/9d988ac54234c5acc6a67ae746ce5bdbea204f8a/model/block/refine.py#L18 to nn.ReLU(inplace=True)
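If the inplace flag is what matters here, the edit amounts to the following. This is only a hypothetical sketch of a refine-style block (the real layers in refine.py differ); with inplace=True the ReLU reuses its input buffer instead of allocating a new tensor:

```python
import torch
import torch.nn as nn

# Hypothetical refine block; only the ReLU flag is the point here.
refine = nn.Sequential(
    nn.Linear(6, 256),
    nn.ReLU(inplace=True),  # the suggested change: inplace=True
    nn.Linear(256, 3),
)

x = torch.randn(2, 6)
y = refine(x)
```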
Thank you! Using torch==1.7.1 avoids that problem.
Hello, thank you for your awesome work. I have trouble using multiple GPUs when training:
I added "model = nn.DataParallel(model)" before line 187 of main.py ("all_param = []"), but it doesn't work and gives an error:
Traceback (most recent call last):
File "main.py", line 190, in
for i_model in model:
TypeError: 'DataParallel' object is not iterable
Can you please tell me how to solve this problem? Thank you!
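For reference, the TypeError happens because nn.DataParallel wraps a single nn.Module and the resulting wrapper is not iterable; the dict of sub-models is what the training loop iterates over. A minimal reproduction and fix, with placeholder sub-models standing in for the real ones:

```python
import torch.nn as nn

# Hypothetical sub-models standing in for the project's dict.
model = {'trans': nn.Linear(3, 4), 'refine': nn.Linear(4, 3)}

# Reproduce the error: a DataParallel wrapper cannot be iterated.
wrapped = nn.DataParallel(model['trans'])
try:
    for m in wrapped:
        pass
except TypeError as e:
    print(e)  # 'DataParallel' object is not iterable

# Fix: keep the dict structure and wrap each value instead,
# so loops like "for i_model in model:" keep working.
model = {name: nn.DataParallel(m) for name, m in model.items()}
```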