Training on my own dataset fails with an error, although the same dataset trains fine on RTMPose: the shapes of the two matrices are incompatible, so the matrix multiplication cannot be performed. #56
11/07 19:36:14 - mmengine - WARNING - The prefix is not set in metric class PCKAccuracy.
11/07 19:36:14 - mmengine - WARNING - The prefix is not set in metric class AUC.
11/07 19:36:14 - mmengine - WARNING - The prefix is not set in metric class NME.
Loads checkpoint by local backend from path: work_dirs/rtmpose_x_dis_l__coco-ubody-256x192/rtm-x_ucoco.pth
The model and loaded state dict do not match exactly
size mismatch for head.mlp.1.weight: copying a param with shape torch.Size([256, 108]) from checkpoint, the shape in current model is torch.Size([256, 48]).
size mismatch for head.cls_x.weight: copying a param with shape torch.Size([576, 256]) from checkpoint, the shape in current model is torch.Size([384, 256]).
size mismatch for head.cls_y.weight: copying a param with shape torch.Size([768, 256]) from checkpoint, the shape in current model is torch.Size([512, 256]).
11/07 19:36:19 - mmengine - INFO - load backbone. in model from: https://download.openmmlab.com/mmpose/v1/projects/rtmpose/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth
Loads checkpoint by http backend from path: https://download.openmmlab.com/mmpose/v1/projects/rtmpose/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth
11/07 19:36:19 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
11/07 19:36:19 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
11/07 19:36:19 - mmengine - INFO - Checkpoints will be saved to /public/home/zhoubingjie2022/MMPose/DWPose/mmpose/work_dirs/rtmpose_x_dis_l__coco-ubody-256x192.
Traceback (most recent call last):
File "tools/train.py", line 161, in <module>
main()
File "tools/train.py", line 157, in main
runner.train()
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/mmengine/runner/runner.py", line 1745, in train
model = self.train_loop.run() # type: ignore
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/mmengine/runner/loops.py", line 96, in run
self.run_epoch()
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
self.run_iter(idx, data_batch)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/mmengine/runner/loops.py", line 129, in run_iter
data_batch, optim_wrapper=self.runner.optim_wrapper)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/mmengine/model/wrappers/distributed.py", line 121, in train_step
losses = self._run_forward(data, mode='loss')
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/mmengine/model/wrappers/distributed.py", line 161, in _run_forward
results = self(data, mode=mode)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/public/home/zhoubingjie2022/MMPose/DWPose/mmpose/mmpose/models/pose_estimators/distiller.py", line 83, in forward
return self.loss(inputs, data_samples)
File "/public/home/zhoubingjie2022/MMPose/DWPose/mmpose/mmpose/models/pose_estimators/distiller.py", line 113, in loss
lt_x, lt_y = self.teacher.head(fea_t)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/public/home/zhoubingjie2022/MMPose/DWPose/mmpose/mmpose/models/heads/coord_cls_heads/rtmcc_head.py", line 154, in forward
feats = self.mlp(feats) # -> B, K, hidden
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/functional.py", line 1848, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4256x64 and 48x256)
/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
FutureWarning,
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 122892) of binary: /public/home/zhoubingjie2022/anaconda3/envs/mmpose/bin/python
Traceback (most recent call last):
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
tools/train.py FAILED
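
One possible reading of the shapes in the log, sketched under assumptions rather than confirmed against the DWPose code: if the head follows the usual RTMPose convention (backbone stride 32, SimCC split ratio 2.0, `hidden_dims=256`), the logged weight shapes can be mapped back to an input resolution. `expected_head_shapes` and `guess_input_size` are hypothetical helpers written for this illustration, not MMPose functions, and the candidate resolutions are guesses.

```python
# Hypothetical helpers, not part of MMPose or DWPose.
# Assumed conventions: backbone stride 32, SimCC split ratio 2.0.
STRIDE = 32
SPLIT_RATIO = 2.0


def expected_head_shapes(input_size, hidden_dims=256):
    """Return the head weight shapes implied by input_size = (width, height)."""
    w, h = input_size
    # The head flattens the final feature map, so the first Linear layer
    # takes (w / stride) * (h / stride) input features.
    flatten_dims = (w // STRIDE) * (h // STRIDE)
    return {
        "head.mlp.1.weight": (hidden_dims, flatten_dims),
        "head.cls_x.weight": (int(w * SPLIT_RATIO), hidden_dims),
        "head.cls_y.weight": (int(h * SPLIT_RATIO), hidden_dims),
    }


def guess_input_size(logged_shapes, candidates):
    """Return the candidate input sizes whose implied shapes match the log."""
    return [s for s in candidates if expected_head_shapes(s) == logged_shapes]


# Shapes copied from the "size mismatch" lines in the log.
checkpoint_shapes = {
    "head.mlp.1.weight": (256, 108),
    "head.cls_x.weight": (576, 256),
    "head.cls_y.weight": (768, 256),
}
model_shapes = {
    "head.mlp.1.weight": (256, 48),
    "head.cls_x.weight": (384, 256),
    "head.cls_y.weight": (512, 256),
}

# Candidate (width, height) resolutions; chosen for illustration only.
candidates = [(192, 256), (288, 384), (256, 256)]

print("checkpoint matches:", guess_input_size(checkpoint_shapes, candidates))
print("current model matches:", guess_input_size(model_shapes, candidates))

# The RuntimeError reports 64 input features per keypoint, so check which
# candidate resolutions would produce a 64-element flattened feature map.
feature_width = 64
print("runtime features match:",
      [s for s in candidates
       if expected_head_shapes(s)["head.mlp.1.weight"][1] == feature_width])
```

If these assumptions hold, the teacher checkpoint looks like it was trained at 384x288, the teacher config builds a 256x192 head, and the data pipeline feeds images that yield 64 features (for example a 256x256 crop), so three different resolutions would be in play. Making the pipeline's `input_size`, the head's `in_featuremap_size`, and the teacher checkpoint agree would be the first thing to check.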