11/07 19:36:14 - mmengine - WARNING - The prefix is not set in metric class PCKAccuracy.
11/07 19:36:14 - mmengine - WARNING - The prefix is not set in metric class AUC.
11/07 19:36:14 - mmengine - WARNING - The prefix is not set in metric class NME.
Loads checkpoint by local backend from path: work_dirs/rtmpose_x_dis_l__coco-ubody-256x192/rtm-x_ucoco.pth
The model and loaded state dict do not match exactly

size mismatch for head.mlp.1.weight: copying a param with shape torch.Size([256, 108]) from checkpoint, the shape in current model is torch.Size([256, 48]).
size mismatch for head.cls_x.weight: copying a param with shape torch.Size([576, 256]) from checkpoint, the shape in current model is torch.Size([384, 256]).
size mismatch for head.cls_y.weight: copying a param with shape torch.Size([768, 256]) from checkpoint, the shape in current model is torch.Size([512, 256]).
11/07 19:36:19 - mmengine - INFO - load backbone. in model from: https://download.openmmlab.com/mmpose/v1/projects/rtmpose/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth
Loads checkpoint by http backend from path: https://download.openmmlab.com/mmpose/v1/projects/rtmpose/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth
11/07 19:36:19 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
11/07 19:36:19 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
11/07 19:36:19 - mmengine - INFO - Checkpoints will be saved to /public/home/zhoubingjie2022/MMPose/DWPose/mmpose/work_dirs/rtmpose_x_dis_l__coco-ubody-256x192.
Traceback (most recent call last):
  File "tools/train.py", line 161, in <module>
    main()
  File "tools/train.py", line 157, in main
    runner.train()
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/mmengine/runner/runner.py", line 1745, in train
    model = self.train_loop.run()  # type: ignore
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/mmengine/runner/loops.py", line 96, in run
    self.run_epoch()
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
    self.run_iter(idx, data_batch)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/mmengine/runner/loops.py", line 129, in run_iter
    data_batch, optim_wrapper=self.runner.optim_wrapper)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/mmengine/model/wrappers/distributed.py", line 121, in train_step
    losses = self._run_forward(data, mode='loss')
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/mmengine/model/wrappers/distributed.py", line 161, in _run_forward
    results = self(data, mode=mode)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/public/home/zhoubingjie2022/MMPose/DWPose/mmpose/mmpose/models/pose_estimators/distiller.py", line 83, in forward
    return self.loss(inputs, data_samples)
  File "/public/home/zhoubingjie2022/MMPose/DWPose/mmpose/mmpose/models/pose_estimators/distiller.py", line 113, in loss
    lt_x, lt_y = self.teacher.head(fea_t)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/public/home/zhoubingjie2022/MMPose/DWPose/mmpose/mmpose/models/heads/coord_cls_heads/rtmcc_head.py", line 154, in forward
    feats = self.mlp(feats)  # -> B, K, hidden
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4256x64 and 48x256)
/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions

  FutureWarning,
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 122892) of binary: /public/home/zhoubingjie2022/anaconda3/envs/mmpose/bin/python
Traceback (most recent call last):
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/public/home/zhoubingjie2022/anaconda3/envs/mmpose/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

tools/train.py FAILED

IDEA-Research / DWPose

Training on my own dataset fails with an error, although the same dataset trains without problems on RTMPose: the shapes of the two matrices are incompatible, so the matrix multiplication cannot be performed. #56
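
The shapes in the log can be cross-checked with a little arithmetic. The sketch below is a hypothetical helper, not part of MMPose or DWPose. It assumes a total backbone stride of 32 and a SimCC split ratio of 2.0; neither value is stated in the log, but both are consistent with every shape it reports. Under those assumptions, the teacher head in the current model (48, 384, 512) matches a 192x256 input, the checkpoint (108, 576, 768) matches a 288x384 input, and the 64 features that actually reach `head.mlp` would fit, for example, a 256x256 input.

```python
# Hypothetical helper (not an MMPose API): derive the layer sizes an
# RTMCC-style head would need for a given input resolution.
# Assumptions inferred from the numbers in the log, not confirmed by it:
#   - the backbone downsamples by a total stride of 32
#   - the SimCC split ratio is 2.0


def head_dims(input_w, input_h, stride=32, split_ratio=2.0):
    """Return the expected head sizes for an input of input_w x input_h pixels."""
    feat_w, feat_h = input_w // stride, input_h // stride
    return {
        "mlp_in": feat_w * feat_h,                # in_features of head.mlp.1
        "cls_x_out": int(input_w * split_ratio),  # out_features of head.cls_x
        "cls_y_out": int(input_h * split_ratio),  # out_features of head.cls_y
    }


# Shapes of the current model according to the log: 48, 384, 512
print(head_dims(192, 256))
# Shapes stored in the checkpoint according to the log: 108, 576, 768
print(head_dims(288, 384))
# One input size that would produce the 64 features seen in the RuntimeError
print(head_dims(256, 256))
```

If these assumptions hold, two things may be worth checking in the config: whether the teacher settings match the resolution the checkpoint was trained at, and whether the data pipeline's input size agrees with the head's configured input size. The 4256 rows in `mat1` would also be consistent with 32 samples times 133 keypoints, though the log does not state the batch size.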