ShiqiYu / OpenGait

A flexible and extensible framework for gait recognition. With OpenGait you can focus on designing your own models and easily compare them with state-of-the-art methods.

when training SkeletonGait++, an AssertionError occurs #209

Closed yeyuming12138 closed 1 month ago

yeyuming12138 commented 1 month ago

Dear Professor, I am reproducing SkeletonGait++ on the Gait3D dataset and have run into the following problem. I have already created symbolic links for the heatmap and silhouette data, but when I train SkeletonGait++, an AssertionError occurs:

[2024-05-05 21:09:07] [INFO]: -------- Train Pid List --------
[2024-05-05 21:09:07] [INFO]: [1234, 1512, ..., 1128]
[2024-05-05 21:09:09] [INFO]: {'lr': 0.1, 'momentum': 0.9, 'solver': 'SGD', 'weight_decay': 0.0005}
[2024-05-05 21:09:09] [INFO]: {'gamma': 0.1, 'milestones': [20000, 30000, 40000], 'scheduler': 'MultiStepLR'}
[2024-05-05 21:09:09] [INFO]: Parameters Count: 25.56861M
[2024-05-05 21:09:09] [INFO]: Model Initialization Finished!
Traceback (most recent call last):
  File "opengait/main.py", line 73, in <module>
    run_model(cfgs, training)
  File "opengait/main.py", line 56, in run_model
    Model.run_train(model)
  File "/root/autodl-tmp/Project/OpenGait/opengait/modeling/base_model.py", line 408, in run_train
    retval = model(ipts)
  File "/root/miniconda3/envs/AllinOne/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/AllinOne/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/root/miniconda3/envs/AllinOne/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1000, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/root/miniconda3/envs/AllinOne/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/autodl-tmp/Project/OpenGait/opengait/modeling/models/skeletongait++.py", line 90, in forward
    assert pose.size(-1) in [44, 48, 88, 96]
AssertionError
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1227) of binary: /root/miniconda3/envs/AllinOne/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/AllinOne/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/AllinOne/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/miniconda3/envs/AllinOne/lib/python3.8/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/root/miniconda3/envs/AllinOne/lib/python3.8/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/root/miniconda3/envs/AllinOne/lib/python3.8/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/root/miniconda3/envs/AllinOne/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/root/miniconda3/envs/AllinOne/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/AllinOne/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

Could you help me see what is wrong? Thank you very much; I look forward to your reply.

light201212 commented 1 month ago

assert pose.size(-1) in [44, 48, 88, 96].
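The assertion in the traceback only passes when the last dimension of `pose` is 44, 48, 88, or 96, so it can help to check the shape of the prepared data before launching training. The following is a minimal sketch, not OpenGait code: the helper name `last_dim_ok` and the sample shapes are hypothetical, and it assumes the samples can be loaded as NumPy arrays whose last axis corresponds to the one tested by `pose.size(-1)`.

```python
import numpy as np

# Sizes copied from the assertion shown in the traceback.
ALLOWED_LAST_DIMS = (44, 48, 88, 96)


def last_dim_ok(array, allowed=ALLOWED_LAST_DIMS):
    """Return True if the array's last dimension is one of the allowed sizes."""
    return array.shape[-1] in allowed


# Hypothetical sample shapes, for illustration only (not taken from Gait3D).
good = np.zeros((30, 3, 64, 44), dtype=np.float32)
bad = np.zeros((30, 3, 64, 64), dtype=np.float32)

for name, sample in (("good", good), ("bad", bad)):
    status = "passes" if last_dim_ok(sample) else "would fail the assertion"
    print(f"{name}: shape={sample.shape} -> {status}")
```

If a sample reports a last dimension outside that list, the input data (or the symbolic links pointing to it) probably does not match the layout the model expects.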