JiangWenPL / multiperson

Code repository for the paper: "Coherent Reconstruction of Multiple Humans from a Single Image" in CVPR'20
https://jiangwenpl.github.io/multiperson/

Error running demo.py: a parameter group that doesn't match the size of optimizer's group #1

Closed: nicolasugrinovic closed this issue 4 years ago

nicolasugrinovic commented 4 years ago

Hi, kudos for the great work! Thank you for sharing the code.

When trying to run demo.py with the command

python3 tools/demo.py --config=configs/smpl/tune.py --image_folder=demo_images/ --output_folder=results/ --ckpt data/checkpoint.pt

I get the following error:

File "/miniconda3/envs/multiperson/lib/python3.7/site-packages/mmcv/runner/runner.py", line 313, in resume
    self.optimizer.load_state_dict(checkpoint['optimizer'])
File "/miniconda3/envs/multiperson/lib/python3.7/site-packages/torch/optim/optimizer.py", line 115, in load_state_dict
    raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group

It seems to be triggered by something with the checkpoint and the model architecture. I also get the following message:


unexpected key in source state_dict: fc.weight, fc.bias
missing keys in source state_dict: layer3.0.bn2.num_batches_tracked, layer1.2.bn1.num_batches_tracked, layer2.0.bn2.num_batches_tracked, layer2.1.bn1.num_batches_tracked, layer3.0.bn3.num_batches_tracked, layer3.2.bn2.num_batches_tracked, layer1.2.bn2.num_batches_tracked, layer4.2.bn3.num_batches_tracked, layer4.2.bn2.num_batches_tracked, layer3.3.bn1.num_batches_tracked, layer2.1.bn2.num_batches_tracked, layer3.2.bn1.num_batches_tracked, layer2.3.bn2.num_batches_tracked, layer3.0.downsample.1.num_batches_tracked, layer3.3.bn2.num_batches_tracked, layer3.4.bn3.num_batches_tracked, layer1.0.bn2.num_batches_tracked, layer3.5.bn1.num_batches_tracked, layer1.1.bn3.num_batches_tracked, layer3.5.bn3.num_batches_tracked, layer4.0.downsample.1.num_batches_tracked, layer4.1.bn3.num_batches_tracked, layer2.3.bn1.num_batches_tracked, layer3.4.bn1.num_batches_tracked, layer2.2.bn2.num_batches_tracked, layer4.0.bn1.num_batches_tracked, layer2.1.bn3.num_batches_tracked, layer1.1.bn1.num_batches_tracked, layer3.0.bn1.num_batches_tracked, layer2.2.bn1.num_batches_tracked, layer1.0.downsample.1.num_batches_tracked, layer2.0.bn3.num_batches_tracked, layer1.1.bn2.num_batches_tracked, layer4.2.bn1.num_batches_tracked, layer3.2.bn3.num_batches_tracked, layer4.1.bn1.num_batches_tracked, layer2.0.downsample.1.num_batches_tracked, layer1.2.bn3.num_batches_tracked, layer3.1.bn2.num_batches_tracked, layer4.0.bn2.num_batches_tracked, layer3.3.bn3.num_batches_tracked, layer2.0.bn1.num_batches_tracked, layer2.3.bn3.num_batches_tracked, layer4.0.bn3.num_batches_tracked, bn1.num_batches_tracked, layer3.4.bn2.num_batches_tracked, layer4.1.bn2.num_batches_tracked, layer3.1.bn3.num_batches_tracked, layer1.0.bn1.num_batches_tracked, layer2.2.bn3.num_batches_tracked, layer3.1.bn1.num_batches_tracked, layer1.0.bn3.num_batches_tracked, layer3.5.bn2.num_batches_tracked
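
For readers following along: the two messages above are key-mismatch reports from loading weights, and they are separate from the ValueError, which the traceback shows is raised in the optimizer's load_state_dict. Below is a minimal sketch of how such a report can be produced, assuming it is a plain set comparison between the keys the model expects and the keys found in the loaded file. It is not mmcv's actual code; the function name and the toy key lists are hypothetical.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, sorted for stable output."""
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    # Keys the model expects but the loaded file does not provide.
    missing = sorted(model_keys - checkpoint_keys)
    # Keys the loaded file provides but the model has no slot for.
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Toy key sets loosely modelled on the log above (not the real model).
model_keys = ["conv1.weight", "bn1.weight", "bn1.num_batches_tracked"]
checkpoint_keys = ["conv1.weight", "bn1.weight", "fc.weight", "fc.bias"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing keys in source state_dict:", ", ".join(missing))
print("unexpected key in source state_dict:", ", ".join(unexpected))
```

In this toy case the model has a num_batches_tracked buffer that the file lacks, and the file has fc.* weights that the model does not use, which mirrors the pattern in the log.
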

nkolot commented 4 years ago

Can you show me the full error traceback? Also, did you install our version of mmcv? We have made some modifications to the original in mmdetection.

nicolasugrinovic commented 4 years ago

Thanks for the quick reply!

I installed mmcv exactly as described in your README. Here is the full traceback:

unexpected key in source state_dict: fc.weight, fc.bias

missing keys in source state_dict: layer4.0.bn2.num_batches_tracked, layer3.4.bn1.num_batches_tracked, layer4.0.downsample.1.num_batches_tracked, layer1.0.downsample.1.num_batches_tracked, layer3.2.bn1.num_batches_tracked, layer1.2.bn3.num_batches_tracked, layer2.3.bn1.num_batches_tracked, layer2.0.bn2.num_batches_tracked, layer3.5.bn3.num_batches_tracked, layer3.0.downsample.1.num_batches_tracked, layer2.2.bn1.num_batches_tracked, layer1.0.bn1.num_batches_tracked, layer4.1.bn3.num_batches_tracked, layer1.1.bn2.num_batches_tracked, layer1.1.bn3.num_batches_tracked, layer3.0.bn1.num_batches_tracked, layer2.1.bn2.num_batches_tracked, layer2.1.bn3.num_batches_tracked, layer4.1.bn2.num_batches_tracked, layer2.0.bn1.num_batches_tracked, layer3.3.bn3.num_batches_tracked, layer4.0.bn3.num_batches_tracked, layer3.4.bn3.num_batches_tracked, layer2.2.bn2.num_batches_tracked, layer2.0.downsample.1.num_batches_tracked, layer1.2.bn2.num_batches_tracked, layer2.3.bn2.num_batches_tracked, layer4.1.bn1.num_batches_tracked, layer3.3.bn2.num_batches_tracked, layer3.3.bn1.num_batches_tracked, layer3.1.bn3.num_batches_tracked, layer3.0.bn3.num_batches_tracked, layer3.1.bn2.num_batches_tracked, layer3.4.bn2.num_batches_tracked, layer2.1.bn1.num_batches_tracked, layer4.2.bn1.num_batches_tracked, layer1.2.bn1.num_batches_tracked, layer3.2.bn2.num_batches_tracked, layer1.1.bn1.num_batches_tracked, layer4.2.bn3.num_batches_tracked, layer1.0.bn3.num_batches_tracked, layer3.1.bn1.num_batches_tracked, layer3.0.bn2.num_batches_tracked, layer1.0.bn2.num_batches_tracked, layer2.3.bn3.num_batches_tracked, layer2.0.bn3.num_batches_tracked, layer3.5.bn2.num_batches_tracked, layer4.0.bn1.num_batches_tracked, layer2.2.bn3.num_batches_tracked, layer3.5.bn1.num_batches_tracked, layer3.2.bn3.num_batches_tracked, bn1.num_batches_tracked, layer4.2.bn2.num_batches_tracked

2020-06-16 13:45:31,001 - INFO - load checkpoint from data/checkpoint.pt
2020-06-16 13:45:31,247 - WARNING - missing keys in source state_dict: smpl_head.smpl.v_template, smpl_head.loss.smpl.J_regressor_extra, smpl_head.loss.smpl.parents, smpl_head.smpl.posedirs, smpl_head.smpl.vertex_joint_selector.extra_joints_idxs, smpl_head.smpl.lbs_weights, smpl_head.smpl.J_regressor_extra, smpl_head.loss.smpl.vertex_joint_selector.extra_joints_idxs, smpl_head.loss.smpl.J_regressor, smpl_head.smpl.parents, smpl_head.smpl.J_regressor, smpl_head.loss.smpl.shapedirs, smpl_head.loss.smpl.posedirs, smpl_head.smpl.shapedirs, smpl_head.loss.smpl.faces_tensor, smpl_head.loss.smpl.v_template, smpl_head.loss.smpl.lbs_weights, smpl_head.smpl.faces_tensor

Traceback (most recent call last):
  File "tools/demo.py", line 190, in <module>
    main()
  File "tools/demo.py", line 149, in main
    runner.resume(cfg.resume_from)
  File "/home/nugrinovic/miniconda3/envs/multiperson/lib/python3.7/site-packages/mmcv/runner/runner.py", line 313, in resume
    self.optimizer.load_state_dict(checkpoint['optimizer'])
  File "/home/nugrinovic/miniconda3/envs/multiperson/lib/python3.7/site-packages/torch/optim/optimizer.py", line 115, in load_state_dict
    raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group

nkolot commented 4 years ago

Judging from your traceback, line 313 of mmcv/runner/runner.py in your installation differs from line 313 in the repo:

https://github.com/JiangWenPL/multiperson/blob/c79c0f82f5273dbbf7bf6612c10527323cdab07b/mmcv/mmcv/runner/runner.py#L313

In the repo, the line that loads the optimizer state is actually line 350: https://github.com/JiangWenPL/multiperson/blob/c79c0f82f5273dbbf7bf6612c10527323cdab07b/mmcv/mmcv/runner/runner.py#L350

I suspect that a different version of mmcv got installed at some point. We encountered this issue in the past and explicitly wrapped the optimizer state loading in a try/except block.
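
A try/except guard of the kind mentioned above might look roughly like the following. This is a sketch, not the code in the repo's runner.py: load_optimizer_state and ToyOptimizer are hypothetical names, and the toy class only imitates the parameter-group size check so the example runs without PyTorch installed.

```python
import logging

logger = logging.getLogger(__name__)


class ToyOptimizer:
    """Hypothetical stand-in for a torch optimizer (size check only)."""

    def __init__(self, param_groups):
        self.param_groups = param_groups

    def load_state_dict(self, state_dict):
        saved = state_dict["param_groups"]
        if len(saved) != len(self.param_groups):
            raise ValueError(
                "loaded state dict has a different number of parameter groups"
            )
        for own, other in zip(self.param_groups, saved):
            # Each saved group must hold as many params as the current one.
            if len(own["params"]) != len(other["params"]):
                raise ValueError(
                    "loaded state dict contains a parameter group "
                    "that doesn't match the size of optimizer's group"
                )
        self.param_groups = saved


def load_optimizer_state(optimizer, checkpoint):
    """Try to restore optimizer state; return True on success, else False."""
    if "optimizer" not in checkpoint:
        return False
    try:
        optimizer.load_state_dict(checkpoint["optimizer"])
    except ValueError as err:
        # Assumption: the caller only runs inference, so a mismatched
        # optimizer state can be skipped with a warning.
        logger.warning("skipping optimizer state: %s", err)
        return False
    return True


optimizer = ToyOptimizer([{"params": [0, 1, 2]}])
matching = {"optimizer": {"param_groups": [{"params": [0, 1, 2]}]}}
mismatched = {"optimizer": {"param_groups": [{"params": [0, 1]}]}}

restored_ok = load_optimizer_state(optimizer, matching)
restored_bad = load_optimizer_state(optimizer, mismatched)
print(restored_ok, restored_bad)
```

With a guard like this, a mismatched optimizer state produces a warning instead of aborting the run, which is acceptable only when the optimizer is not going to be used afterwards.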

A quick solution would be to run rm -rf /home/nugrinovic/miniconda3/envs/multiperson/lib/python3.7/site-packages/mmcv* and then reinstall our version of mmcv. You might have to reinstall mmdetection if you do that.
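
Before deleting anything, it can help to confirm which copy of mmcv Python actually imports and whether stray copies exist. The helper below is a hypothetical diagnostic, not part of the repo: it uses only the standard library, and the names find_package_copies and imported_location are made up for this sketch.

```python
import importlib.util
import os
import sys


def find_package_copies(name, search_paths=None):
    """List entries whose names start with `name` in each search directory."""
    if search_paths is None:
        search_paths = sys.path
    found = []
    for directory in search_paths:
        directory = directory or "."  # an empty sys.path entry means the cwd
        if not os.path.isdir(directory):
            continue  # sys.path may contain zip files or stale entries
        for entry in sorted(os.listdir(directory)):
            if entry.startswith(name):
                found.append(os.path.join(directory, entry))
    return found


def imported_location(name):
    """Return the file Python would import for `name`, or None if absent."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("import resolves to:", imported_location("mmcv"))
    for path in find_package_copies("mmcv"):
        print("found:", path)
```

Run it inside the same conda environment as the demo. If the reported import location is not the copy installed from this repo, or several mmcv* entries show up, that points to the leftover installation described above.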

nicolasugrinovic commented 4 years ago

That was exactly it! Somehow, I had another version installed. Now it is working, thanks!