SangbumChoi / MobileHumanPose

This repo is the official PyTorch implementation of MobileHumanPose: Toward Real-Time 3D Human Pose Estimation in Mobile Devices (CVPRW 2021).
MIT License

My result is not good #39

Open Berry-Wu opened 1 year ago

Berry-Wu commented 1 year ago

Hi, thanks for your great work. I trained the model with the script 'python train.py --gpu 0-1 --backbone LPSKI' on the Human3.6M and MPII datasets, using two GTX 1080Ti GPUs. My config.py is below:

    backbone = 'LPSKI'

    trainset_3d = ['Human36M']
    trainset_2d = ['MPII']
    testset = 'Human36M'

    input_shape = (256, 256) 
    output_shape = (input_shape[0]//8, input_shape[1]//8)
    width_multiplier = 1.0
    depth_dim = 32
    bbox_3d_shape = (2000, 2000, 2000) # depth, height, width
    pixel_mean = (0.485, 0.456, 0.406)
    pixel_std = (0.229, 0.224, 0.225)

    ## training config
    embedding_size = 2048
    lr_dec_epoch = [17, 21]
    end_epoch = 25
    lr = 1e-3
    lr_dec_factor = 10
    batch_size = 64

    ## testing config
    test_batch_size = 32
    flip_test = True
    use_gt_info = True

    ## others
    num_thread = 12 #20
    gpu_ids = '0,1'
    num_gpus = 2
    continue_train = True

And I tested the model with test.py:

python main/test.py --gpu 0-1 --test_epoch 24-24 --backbone LPSKI

And the result is like below:

Protocol 2 error (MPJPE) >> tot: 65.84
Directions: 58.14 Discussion: 66.00 Eating: 58.89 Greeting: 60.22 Phoning: 66.14 Posing: 58.50 Purchases: 57.18 Sitting: 79.16 SittingDown: 90.43 Smoking: 66.33 Photo: 74.44 Waiting: 63.51 Walking: 52.24 WalkDog: 67.90 WalkTogether: 59.40 

I wonder if it is a setup issue: during training, the loss barely changed from about epoch 13 to epoch 24. The log is here. Lastly, I want to know where I can find your trained model? :) Looking forward to your reply!

Berry-Wu commented 1 year ago

By the way, the parameter count of the model calculated by torchsummary is inconsistent with the paper:

embedding_size = 2048, width_mult=1 : Total params: 3.50M

In your paper, the parameter count under the same configuration is 4.07M. What causes the difference? Are there any other special settings? Looking forward to your reply!
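(A sketch for anyone debugging a similar mismatch: doing the parameter arithmetic by hand, layer by layer, helps localize where a count diverges from the torchsummary table. The channel sizes below are made up for illustration and are not the repo's exact layers; only joint_num = 18 matches this repo.)

```python
# Hand arithmetic for common layer types, to compare against torchsummary
# line by line. Channel sizes are hypothetical, not the repo's exact layers.

def conv2d_params(c_in, c_out, k, bias=False):
    """Parameter count of a Conv2d layer with a k x k kernel."""
    return c_out * c_in * k * k + (c_out if bias else 0)

def bn_params(c):
    """Learnable parameters of BatchNorm2d (gamma + beta)."""
    return 2 * c

# Example: a final 1x1 head with out_channels = joint_num * depth_dim.
# c_in = 256 is an assumption; joint_num = 18 matches this repo.
old_head = conv2d_params(256, 18 * 32, 1, bias=True)  # depth_dim = 32
new_head = conv2d_params(256, 18 * 64, 1, bias=True)  # depth_dim = 64
print(new_head - old_head)  # 148032 extra params from depth_dim 32 -> 64
```

A single config difference like this can already shift the total by a tenth of a million parameters, so small config mismatches plausibly account for a 3.50M vs. 4.07M gap.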

Berry-Wu commented 1 year ago

After reviewing the other issues, I found that the differences between the config.py above and yours are:

 output_shape: (input_shape[0]//8, input_shape[1]//8) --> (input_shape[0]//4, input_shape[1]//4)
 depth_dim = 32 --> 64

And in ski_cncat.py:

inverted_residual_setting = [
                # t, c, n, s
                [1, 64, 1, 1],  #[-1, 48, 256, 256] # from  [1, 64, 1, 2] ->  [1, 64, 1, 1]
                [6, 48, 2, 2],  #[-1, 48, 128, 128]
                [6, 48, 3, 2],  #[-1, 48, 64, 64]
                [6, 64, 4, 2],  #[-1, 64, 32, 32]
                [6, 96, 3, 2],  #[-1, 96, 16, 16]
                [6, 160, 3, 1], #[-1, 160, 8, 8]
                [6, 320, 1, 1], #[-1, 320, 8, 8]
            ]
and, in the same file, the output channels of the final layer:

out_channels= joint_num * cfg.depth_dim,  # from joint_num * 32 --> joint_num * cfg.depth_dim

Now it matches the architecture figure in the paper:

output shape is (64, 64, 1152):
64 --> input_shape[0] // 4
1152 --> num_keypoints * depth_dim = 18 * 64
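(These quantities can be double-checked with a few lines of arithmetic; the values are taken from the config above, and joint_num = 18 is this repo's keypoint count.)

```python
# Sanity check: the corrected config reproduces the 64 x 64 x 1152
# head output shown in the paper's architecture figure.
input_shape = (256, 256)
depth_dim = 64       # corrected from 32
joint_num = 18       # number of keypoints used in this repo

output_shape = (input_shape[0] // 4, input_shape[1] // 4)  # corrected from // 8
out_channels = joint_num * depth_dim

print(output_shape)   # (64, 64)
print(out_channels)   # 1152
```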

So I will check the result after training. I recommend that you fix config.py on GitHub :) Lastly, I still have the loss problem described above. Looking forward to your reply!

SangbumChoi commented 1 year ago

@Berry-Wu That is true. At the time I wrote this code, I overwrote all the config files (a dumb mistake). Let me know the result!

Berry-Wu commented 1 year ago

@SangbumChoi I have finished training on 2 GTX 1080Ti in about 30 hours. After changing the config, GPU memory usage became too high, so I reduced batch_size to 32. I tested the model at epochs 23 and 24 like this:

python main/test.py --gpu 0-1 --test_epoch 23-24 --backbone LPSKI

The log is here. And the result is below:

>>> Using GPU: 0,1
Load data of H36M Protocol 2
creating index...
index created!
Get bounding box and root from groundtruth
============================================================
LPSKI BackBone Generated
============================================================
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 136/136 [01:54<00:00,  1.18it/s]
Evaluation start...
Protocol 2 error (MPJPE) >> tot: 60.26
Directions: 55.26 Discussion: 60.47 Eating: 53.43 Greeting: 57.23 Phoning: 60.88 Posing: 52.77 Purchases: 55.38 Sitting: 73.62 SittingDown: 80.20 Smoking: 59.51 Photo: 66.54 Waiting: 56.99 Walking: 47.51 WalkDog: 62.86 WalkTogether: 54.66 
Test result is saved at /home/data3_4t/wzy/codes/MobileHumanPose/main/../output/result/bbox_root_pose_human36m_output.json
03-29 15:33:15 Protocol 2 error (MPJPE) >> tot: 60.26
Directions: 55.26 Discussion: 60.47 Eating: 53.43 Greeting: 57.23 Phoning: 60.88 Posing: 52.77 Purchases: 55.38 Sitting: 73.62 SittingDown: 80.20 Smoking: 59.51 Photo: 66.54 Waiting: 56.99 Walking: 47.51 WalkDog: 62.86 WalkTogether: 54.66 
============================================================
LPSKI BackBone Generated
============================================================
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 136/136 [00:53<00:00,  2.55it/s]
Evaluation start...
Protocol 2 error (MPJPE) >> tot: 60.51
Directions: 55.71 Discussion: 60.89 Eating: 53.79 Greeting: 57.65 Phoning: 61.38 Posing: 53.04 Purchases: 55.66 Sitting: 74.17 SittingDown: 79.73 Smoking: 59.67 Photo: 66.65 Waiting: 56.86 Walking: 47.60 WalkDog: 63.12 WalkTogether: 54.74 
Test result is saved at /home/data3_4t/wzy/codes/MobileHumanPose/main/../output/result/bbox_root_pose_human36m_output.json
03-29 15:34:11 Protocol 2 error (MPJPE) >> tot: 60.51
Directions: 55.71 Discussion: 60.89 Eating: 53.79 Greeting: 57.65 Phoning: 61.38 Posing: 53.04 Purchases: 55.66 Sitting: 74.17 SittingDown: 79.73 Smoking: 59.67 Photo: 66.65 Waiting: 56.86 Walking: 47.60 WalkDog: 63.12 WalkTogether: 54.74 

As you can see, the result on Protocol 2 is about 60.51mm, while in your paper the result of the large model is 51.4mm. I don't know how to close the gap :(
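(A hedged guess at one contributor, not confirmed anywhere in this thread: batch_size was reduced from 64 to 32 without touching lr. By the common linear scaling heuristic, the learning rate would normally be reduced in proportion:)

```python
# Linear scaling heuristic: scale the learning rate with batch size.
# Purely illustrative arithmetic; whether it closes the gap here is untested.
base_lr, base_batch = 1e-3, 64   # values from config.py above
new_batch = 32                   # batch size actually used
scaled_lr = base_lr * new_batch / base_batch
print(scaled_lr)  # 0.0005
```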

By the way, the parameter count calculated by torchsummary is 3.64M, while your paper reports 4.07M. I don't know where the gap comes from. Could you help me? Looking forward to your reply! :)

SonNguyen2510 commented 1 year ago

@Berry-Wu Do you have any update on this? I really want to know whether you can reproduce the paper's result or not; I cannot match the settings the paper describes.

Berry-Wu commented 1 year ago

@SonNguyen2510 Sorry, I didn't reproduce the paper's result. My result is above, and it still has a gap with the paper. After several modifications, I think my config is consistent with the original paper; you can refer to the config above. I hope it helps you! :) Besides, the author provides pretrained models here, so you can test with them. I haven't done that yet. https://drive.google.com/drive/folders/146ZFPZyFyRQejB8CBYZ_R26NEXEO4EjI?usp=share_link

SonNguyen2510 commented 1 year ago

@Berry-Wu Thank you for your reply. In order to test the pretrained model, I think I need to match its configuration. Do you know the config of that model? Is it the config above? Thanks again!

Berry-Wu commented 1 year ago

@SonNguyen2510 Sorry, I don't know. :( You can refer to this issue: https://github.com/SangbumChoi/MobileHumanPose/issues/30 It seems the author just uploaded random pth files.

SonNguyen2510 commented 1 year ago

@Berry-Wu It's OK, thank you anyway :)