fabro66 / GAST-Net-3DPoseEstimation

A Graph Attention Spatio-temporal Convolutional Network for 3D Human Pose Estimation in Video (GAST-Net)
MIT License
312 stars 70 forks

GAST model with more than 17 keypoints? #28

Closed sebo361 closed 3 years ago

sebo361 commented 3 years ago

Hi, thank you for your amazing work!

Is there a chance to get more than 17 keypoints from the 3D model? I am especially interested in receiving 3D foot keypoints.

Thanks!

fabro66 commented 3 years ago

Hi~ The pre-trained model we provide is trained on the Human3.6M dataset, which does not include 3D foot keypoints. If you want 3D foot keypoints, you need to find a dataset that includes them and retrain GAST-Net.

sebo361 commented 3 years ago

Hi @fabro66 thanks for your quick answer. According to the H3.6M team, the dataset includes the following 32 keypoints:

{'Pelvis', 'RHip', 'RKnee', 'RAnkle', 'RToe', 'Site', 'LHip', 'LKnee', 'LAnkle', 'LeftToe', 'Site', 'Spine', 'Spine1', 'Neck', 'Head', 'Site', 'LShoulder', 'LShoulder', 'LElbow', 'LWrist', 'LThumb', 'Site', 'L_Wrist_End', 'Site', 'RShoulder', 'RShoulder', 'RElbow', 'RWrist', 'RThumb', 'Site', 'R_Wrist_End', 'Site'}

Here are all the important keypoints plotted (without duplicates): [image: keypoint plot]

As the foot toe keypoints are included, I would like to add them to the 3D pose prediction with GAST-Net too. Which configuration changes do I need to make so that the model outputs the toe keypoints? I probably need to train the model from scratch, right?

fabro66 commented 3 years ago

Hi~ You can use 2D GT with foot keypoints to train a new GAST-Net. You only need to change one line of code in the ./common/h36m_dataset.py file: https://github.com/fabro66/GAST-Net-3DPoseEstimation/blob/ee05fa0ffe0a6945fca254d41fb800452be1ffd5/common/h36m_dataset.py#L281
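For example — a minimal sketch, assuming the linked line is the remove_joints() call inherited from VideoPose3D, which prunes the raw 32-joint skeleton down to 17 joints — keeping indices 4 (right toe) and 9 (left toe) yields a 19-joint skeleton:

    # Before (17 joints): the toes are removed along with the other unused joints.
    # self.remove_joints([4, 5, 9, 10, 11, 16, 20, 21, 22, 23, 24, 28, 29, 30, 31])

    # After (19 joints): keep indices 4 (right toe) and 9 (left toe).
    self.remove_joints([5, 10, 11, 16, 20, 21, 22, 23, 24, 28, 29, 30, 31])

Then retrain with ground-truth 2D keypoints: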

    python trainval.py -e 80 -k gt -arc 3,3,3 -drop 0.05 -b 128 --downsample 5

sebo361 commented 3 years ago

Thanks a lot for your help @fabro66!

When running inference, do I need to detect the 2D toe keypoints and feed them to GAST-Net? AlphaPose, for example, has a model with 26 keypoints, including the toe keypoints, which could be used. Providing the 2D toe keypoints would probably help the 2D-to-3D lifting achieve higher accuracy when predicting the toe keypoints, right?

fabro66 commented 3 years ago

Hi~ The number of input and output joints of GAST-Net is the same. If you train GAST-Net with toe keypoints, you need to feed 2D toe keypoints to the model at inference as well.

sebo361 commented 3 years ago

Hi @fabro66 thanks for the clarification. I tried to train a new model including the toe keypoints as you described (using data_2d_h36m_gt.npz and data_3d_h36m.npz), but it throws the following error:

 File "trainval.py", line 53, in <module>
    model_pos_train, model_pos, pad, causal_shift = create_model(args, dataset, poses_valid_2d)
  File "/home/sebo/GAST-Net-3DPoseEstimation/main.py", line 170, in create_model
    channels=args.channels)
  File "/home/sebo/GAST-Net-3DPoseEstimation/model/gast_net.py", line 215, in __init__
    layers_graph_conv.append(GraphAttentionBlock(adj, channels, channels, p_dropout=dropout))
  File "/home/sebo/GAST-Net-3DPoseEstimation/model/gast_net.py", line 16, in __init__
    self.local_graph_layer = LocalGraph(adj, input_dim, hid_dim, p_dropout)
  File "/home/sebo/GAST-Net-3DPoseEstimation/model/local_attention.py", line 84, in __init__
    raise KeyError("The dimension of adj matrix is wrong!")
KeyError: 'The dimension of adj matrix is wrong!'

So there I changed num_joints_in=19 and num_joints_out=19. I also changed this if-check to num_joints == 19, but now I get the following error:

File "trainval.py", line 119, in <module>
    epoch_loss_3d = train(model_pos_train, train_generator, optimizer)
  File "/home/sebo/GAST-Net-3DPoseEstimation/main.py", line 229, in train
    predicted_3d_pos = model_pos_train(inputs_2d)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sebo/GAST-Net-3DPoseEstimation/model/gast_net.py", line 99, in forward
    x = self._forward_blocks(x)
  File "/home/sebo/GAST-Net-3DPoseEstimation/model/gast_net.py", line 241, in _forward_blocks
    x = self.layers_graph_conv[0](x)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sebo/GAST-Net-3DPoseEstimation/model/gast_net.py", line 27, in forward
    x_ = self.local_graph_layer(x)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sebo/GAST-Net-3DPoseEstimation/model/local_attention.py", line 128, in forward
    x = self.gcn_sym(input)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sebo/GAST-Net-3DPoseEstimation/model/local_attention.py", line 47, in forward
    output = torch.matmul(adj * E, h0) + torch.matmul(adj * (1 - E), h1)
RuntimeError: invalid argument 6: wrong matrix size at /opt/conda/conda-bld/pytorch_1556653183467/work/aten/src/THC/generic/THCTensorMathBlas.cu:494

Do you have any idea which adjustments are still needed to train it with the 19 keypoints (17 COCO + 2 toe keypoints)?

fabro66 commented 3 years ago

Hi~ Sorry for not being able to reply to you in time. Suppose the order of the joints after adding the toe keypoints is as follows:

0: pelvis, 1: right hip, 2: right knee, 3: right ankle, 4: right toe, 5: left hip, 6: left knee, 7: left ankle, 8: left toe, 9: spine, 10: thorax, 11: neck, 12: top head, 13: left shoulder, 14: left elbow, 15: left wrist, 16: right shoulder, 17: right elbow, 18: right wrist

You should add some code to the ./model/local_attention.py file: https://github.com/fabro66/GAST-Net-3DPoseEstimation/blob/ee05fa0ffe0a6945fca254d41fb800452be1ffd5/model/local_attention.py#L66

    # Human3.6M
    if num_joints == 17:
        store_2 = [3, 6, 10, 13, 16]
        joints_left = [4, 5, 6, 11, 12, 13]
        joints_right = [1, 2, 3, 14, 15, 16]

    # Human3.6M detected from Stacked Hourglass
    elif num_joints == 16:
        store_2 = [3, 6, 9, 12, 15]
        joints_left = [4, 5, 6, 10, 11, 12]
        joints_right = [1, 2, 3, 13, 14, 15]

    # HumanEva
    elif num_joints == 15:
        store_2 = [4, 7, 10, 13]
        joints_left = [2, 3, 4, 8, 9, 10]
        joints_right = [5, 6, 7, 11, 12, 13]

    # Human3.6M including toe keypoints
    elif num_joints == 19:
        store_2 = [3, 4, 7, 8, 12, 15, 18]
        joints_left = [5, 6, 7, 8, 13, 14, 15]
        joints_right = [1, 2, 3, 4, 16, 17, 18]

    else:
        raise KeyError("The dimension of adj matrix is wrong!")

Please let me know if this solves the problem.

sebo361 commented 3 years ago

Hi @fabro66 thank you so much for helping again! I added the code as you suggested:

    # Human3.6M including toe keypoints
    elif num_joints == 19:
        store_2 = [3, 4, 7, 8, 12, 15, 18]
        joints_left = [5, 6, 7, 8, 13, 14, 15]
        joints_right = [1, 2, 3, 4, 16, 17, 18]

Unfortunately I still receive the same error as above:

File "trainval.py", line 119, in <module>
    epoch_loss_3d = train(model_pos_train, train_generator, optimizer)
  File "/home/sebo/GAST-Net-3DPoseEstimation/main.py", line 229, in train
    predicted_3d_pos = model_pos_train(inputs_2d)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sebo/GAST-Net-3DPoseEstimation/model/gast_net.py", line 99, in forward
    x = self._forward_blocks(x)
  File "/home/sebo/GAST-Net-3DPoseEstimation/model/gast_net.py", line 241, in _forward_blocks
    x = self.layers_graph_conv[0](x)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sebo/GAST-Net-3DPoseEstimation/model/gast_net.py", line 27, in forward
    x_ = self.local_graph_layer(x)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sebo/GAST-Net-3DPoseEstimation/model/local_attention.py", line 128, in forward
    x = self.gcn_sym(input)
  File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sebo/GAST-Net-3DPoseEstimation/model/local_attention.py", line 47, in forward
    output = torch.matmul(adj * E, h0) + torch.matmul(adj * (1 - E), h1)
RuntimeError: invalid argument 6: wrong matrix size at /opt/conda/conda-bld/pytorch_1556653183467/work/aten/src/THC/generic/THCTensorMathBlas.cu:494

However, I checked the GT 2D keypoints of the Human3.6M dataset again, and I guess they provide 17 keypoints only. For example: len(keypoints['S5']['Walking'][0][0]) = 17... Might there still be a chance to get GT for the 19 keypoints from H3.6M? Or do you have any other approach in mind to solve this?

fabro66 commented 3 years ago

You need to prepare 2D GT with 19 keypoints according to the VideoPose3D data preprocessing tutorial. Then change part of the code in the following two files respectively.
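For reference, a minimal sketch of that preprocessing step, following VideoPose3D's data/prepare_data_h36m.py: the 2D GT is produced by projecting the (now 19-joint) 3D poses into each camera. The helpers world_to_camera, project_to_2d, and image_coordinates are assumed from that project's common/camera.py:

    import numpy as np
    from common.camera import world_to_camera, project_to_2d, image_coordinates
    from common.h36m_dataset import Human36mDataset

    # 3D dataset built with the 19-joint remove_joints() change from above.
    dataset = Human36mDataset('data/data_3d_h36m.npz')

    output_2d = {}
    for subject in dataset.subjects():
        output_2d[subject] = {}
        for action in dataset[subject].keys():
            anim = dataset[subject][action]
            positions_2d = []
            for cam in anim['cameras']:
                # Project the 19 world-space joints into each camera view.
                pos_3d = world_to_camera(anim['positions'], R=cam['orientation'], t=cam['translation'])
                # NOTE: in VideoPose3D this call goes through a small
                # numpy<->torch wrapper (common/utils.wrap); shown direct here.
                pos_2d = project_to_2d(pos_3d, cam['intrinsic'])
                pos_2d_pixel = image_coordinates(pos_2d, w=cam['res_w'], h=cam['res_h'])
                positions_2d.append(pos_2d_pixel.astype('float32'))
            output_2d[subject][action] = positions_2d

    metadata = {
        'num_joints': 19,
        'keypoints_symmetry': [[5, 6, 7, 8, 13, 14, 15],    # left joints
                               [1, 2, 3, 4, 16, 17, 18]],   # right joints
    }
    np.savez_compressed('data/data_2d_h36m_gt.npz', positions_2d=output_2d, metadata=metadata)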

lulindeng commented 3 years ago

Hi,

I am also trying to use AlphaPose as the 2D detector, but the skeleton styles of COCO and Human3.6M are different: the SPINE, THORAX, PELVIS, and HEAD keypoints are not provided by AlphaPose, so currently I use the remaining 13 keypoints in both training and inference. Do you have any ideas for fine-tuning the model so that the 2D detector outputs 17 keypoints in the same skeleton style as Human3.6M?

Deepest thanks for your reply!

BTW, @sebo361, you mentioned that you want to add two more keypoints; perhaps you can try using 17 keypoints in the 2D GT and 19 keypoints in the 3D GT to train a model, since the Human3.6M 2D GT only contains 17 keypoints. I tried using 13 keypoints in the 2D GT to train a 17-keypoint model, and the performance is acceptable for me. :)

fabro66 commented 3 years ago

Hi~ Thank you for your interest in our work!

You don't need to retrain a new model with the COCO style, because we provide a file that converts the skeleton type from COCO to Human3.6M. The output of the HRNet 2D pose detector we use is also COCO-style and is converted in the same way. The final 3D estimation accuracy is acceptable. https://github.com/fabro66/GAST-Net-3DPoseEstimation/blob/ee05fa0ffe0a6945fca254d41fb800452be1ffd5/tools/mpii_coco_h36m.py#L16
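A hedged usage sketch — assuming tools/mpii_coco_h36m.py exposes a coco_h36m() helper, as the linked line suggests, that takes COCO-ordered keypoints and returns them rearranged/interpolated to the Human3.6M skeleton:

    import numpy as np
    from tools.mpii_coco_h36m import coco_h36m

    # (num_frames, 17, 2) COCO-style detections, e.g. from HRNet or AlphaPose.
    coco_kpts = np.random.rand(100, 17, 2).astype(np.float32)
    h36m_kpts = coco_h36m(coco_kpts)  # same shape, Human3.6M joint order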

lulindeng commented 3 years ago

Hi~ Thank you for your interest in our works!

You don't need to retrain a new model with COCO style. Because we have provided a file to convert the skeleton type from COCO to Human3.6M. The output of the HRNet 2D pose detector we used is also the type of COCO, which is also converted in this way. The final 3D estimation accuracy is acceptable. https://github.com/fabro66/GAST-Net-3DPoseEstimation/blob/ee05fa0ffe0a6945fca254d41fb800452be1ffd5/tools/mpii_coco_h36m.py#L16

Thanks for your prompt reply!

I have found this part of the code; it is useful! Thanks again!

In VideoPose3D, they fine-tune the 2D detector, starting from the COCO pretrained model, so that its output is in the Human3.6M style. Have you tried this method?

sebo361 commented 3 years ago

Hi @fabro66 thank you for your help! I trained the model with 19 keypoints as you described, and it worked out great! I get nice training results, similar to the ones reported in the paper.

However, I am now trying to run the new 19-keypoint model on a video and am running into issues again. I am using the AlphaPose model with 26 keypoints and reordered the keypoints to the order you described: 0: pelvis, 1: right hip, 2: right knee, 3: right ankle, 4: right toe, 5: left hip, 6: left knee, 7: left ankle, 8: left toe, 9: spine, 10: thorax, 11: neck, 12: top head, 13: left shoulder, 14: left elbow, 15: left wrist, 16: right shoulder, 17: right elbow, 18: right wrist.
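For reference, here is one possible index map for that reordering. This is a sketch only: the Halpe-26 indices below are assumed from the AlphaPose documentation, and spine/thorax (which Halpe-26 lacks) are synthesized by interpolation, the way the repo's COCO converter does:

    import numpy as np

    # Assumed Halpe-26 indices: 19 hip, 12/14/16 right leg, 21 right big toe,
    # 11/13/15 left leg, 20 left big toe, 18 neck, 17 head,
    # 5/7/9 left arm, 6/8/10 right arm.
    HALPE_TO_H36M19 = {0: 19, 1: 12, 2: 14, 3: 16, 4: 21, 5: 11, 6: 13, 7: 15,
                       8: 20, 11: 18, 12: 17, 13: 5, 14: 7, 15: 9, 16: 6,
                       17: 8, 18: 10}

    def halpe26_to_h36m19(kpts):
        """Map one frame of Halpe-26 keypoints (26, 2) to the 19-joint order."""
        out = np.zeros((19, 2), dtype=kpts.dtype)
        for dst, src in HALPE_TO_H36M19.items():
            out[dst] = kpts[src]
        out[10] = (kpts[5] + kpts[6]) / 2   # thorax: midpoint of the shoulders
        out[9] = (out[0] + out[10]) / 2     # spine: midpoint of pelvis and thorax
        return out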

When loading the new 19-keypoint GAST-Net model I receive the following error:

    Loading GAST-Net ...
    17 17 17 17
    Traceback (most recent call last):
      File "gen_3D.py", line 415, in <module>
        model_pos = load_model_layer(rf)
      File "gen_3D.py", line 142, in load_model_layer
        model_pos.load_state_dict(checkpoint['model_pos'])
      File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for SpatioTemporalModel:
        Missing key(s) in state_dict: "init_bn.weight", "init_bn.bias", "init_bn.running_mean", "init_bn.running_var", "expand_bn.weight", "expand_bn.bias", "expand_bn.running_mean", "expand_bn.running_var", "shrink.weight", "expand_conv.weight", "layers_conv.0.weight", ... (it continues like this)

Is there something wrong with how I load the model? Do I need to refactor the SpatioTemporalModel?

fabro66 commented 3 years ago

> In VideoPose3D, they fine-tune the 2D detector, starting from the COCO pretrained model, so that its output is in the Human3.6M style. Have you tried this method?

Hi~ Sorry for the late reply, and thanks for your suggestion. We have not yet tried fine-tuning the 2D detector on Human3.6M.

fabro66 commented 3 years ago

> When loading the new 19-keypoint GAST-Net model I receive the following error: [...] RuntimeError: Error(s) in loading state_dict for SpatioTemporalModel: Missing key(s) in state_dict: "init_bn.weight", ...

Are you using multiple GPUs to train GAST-Net? If so, nn.DataParallel prefixes every parameter name with "module.", so you need to change the parameter names when loading the model:

    import os
    from collections import OrderedDict

    import torch

    chk_filename = os.path.join(args.checkpoint, args.resume if args.resume else args.evaluate)
    print("Loading checkpoint", chk_filename)
    checkpoint = torch.load(chk_filename, map_location=lambda storage, loc: storage)
    print("This model was trained for {} epochs".format(checkpoint["epoch"]))

    # Strip the "module." prefix added by nn.DataParallel so the keys
    # match a single-GPU model.
    new_state_dict = OrderedDict()
    for k, v in checkpoint["model_pos"].items():
        name = k[7:]  # drop the leading "module." (7 characters)
        new_state_dict[name] = v
    model_pos_train.load_state_dict(new_state_dict)
    model_pos.load_state_dict(new_state_dict)
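Alternatively, if the checkpoint is saved on the training side with model.module.state_dict() instead of model.state_dict(), the "module." prefix never appears and the file loads directly into a single-GPU model.
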
sebo361 commented 3 years ago

Thank you a lot @fabro66, but now there is still a mismatch between my trained model and the GAST-Net architecture:

    model_pos.load_state_dict(new_state_dict)
      File "/home/sebo/miniconda3/envs/alphaFB/lib/python3.6/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for SpatioTemporalModel:
        size mismatch for layers_graph_conv.0.local_graph_layer.gcn_con.e: copying a param with shape torch.Size([128, 62]) from checkpoint, the shape in current model is torch.Size([128, 72]).
        size mismatch for layers_graph_conv.1.local_graph_layer.gcn_con.e: copying a param with shape torch.Size([256, 62]) from checkpoint, the shape in current model is torch.Size([256, 72]).
        size mismatch for layers_graph_conv.2.local_graph_layer.gcn_con.e: copying a param with shape torch.Size([512, 62]) from checkpoint, the shape in current model is torch.Size([512, 72]).

Do you have an idea where the 62 vs. 72 dimension mismatch comes from here?

fabro66 commented 3 years ago

@sebo361 Hi~ We have achieved 19-joint 3D pose estimation (including the left and right toes). The result is acceptable. We will release this part of the code. Stay tuned.

GIF (in ./image/Baseball_body_foot.gif): [animation: body and foot 3D pose estimation]

sebo361 commented 3 years ago

Hi @fabro66 wow, that's fantastic news! Approximately when will you release the code?

fabro66 commented 3 years ago

@sebo361 We will release this part of the code next month.

sebo361 commented 3 years ago

@fabro66 Fantastic, looking forward to it!

fabro66 commented 3 years ago

@sebo361 I use the whole-body HRNet provided by MMPose to detect 2D keypoints (133 keypoints, including body, foot, hand, and facial keypoints). It is easy to install.
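For anyone wiring this up, a small sketch of keeping just the body and big-toe keypoints from such a prediction. The index layout (body 0-16, feet 17-22) follows the COCO-WholeBody convention; the result is still COCO-style order and needs converting to the Human3.6M ordering afterwards:

    import numpy as np

    def select_19_from_wholebody(kpts133):
        """Keep the 17 COCO body joints plus the two big toes from a
        COCO-WholeBody prediction of shape (133, 3) -> (19, 3)."""
        body = kpts133[:17]            # body joints, COCO order
        left_big_toe = kpts133[17:18]  # feet block starts at index 17
        right_big_toe = kpts133[20:21]
        return np.concatenate([body, left_big_toe, right_big_toe], axis=0)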

fabro66 commented 3 years ago

@sebo361 I have updated GAST-Net to infer 19-joint 3D human poses. Please check it out.

sebo361 commented 3 years ago

@fabro66 Awesome thank you so much, it's working great 🥳