fabro66 / GAST-Net-3DPoseEstimation

Graph Attention Spatio-Temporal Convolutional Networks for 3D Human Pose Estimation in Video (GAST-Net)
MIT License

Unable to test on Human3.6M using pretrained model #11

Closed yerzhan7orazayev closed 3 years ago

yerzhan7orazayev commented 3 years ago

Hello there,

I am getting the following error when I try to evaluate on Human3.6M using the pretrained model. Could you please suggest a solution? Thanks for your work.

python3 trainval.py -k cpn_ft_h36m_dbb -arc 3,3,3 -c checkpoint --evaluate 27_frame_model.bin

Namespace(actions='*', architecture='3,3,3', batch_size=128, bone_length_term=True, by_subject=False, causal=False, channels=128, checkpoint='checkpoint', checkpoint_frequency=10, data_augmentation=True, dataset='h36m', disable_optimizations=False, downsample=5, dropout=0.05, epochs=60, evaluate='27_frame_model.bin', export_training_curves=False, keypoints='cpn_ft_h36m_dbb', learning_rate=0.001, lr_decay=0.95, no_eval=False, render=False, resume='', stride=1, subjects_test='S9,S11', subjects_train='S1,S5,S6,S7,S8', subset=1, test_time_augmentation=True, viz_action=None, viz_bitrate=3000, viz_camera=0, viz_downsample=1, viz_export=None, viz_limit=-1, viz_no_ground_truth=False, viz_output=None, viz_size=5, viz_skip=0, viz_subject=None, viz_video=None)
Loading dataset...
cpn_ft_h36m_dbb
Preparing data...
Loading 2D detections...
/home/orazayy/GAST-Net-3DPoseEstimation/model/local_attention.py:25: UserWarning: This overload of nonzero is deprecated:
        nonzero()
Consider using one of the following signatures instead:
        nonzero(*, bool as_tuple) (Triggered internally at  /opt/conda/conda-bld/pytorch_1603729096996/work/torch/csrc/utils/python_arg_parser.cpp:882.)
  self.e = nn.Parameter(torch.zeros(out_features, len(self.m[0].nonzero()), dtype=torch.float))
INFO: Receptive field: 27 frames
INFO: Trainable parameter count:  6915984
Loading checkpoint checkpoint/27_frame_model.bin
Traceback (most recent call last):
  File "trainval.py", line 55, in <module>
    model_pos_train, model_pos, checkpoint = load_weight(args, model_pos_train, model_pos)
  File "/home/orazayy/GAST-Net-3DPoseEstimation/main.py", line 224, in load_weight
    print("This model was trained for {} epochs".format(checkpoint["epoch"]))
KeyError: 'epoch'
fabro66 commented 3 years ago

Hi~ You can fix it by commenting out that line, because the pretrained models we provide do not store epoch information.
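An alternative to commenting the line out is to guard the lookup. This is a hypothetical sketch (the real `load_weight` in `main.py` does more than this); it only shows the defensive check around the `"epoch"` key:

```python
# Sketch of a defensive check for the print in load_weight (main.py).
# The checkpoint-loading logic around it is assumed, not copied from the repo.
def report_epoch(checkpoint):
    # Pretrained checkpoints shipped with GAST-Net may lack the "epoch" key,
    # so guard the lookup instead of indexing checkpoint["epoch"] directly.
    if "epoch" in checkpoint:
        return "This model was trained for {} epochs".format(checkpoint["epoch"])
    return "Checkpoint has no epoch information (pretrained release model)"

print(report_epoch({"model_pos": None}))
```

With this guard, both your own training checkpoints (which contain `"epoch"`) and the released pretrained models load without raising `KeyError`.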

yerzhan7orazayev commented 3 years ago

Thanks for your prompt response.

I have some other questions regarding the implementation and model architecture.

  1. Are you using a single camera view?
  2. According to the paper, the input to the model is 2D poses. For the H3.6M dataset, each frame then has shape 17x2, where 2 is the (x, y) coordinate of each of the 17 joints. Could you explain why these coordinates are floats, and which coordinate system they are in?
  3. Also, how can one obtain the 3D poses when running in evaluation mode, or save them to storage during evaluation?
  4. What are the practical implications of different receptive fields, e.g., 27 vs. 81 frames? Do they only affect accuracy, or does inference time also grow as the receptive field increases?
  5. Could one get a higher inference FPS for real-time 3D pose estimation from a single RGB camera shooting at 30 fps by using a more powerful GPU than yours (NVIDIA GTX 1060)?

Thanks.

fabro66 commented 3 years ago

Hi~ Thank you for your interest in our work. The answers below correspond to your questions one by one:

  1. Yes, our method uses a single camera view.
  2. The coordinates in the ".\data\keypoints\baseball.json" file we provide are the pixel coordinates of the 2D keypoints in the RGB images. The keypoints are estimated by a 2D pose estimation algorithm (HRNet), whose outputs are float coordinates. If you want to draw them with OpenCV, you should convert them to integers.
  3. We have provided a tutorial on how to generate 3D poses/animations from custom videos. Please see "INFERENCE_EN.md" for more details.
  4. The size of the receptive field affects performance: the larger the receptive field, the better the accuracy, but inference time also increases.
  5. In theory, a more powerful GPU will give faster inference. Next, we will run a test on a Titan RTX.
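Regarding answer 2, a minimal sketch of the float-to-integer conversion before drawing with OpenCV (the function name and sample values are made up for illustration; `cv2.circle` expects integer center coordinates):

```python
# Hypothetical helper: convert HRNet float keypoints to integer pixel
# coordinates so they can be passed to OpenCV drawing functions.
def to_pixel_coords(keypoints):
    # keypoints: iterable of (x, y) floats in image pixel space,
    # e.g. one 17x2 frame of H3.6M 2D detections.
    return [(int(round(x)), int(round(y))) for x, y in keypoints]

frame = [(123.7, 456.2), (88.49, 301.51)]  # two of the 17 joints, made-up values
print(to_pixel_coords(frame))  # -> [(124, 456), (88, 302)]

# Drawing (assumes OpenCV is installed and `img` is a loaded image):
# for (x, y) in to_pixel_coords(frame):
#     cv2.circle(img, (x, y), 3, (0, 255, 0), -1)
```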
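Regarding answer 4, in temporal-convolution architectures of this style the receptive field is the product of the filter widths passed via `-arc`, which is why `-arc 3,3,3` in the command above corresponds to the 27-frame model. A small sketch (the function is illustrative, not the repo's actual implementation):

```python
from functools import reduce

def receptive_field(filter_widths):
    # e.g. -arc 3,3,3 stacks three temporal convolutions of width 3,
    # giving a 3 * 3 * 3 = 27-frame receptive field; adding a fourth
    # width-3 layer (3,3,3,3) gives 81 frames.
    return reduce(lambda a, b: a * b, filter_widths, 1)

print(receptive_field([3, 3, 3]))     # -> 27 (the model evaluated in this thread)
print(receptive_field([3, 3, 3, 3]))  # -> 81
```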