Official demo generating convoluted representation of points?

osousa commented 1 year ago

I tried to follow the demo:

https://github.com/Walter0807/MotionBERT/blob/main/docs/inference.md

The following occurs:

Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
Traceback (most recent call last):
  File "/content/MotionBERT/infer_wild.py", line 38, in <module>
    model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DSTformer:
    Missing key(s) in state_dict: "temp_embed", "pos_embed", "joints_embed.weight", "joints_embed.bias".....
    Unexpected key(s) in state_dict: "module.temp_embed", "module.pos_embed", "module.joints_embed.weight", "module.joints_embed.bias"....

I changed the code to remove the "module." prefix from the keys:

print('Loading checkpoint', opts.evaluate)
checkpoint = torch.load(opts.evaluate, map_location=lambda storage, loc: storage)
model_state_dict = checkpoint['model_pos']

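# Remove the "module." prefix from the keys if present (it typically appears
# when the checkpoint was saved from a model wrapped in nn.DataParallel).
# Check each key individually rather than only the first one.
model_state_dict = {
    (k[len('module.'):] if k.startswith('module.') else k): v
    for k, v in model_state_dict.items()
}
model_backbone.load_state_dict(model_state_dict, strict=True)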
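
For anyone hitting the same key mismatch, here is a minimal, self-contained sketch of the prefix stripping described above. The helper name `strip_prefix` is hypothetical (it is not part of MotionBERT or PyTorch), and the dictionary uses stand-in integers where a real checkpoint would hold tensors, so it runs without a checkpoint file:

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from keys that start with it.

    Keys without the prefix are kept unchanged, so the helper is safe to call
    on checkpoints that were saved without a wrapper.
    """
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


# Stand-in values (assumption): a real checkpoint would map these keys to tensors.
wrapped = OrderedDict([
    ("module.temp_embed", 0),
    ("module.pos_embed", 1),
    ("module.joints_embed.weight", 2),
    ("module.joints_embed.bias", 3),
])

plain = strip_prefix(wrapped)
print(list(plain))
```

The cleaned dictionary can then be passed to `load_state_dict` in place of `checkpoint['model_pos']`. Recent PyTorch releases also provide `torch.nn.modules.utils.consume_prefix_in_state_dict_if_present(state_dict, prefix)`, which strips the prefix in place.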

It gives me a warning but proceeds to create the video file:

100% 2/2 [00:08<00:00,  4.05s/it]
  0% 0/253 [00:00<?, ?it/s]IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (930, 924) to (944, 928) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).
  0% 1/253 [00:00<02:05,  2.01it/s][swscaler @ 0x6178540] Warning: data is not aligned! This can lead to a speed loss
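
The resize reported in the warning is consistent with rounding each frame dimension up to the next multiple of `macro_block_size=16`. The sketch below only reproduces that arithmetic for the numbers in the log; the function name `round_up_to_multiple` is hypothetical, and this is not imageio's actual implementation:

```python
def round_up_to_multiple(value, block=16):
    """Round `value` up to the nearest multiple of `block` using ceiling division."""
    return -(-value // block) * block


# Frame size taken from the warning in the log above.
width, height = 930, 924
padded = (round_up_to_multiple(width), round_up_to_multiple(height))
print(padded)  # (944, 928), matching the size reported in the warning
```

A few pixels of padding like this only changes the output frame size, so the warning on its own would not explain a distorted skeleton.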

The resulting file is not the expected one:

[Attached GIF: ezgif-5-aba37312bb]

Thank you very much, and congrats on all the work you've done.

Walter0807 commented 1 year ago

Could you please show the input video with 2D pose estimations?

osousa commented 1 year ago

Hi there,

Sure, the JSON with the keypoints and the corresponding AlphaPose video are inside the zip you provide in the example: sample

Walter0807 commented 1 year ago

I will take a look in several days. Have you tried other videos?

osousa commented 1 year ago

It is indeed an issue with the demo files. I used YOLOv8 to create the keypoints dataset and it worked perfectly.

Thanks!

Official demo generating convoluted representation of points? #60