Nicholasli1995 / EvoSkeleton

Official project website for the CVPR 2020 paper (Oral Presentation) "Cascaded deep monocular 3D human pose estimation with evolutionary training data"
https://arxiv.org/abs/2006.07778
MIT License

How to calculate stats for examples/inference.py? (e.g. mean_3d, std_3d dim_ignore_3d) #12

Closed ben-xD closed 3 years ago

ben-xD commented 3 years ago

In examples/inference.py, how do I calculate the stats passed to unNormalizeData() (imported via `from evo_skeleton.dataset.h36m.data_utils import unNormalizeData`)? I am using my own images, so I can't create a new stats.npy for them. I would eventually like to pass video to it too.

Thanks in advance

Nicholasli1995 commented 3 years ago

As described in the docs, the stats can be downloaded at https://drive.google.com/file/d/158oCTK-9Y8Bl9qxidoHcXfqfeeA7qT93/view
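
For reference, a stats file like this is typically a pickled dict of per-dimension statistics. Below is a minimal sketch of inspecting such a file; the key names (`mean_3d`, `std_3d`, `dim_ignore_3d`) follow the conventions mentioned in this issue's title, and the toy values here are placeholders, not the real H36M statistics:

```python
import numpy as np

# Build a toy stats dict in the assumed format and save it.
# In practice you would simply np.load the downloaded stats.npy instead.
stats = {
    "mean_3d": np.zeros(96),        # per-dimension mean of the 3D poses
    "std_3d": np.ones(96),          # per-dimension standard deviation
    "dim_ignore_3d": np.arange(6),  # dimensions excluded during training
}
np.save("stats.npy", stats)

# allow_pickle is required because the array stores a Python dict object.
loaded = np.load("stats.npy", allow_pickle=True).item()
print(sorted(loaded.keys()))
```
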

> In examples/inference.py, how do I calculate the stats passed to unNormalizeData() (imported via `from evo_skeleton.dataset.h36m.data_utils import unNormalizeData`)? I am using my own images, so I can't create a new stats.npy for them. I would eventually like to pass video to it too.
>
> Thanks in advance

ben-xD commented 3 years ago

Thanks Nicholas, but weren't those stats files generated for the example images, not for mine? I am trying to run it on my own images. Thank you

Nicholasli1995 commented 3 years ago

> Thanks Nicholas, but weren't those stats files generated for the example images, not for mine? I am trying to run it on my own images. Thank you

[Short answer] You don't need to compute stats for your images in inference.py.

[Long answer] Let me give a more detailed explanation of the "stats".

  1. The inputs and outputs of the 2D-to-3D lifting model are normalized, which means one cannot feed raw pixel positions (e.g., the hand key-point at (550, 450)) directly to the model.

  2. Two ways of applying normalization are implemented in this repo. The first approach computes the statistics over the whole training dataset, while the second computes them over a single example. This means the first way is dataset-dependent while the second is not.

  3. For in-the-wild images, inference.py uses the second way, so the user does not need to compute input key-point statistics over a whole dataset. The user just needs to feed the right key-point positions, and normalization is done subject by subject.

  4. The downloaded "stats" represent the 3D statistics used to un-normalize the output. They do not affect the input, only how the output looks. Since they are computed over H36M, the output will follow the H36M style.

Finally, the repo is about single-frame inference. For extension to video input I suggest taking a look at https://github.com/facebookresearch/VideoPose3D.

ben-xD commented 3 years ago

Thanks for the explanation, that clears up my confusion

Nicholasli1995 commented 3 years ago

Issues will be closed if there is no new discussion for more than one month. Re-open if needed.