Using `pose-format` for consistent `.pose` files

AI4Bharat / OpenHands

👐OpenHands : Making Sign Language Recognition Accessible. | **NOTE:** No longer actively maintained. If you are interested to own this and take it forward, please raise an issue

https://openhands.readthedocs.io

Apache License 2.0


Using `pose-format` for consistent `.pose` files #29

Open AmitMY opened 2 years ago

AmitMY commented 2 years ago

It seems that for pose data you are using pkl and h5, and that you have a custom MediaPipe Holistic script.

Personally, I believe it would be more shareable, and faster, to use a binary format like https://github.com/AmitMY/pose-format. Every pose file also declares its content, so you can transfer them between projects, or convert them to different formats with relative ease.

Besides the fact that it has a holistic loading script and multiple formats of OpenPose, it is a binary format which is faster to load, allows loading to numpy, torch and tensorflow, and can perform several operations on poses.
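To make the "declares its content" idea concrete, here is a minimal sketch of a self-describing binary layout: a header that names every point, followed by raw float32 coordinates. This is not the actual `.pose` specification; the `DEMO` signature, the field order, and the function names are all assumptions made up for illustration.

```python
import io
import struct

MAGIC = b"DEMO"  # hypothetical signature, NOT the real .pose one


def write_poses(stream, point_names, frames):
    """Write a header naming every point, then the raw float32 (x, y) values."""
    stream.write(MAGIC)
    stream.write(struct.pack("<HH", len(point_names), len(frames)))
    for name in point_names:
        encoded = name.encode("utf-8")
        stream.write(struct.pack("<H", len(encoded)))
        stream.write(encoded)
    for frame in frames:  # each frame is a list of (x, y), one per point
        for x, y in frame:
            stream.write(struct.pack("<ff", x, y))


def read_poses(stream):
    """Read the header first, so the reader knows how to interpret the body."""
    if stream.read(4) != MAGIC:
        raise ValueError("not a demo pose stream")
    n_points, n_frames = struct.unpack("<HH", stream.read(4))
    names = []
    for _ in range(n_points):
        (length,) = struct.unpack("<H", stream.read(2))
        names.append(stream.read(length).decode("utf-8"))
    count = n_frames * n_points * 2  # two floats per point
    flat = struct.unpack(f"<{count}f", stream.read(count * 4))
    frames = [
        [(flat[(f * n_points + p) * 2], flat[(f * n_points + p) * 2 + 1])
         for p in range(n_points)]
        for f in range(n_frames)
    ]
    return names, frames


buffer = io.BytesIO()
write_poses(buffer, ["nose", "left_wrist"],
            [[(0.5, 0.25), (0.75, 1.0)], [(0.5, 0.5), (0.125, 0.0)]])
buffer.seek(0)
names, frames = read_poses(buffer)
```

Because the point names travel with the data, a reader in another project does not need out-of-band knowledge of the keypoint order, which is the property a bare pkl of arrays lacks.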

It also allows the visualization of pose files, separately or on top of videos, and while admittedly this repository is not perfect, in my opinion it is better than having json or pkl files.

GokulNC commented 2 years ago

Hi @AmitMY

Thanks for sharing this. Interesting, we did not know about this .pose format. Before starting our project, we checked whether there was any standard format for storing poses. Since there wasn't any widely adopted format for poses, we thought of standardizing on this pkl format, which is easy and fast to load as well as very intuitive to use.

We had a quick glance through the pose-format repo, and this seems to be your binary's format: https://github.com/AmitMY/pose-format/blob/master/specs/v0.1.md

We had a few questions:

1. Do we have to create one .pose file per video? Or can it hold any number of videos?
2. Is there any example code which shows how to create this .pose file? (using random arrays of arbitrary shapes)
3. We usually just normalize the keypoints to [0,1] using (width, height) and do not store the video dimensions in the pose file. Is there any use case where the resolution will be useful?

bricksdont commented 2 years ago

1. Yes, I think the intention is to always store 1 pose file per video.
2. The documentation needs some care, but there is some example code for exactly this in some unit tests. Example: https://github.com/AmitMY/pose-format/blob/master/pose_format/pose_test.py#L159. I think a useful addition to the library could be a function to create a random pose object.
3. In my opinion, normalization or similar processing should never change the underlying data, just offer a different view on the data. Developers of a library can never anticipate all the use cases.
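As a rough idea of what such a random-pose helper could produce, here is a sketch using only the standard library. The function name `random_pose` is hypothetical and not part of pose-format, and the layout (frames, people, points, dimensions, plus one confidence value per point) is an assumption for illustration.

```python
import random


def random_pose(frames, people, points, dims=2, seed=None):
    """Return random coordinates and confidences as nested lists.

    Assumed layout: data[frame][person][point] is a list of `dims` floats,
    confidence[frame][person][point] is a single float in [0, 1).
    """
    rng = random.Random(seed)
    data = [[[[rng.random() for _ in range(dims)]
              for _ in range(points)]
             for _ in range(people)]
            for _ in range(frames)]
    confidence = [[[rng.random() for _ in range(points)]
                   for _ in range(people)]
                  for _ in range(frames)]
    return data, confidence


# 10 frames, 1 person, 21 points (e.g. one hand), 3 coordinates per point
data, confidence = random_pose(frames=10, people=1, points=21, dims=3, seed=0)
```

Arrays like these would then be wrapped in a header and body by the library; see the linked unit test for the real construction.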

AmitMY commented 2 years ago

Adding to Mathias's points:

1. While it is possible to store multiple videos, it's kind of hacky and not intended. The idea is that 1 pose file is the pose output from a video. For data loading, tfds or other formats can group them.
2. Here is a colab including loading poses from Mediapipe, reading/writing arbitrary poses, and a fast way to store/read pose files with the same pose structure.
3. I agree with Mathias here: if you change the underlying data, it is not the best. pose-format also allows you to draw poses on the original videos, which would be harder to do if the data is changed. Normalization can be a data view.
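The "normalization as a view" idea can be sketched as follows. `PixelPose` is a hypothetical container written for this example, not a pose-format class; it assumes keypoints are stored in pixel units alongside the video dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelPose:
    """Hypothetical container: keypoints stay in pixel units as stored."""
    width: int
    height: int
    points: tuple  # ((x, y), ...) in pixels

    def normalized(self):
        """Return a [0, 1] view; the stored pixel values are not modified."""
        return tuple((x / self.width, y / self.height) for x, y in self.points)


pose = PixelPose(width=640, height=480, points=((320.0, 120.0), (160.0, 480.0)))
view = pose.normalized()  # ((0.5, 0.25), (0.25, 1.0))
```

The stored points remain usable for drawing on the original video, while models that want [0, 1] inputs read the view.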

GokulNC commented 2 years ago

Thanks Mathias & Amit. Makes sense.

Regarding this:

> normalization or similar processing should never change the underlying data

> if you change the underlying data, it is not the best.

I feel it's tricky to say what the "original" underlying data is. As you might know, most models/tools (like Holistic) "directly" return the keypoints in the range [0, 1], and any notion of absolute keypoint dimensions is optionally brought out by us (the users) by multiplying by width & height.

Example:

https://github.com/AmitMY/pose-format/blob/00f0efd167c1c0feefcb42ea06ec61f7e6e74fb5/pose_format/utils/holistic.py#L32

> pose-format also allows you to draw poses on the original videos, which would be harder to do if the data is changed. normalization can be a data view

If the data is in [0,1], this is still achievable by just multiplying the keypoints by (W,H) of video.

But yeah, it is indeed good to have width and height fields in the header. To store (already) normalized data, one could just keep it as 1 (the default value).
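A small sketch of that convention, with a hypothetical helper `to_pixels` (not a pose-format function): with the default dimensions of 1 the normalized data passes through unchanged, and supplying the real (W, H) recovers pixel coordinates for drawing.

```python
def to_pixels(points, width=1, height=1):
    """Scale [0, 1] keypoints by the given dimensions (identity when both are 1)."""
    return [(x * width, y * height) for x, y in points]


normalized = [(0.5, 0.25), (0.25, 1.0)]
unchanged = to_pixels(normalized)           # header dimensions left at 1
on_video = to_pixels(normalized, 640, 480)  # [(320.0, 120.0), (160.0, 480.0)]
```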

GokulNC commented 2 years ago

BTW, thanks a lot for sharing that colab notebook. Things were much clearer after looking at it. It would be great if you put it in your repo's README (at least as a link to the notebook).

One question about the arbitrary pose creation. Mathias had said the following in the other thread:

> we have 2 "real" headers: openpose and holistic.

But I do not see any such headers in either the PoseHeader or the PoseBody types. Am I missing something?

bricksdont commented 2 years ago

I'm not sure if I understand your question correctly. Are you asking a) where the Openpose and Holistic headers are defined in the library, or b) why those header types are not mentioned in Amit's Colab?

GokulNC commented 2 years ago

Both. :) I asked since you had said it's mandatory.

bricksdont commented 2 years ago

Ah, perhaps now I know what could have been confusing for you: it is not mandatory to pick either the openpose or the holistic header (in this sense, the header part is also generic).

You can also construct your own header from simpler parts, such as header components, individual keypoints and so on. That's what Amit is doing in his Colab.
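Conceptually, composing a header from simpler parts looks something like the sketch below. The `Component` and `Header` classes here are hypothetical stand-ins written for this example, not the library's own PoseHeader types, and the point names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical header component: a named group of keypoints."""
    name: str
    points: list


@dataclass
class Header:
    """Hypothetical header: dimensions plus an ordered list of components."""
    width: int = 1
    height: int = 1
    components: list = field(default_factory=list)

    def total_points(self):
        return sum(len(c.points) for c in self.components)

    def point_index(self, component_name, point_name):
        """Offset of a point along the flat keypoint axis of the body data."""
        offset = 0
        for component in self.components:
            if component.name == component_name:
                return offset + component.points.index(point_name)
            offset += len(component.points)
        raise KeyError(component_name)


header = Header(components=[
    Component("body", ["nose", "left_shoulder", "right_shoulder"]),
    Component("right_hand", ["wrist", "thumb_tip", "index_tip"]),
])
```

In this picture, an openpose or holistic header is just a ready-made list of such components, which is why choosing one of them is optional.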

The openpose header is defined here: https://github.com/AmitMY/pose-format/blob/master/pose_format/utils/openpose.py and the holistic header is here: https://github.com/AmitMY/pose-format/blob/master/pose_format/utils/holistic.py.

(edit: I myself would have expected to find these headers in a more central location, let's say in the file that defines the PoseHeader class. There certainly is room for improvement for this library.)

GokulNC commented 2 years ago

Ohh okay... I get it now, thanks! :)