Describe the feature and the current behaviour/state
Currently, when feeding a video to the Hand Landmark detector, every frame is (to my knowledge) analyzed independently. For example, if a single Hand A is in the video, the list of landmark results has length 1; when a second Hand B enters the frame, the results have length 2, but there is no guarantee about whether the first element of the list corresponds to Hand A or Hand B. This keeps happening whenever multiple hands are present: in one frame the list might be [A, B, C], and in the next frame [B, A, C], which makes many kinds of per-hand operations very hard to perform.
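As a workaround until stable IDs are available, one can match each frame's detections against the previous frame's by wrist position, since hands rarely jump far between consecutive frames. The sketch below (my own code, not part of the MediaPipe API; the function name and greedy strategy are assumptions) pairs current and previous wrist coordinates by distance:

```python
import numpy as np

def match_hands(prev_wrists, curr_wrists):
    """Greedily match current detections to previous ones by wrist distance.

    prev_wrists, curr_wrists: arrays of shape (n, 2) holding normalized
    (x, y) wrist coordinates (landmark 0 in the hand landmark list).
    Returns a list where entry i is the index into prev_wrists that current
    hand i was matched to, or None for a newly appeared hand.
    """
    assignment = [None] * len(curr_wrists)
    if len(prev_wrists) == 0 or len(curr_wrists) == 0:
        return assignment
    # Pairwise distances between every current and every previous wrist.
    dists = np.linalg.norm(
        curr_wrists[:, None, :] - prev_wrists[None, :, :], axis=-1
    )
    # Greedy assignment: repeatedly take the globally closest unmatched pair.
    used_prev, used_curr = set(), set()
    for flat in np.argsort(dists, axis=None):
        c, p = np.unravel_index(flat, dists.shape)
        if c in used_curr or p in used_prev:
            continue
        assignment[c] = int(p)
        used_curr.add(c)
        used_prev.add(p)
    return assignment
```

For many simultaneous hands, an optimal assignment (e.g. `scipy.optimize.linear_sum_assignment`) would be more robust than the greedy loop, but the greedy version has no dependency beyond NumPy.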
Will this change the current API? How?
No response
Who will benefit from this feature?
No response
Please specify the use cases for this feature
I'm implementing a gesture classifier for which a static image of the hand is not enough to recognize the gesture. I have a dataset of videos, each containing a single hand, so from those I can extract the landmarks and train a model. However, at real-time inference, if multiple hands appear in the video I have no way to apply my model to each hand in the frame (even assuming I have stored the information from the previous frames), because the temporal sequence of each element of the results list is broken across the different hands.
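Given any frame-to-frame matching (whatever produces it), the per-hand history the use case needs can be kept in fixed-length buffers keyed by a stable track ID. A minimal sketch, with class and method names of my own invention (not MediaPipe API):

```python
from collections import deque

class HandTracks:
    """Keep a fixed-length landmark history per hand, keyed by a stable ID.

    `assignment[i]` is the previous-frame index that current hand i was
    matched to, or None for a newly seen hand (e.g. from a wrist-distance
    matcher). Each track's deque is then a per-hand temporal sequence that
    can be fed to a sequence model.
    """
    def __init__(self, maxlen=30):
        self.maxlen = maxlen
        self.tracks = {}     # track_id -> deque of landmark frames
        self.frame_ids = []  # track_id for each index in the previous frame
        self._next_id = 0

    def update(self, landmarks_per_hand, assignment):
        new_frame_ids = []
        for i, lms in enumerate(landmarks_per_hand):
            prev_idx = assignment[i]
            if prev_idx is not None and prev_idx < len(self.frame_ids):
                tid = self.frame_ids[prev_idx]  # continue an existing track
            else:
                tid = self._next_id             # start a new track
                self._next_id += 1
                self.tracks[tid] = deque(maxlen=self.maxlen)
            self.tracks[tid].append(lms)
            new_frame_ids.append(tid)
        self.frame_ids = new_frame_ids
        return new_frame_ids
```

Even if hands A and B swap positions in the results list between frames, each track ID keeps accumulating frames for the same physical hand, so the classifier can be applied per track.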
Thank you for your detailed explanation of the feature request. We are forwarding this issue internally; based on the discussion, the team will prioritise the request.
MediaPipe Solution (you are using)
Hand Landmark Detection
Programming language
Python
Are you willing to contribute it
No
Any Other info
No response