ai4r / Gesture-Generation-from-Trimodal-Context

Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity (SIGGRAPH Asia 2020)

Full skeleton training and inference #24

Closed: zaverichintan closed this issue 3 years ago

zaverichintan commented 3 years ago

Hello, is it possible to extend the dataset to include the full skeleton with hands? Do you have any experience with experiments on predicting hand and finger poses from audio and text?

youngwoo-yoon commented 3 years ago

Of course, you can extend the dataset with hands. When I first built the dataset about two years ago, I tried to extract hand poses, but the accuracy was much lower than for body poses. You could try state-of-the-art (SOTA) hand pose estimation models, which have improved considerably since then.
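As a rough illustration of what "extending the dataset with hands" means at the data level, the sketch below concatenates one frame's body and hand keypoints into a single fixed-order skeleton and flattens it into a vector. It is a hypothetical helper, not code from this repository; the 21 keypoints per hand follow the OpenPose hand model, while the body joint count is whatever your dataset keeps.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]

# 21 keypoints per hand follows the OpenPose hand model.
# The body joint count is left to the caller (hypothetical here).
NUM_HAND_KEYPOINTS = 21


def merge_body_and_hands(
    body: Sequence[Point3],
    left_hand: Sequence[Point3],
    right_hand: Sequence[Point3],
) -> List[Point3]:
    """Concatenate one frame's body and hand keypoints into one skeleton.

    The order is fixed (body, left hand, right hand) so every frame has the
    same layout and joint i always sits at index i.
    """
    for name, hand in (("left_hand", left_hand), ("right_hand", right_hand)):
        if len(hand) != NUM_HAND_KEYPOINTS:
            raise ValueError(
                f"{name} has {len(hand)} keypoints, expected {NUM_HAND_KEYPOINTS}"
            )
    return list(body) + list(left_hand) + list(right_hand)


def flatten_frame(skeleton: Sequence[Point3]) -> List[float]:
    """Flatten [(x, y, z), ...] into [x0, y0, z0, x1, ...] for a model vector."""
    return [coord for joint in skeleton for coord in joint]
```

For example, a hypothetical 10-joint body plus two 21-keypoint hands gives 52 joints, i.e. 156 values per frame. Frames where the hand detector fails would still need some handling (for example, interpolation from neighbouring frames) before they can be stored.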

zaverichintan commented 3 years ago

Does the gesture generation model adapt well to the higher-dimensional skeleton?

youngwoo-yoon commented 3 years ago

I don't think that would be a problem. Increasing the capacity (number of layers or units) of the final output part of the network might help.
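To get a feel for how much larger the output becomes, the sketch below assumes a direction-vector pose representation (each bone stored as a unit vector from parent to child joint) and counts the values the model must predict per frame. The bone lists are hypothetical: `hand_bones` assumes an OpenPose-style hand layout (wrist, then four joints per finger), and the 9-bone body is only an example, not this repository's definition.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Bone = Tuple[int, int]  # (parent joint index, child joint index)


def to_direction_vectors(joints: Sequence[Point3], bones: Sequence[Bone]) -> List[Point3]:
    """Convert joint positions into unit vectors along each (parent, child) bone."""
    dirs: List[Point3] = []
    for parent, child in bones:
        dx, dy, dz = (c - p for c, p in zip(joints[child], joints[parent]))
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        if norm == 0.0:
            dirs.append((0.0, 0.0, 0.0))  # degenerate bone: coincident joints
        else:
            dirs.append((dx / norm, dy / norm, dz / norm))
    return dirs


def hand_bones(wrist: int, first: int) -> List[Bone]:
    """Hypothetical hand chain: wrist -> 4 joints per finger, 5 fingers = 20 bones."""
    bones: List[Bone] = []
    for finger in range(5):
        prev = wrist
        for k in range(4):
            joint = first + finger * 4 + k
            bones.append((prev, joint))
            prev = joint
    return bones


def output_dim(bones: Sequence[Bone]) -> int:
    """Each bone contributes one 3D direction vector per frame."""
    return 3 * len(bones)
```

With a hypothetical 9-bone body, the output is 27 values per frame; adding two 20-bone hands raises it to 147. That roughly fivefold jump is the reason widening the final layers is a sensible first thing to try.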

zaverichintan commented 3 years ago

I'll try it out. Is the code for creating the LMDB dataset available? Also, is there a preprocessing step that converts the 3D joints from OpenPose into the pose sequences in the LMDB dataset?

youngwoo-yoon commented 3 years ago

https://github.com/youngwoo-yoon/youtube-gesture-dataset would be a good reference. It does not create the same LMDB dataset that I used, but the data processing is mostly similar.
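One step that commonly sits between per-clip pose sequences and training samples is slicing each clip into fixed-length, overlapping windows. The sketch below is a hypothetical helper illustrating that idea; the window length and stride are free parameters, and the values in the example are illustrative rather than the ones used to build the released LMDB files.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subdivide(frames: Sequence[T], n_frames: int, stride: int) -> List[List[T]]:
    """Split a clip into overlapping windows of n_frames, starting every `stride` frames.

    A trailing remainder shorter than n_frames is dropped, and clips shorter
    than one window yield no samples.
    """
    if n_frames <= 0 or stride <= 0:
        raise ValueError("n_frames and stride must be positive")
    return [
        list(frames[start:start + n_frames])
        for start in range(0, len(frames) - n_frames + 1, stride)
    ]
```

For instance, a 50-frame clip with `n_frames=34` and `stride=10` yields two windows starting at frames 0 and 10. In a full pipeline, the same window boundaries would also be applied to the aligned audio and word timestamps so the three modalities stay in sync.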

zaverichintan commented 3 years ago

Ah, cool. I compared the pickle files from youtube-gesture-dataset with the LMDB files, but they do not match. Is there any other post-processing needed to replicate the LMDB file creation? Are there any scripts available for this conversion?
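When two pipelines produce pose data that "do not match", it helps to locate the first frame where they diverge before guessing which post-processing step differs. The sketch below is a hypothetical comparison helper that assumes both sources have already been loaded as lists of per-frame float vectors; the tolerance is an arbitrary example value.

```python
import math
from typing import Optional, Sequence


def first_mismatch(
    a: Sequence[Sequence[float]],
    b: Sequence[Sequence[float]],
    tol: float = 1e-4,
) -> Optional[int]:
    """Return the index of the first frame where two pose sequences differ.

    Frames are compared element-wise with an absolute tolerance. A length
    difference counts as a mismatch at the end of the shorter sequence.
    Returns None if the sequences match.
    """
    for i, (fa, fb) in enumerate(zip(a, b)):
        if len(fa) != len(fb) or any(
            not math.isclose(x, y, abs_tol=tol) for x, y in zip(fa, fb)
        ):
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

If the mismatch starts at frame 0, a global difference such as normalization, joint ordering, or a coordinate-axis convention is the likely cause; a mismatch partway through more often points to differences in clip boundaries or filtering.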

youngwoo-yoon commented 3 years ago

Here is the script I used to create the LMDB datasets: https://gist.github.com/youngwoo-yoon/0d5ae4e375aba9df10e75805bdf60ddd. Please note that it may not be compatible with the aforementioned repository and may not produce the same LMDB files used in this repository, because there were small untracked changes to the script.

zaverichintan commented 3 years ago

Thanks a lot. I see that the 3D skeleton poses used by the gesture model are missing. Could you please share the JSON files with 3D poses, as referenced in: https://gist.github.com/youngwoo-yoon/0d5ae4e375aba9df10e75805bdf60ddd#file-make_lmdb_dataset-py-L124

youngwoo-yoon commented 3 years ago

Here is a OneDrive link with the results of VideoPose3D. However, I recommend trying recent body and hand pose estimators to get more accurate poses.