Svito-zar / gesticulator

The official implementation for ICMI 2020 Best Paper Award "Gesticulator: A framework for semantically-aware speech-driven gesture generation"
https://svito-zar.github.io/gesticulator/
GNU General Public License v3.0

Some questions about the result #65

Closed alexDJ-arch closed 1 year ago

alexDJ-arch commented 1 year ago

Hello! I'm wondering about the results shown in the paper. The abstract says the model "generates gestures as a sequence of joint angle rotations as output", but the visual result in your demo video is a human-like character. How do I convert the output into that? Also, can I use a digital human mesh with the .obj suffix as the data to drive? Hope to get your early reply! Thank you!

ghenter commented 1 year ago

Hello there @dongjinmingDJM,

Hope to get your early reply!

Since @Svito-zar is away this week, I will respond instead.

The abstract says the model "generates gestures as a sequence of joint angle rotations as output", but the visual result in your demo video is a human-like character. How do I convert the output into that?

The process of obtaining video of a skinned character is explained under "Visualizing the results" in the readme.

My understanding is that the code, as the abstract says, generates joint angles for a so-called skeleton, in a plain-text file format called BVH. To visualise the results as a video, you need a skinned 3D character that has been rigged to use our skeleton, so that poses of that particular skeleton drive poses of the 3D character. You then use 3D software to render the motion as a sequence of still images (i.e., a video) showing the rigged character, and finally add the speech audio to the video.

Conveniently, the steps from BVH to video have been baked into the visualiser from the GENEA Challenge 2020 (the link is in our readme), which uses the same skeleton as this repository to drive the 3D character you saw in our videos. The visualiser thus offers an easy way to turn the generated motion from this repository into a video.
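If you want to sanity-check that the generated BVH files really are what the abstract describes, you can inspect one with a few lines of Python. This is just an illustrative sketch, not part of the repository, and the file name is hypothetical:

```python
# Minimal sketch (not part of this repository): inspect a generated BVH file
# to confirm it is plain-text joint rotations, one pose per frame.

def inspect_bvh(path):
    with open(path) as f:
        lines = f.read().splitlines()

    # HIERARCHY section: every joint is introduced by a ROOT or JOINT line.
    joints = [line.split()[1] for line in lines
              if line.strip().startswith(("ROOT", "JOINT"))]

    # MOTION section: frame count, frame time, then one line of channel
    # values (root position plus joint rotations in degrees) per frame.
    motion_start = next(i for i, l in enumerate(lines) if l.strip() == "MOTION")
    n_frames = int(lines[motion_start + 1].split()[-1])
    frame_time = float(lines[motion_start + 2].split()[-1])
    first_frame = [float(x) for x in lines[motion_start + 3].split()]

    print(f"{len(joints)} joints, {n_frames} frames at {1 / frame_time:.1f} fps")
    print(f"{len(first_frame)} channel values per frame")


inspect_bvh("generated_motion.bvh")  # hypothetical output file name
```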

Can I use a digital human mesh with the .obj suffix as the data to drive?

This will not be straightforward, and will not work out of the box. I am not familiar with Wavefront obj files, but my current understanding is that they represent 3D meshes only, with no support for skeleton/rigging information. In other words, you would have to rig such a mesh with the relevant skeleton before it can be driven by motion data.
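If it helps, you can check this for a given file yourself: an .obj file is plain text, and listing its record types shows that it only contains geometry (vertices, normals, texture coordinates, faces), with nothing resembling bones or skinning weights. A rough sketch, with a made-up file name:

```python
from collections import Counter

# Rough sketch: count the record types in a Wavefront .obj file.
# "v" = vertex, "vn" = normal, "vt" = texture coordinate, "f" = face --
# all geometry. There is no record type for bones, skinning weights or
# animation, which is why a bare .obj cannot be driven by a skeleton.
def summarize_obj(path):
    counts = Counter()
    with open(path) as f:
        for line in f:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comments
            counts[line.split(maxsplit=1)[0]] += 1
    return dict(counts)

print(summarize_obj("character.obj"))  # hypothetical mesh file
```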

Even if you have a pre-rigged character, it is likely to use a different skeleton, so you would need to retarget the motion from Gesticulator (converting our skeleton to yours) or re-rig those characters (so that they can be driven by our skeleton instead). In addition, Gesticulator only generates upper-body motion.
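To give a very rough idea of what retargeting involves, the sketch below (purely illustrative, with a made-up joint mapping) renames the joints in a BVH hierarchy to match the names a hypothetical target rig expects. This only covers the naming aspect; proper retargeting also has to handle differing bone lengths, rest poses, and rotation orders, and is normally done in 3D software rather than in a script like this.

```python
# Purely illustrative sketch: rename joints in a BVH hierarchy so that they
# match the joint names a target rig expects. The mapping below is made up;
# you would fill it in for your own character. This does NOT handle differing
# bone lengths, rest poses, or rotation orders.
JOINT_MAP = {
    "Hips": "pelvis",
    "Spine": "spine_01",
    "Neck": "neck_01",
    # ... one entry per joint of your target rig
}

def rename_bvh_joints(in_path, out_path, joint_map):
    with open(in_path) as f, open(out_path, "w") as out:
        for line in f:
            stripped = line.strip()
            if stripped.startswith(("ROOT", "JOINT")):
                _, name = stripped.split()
                line = line.replace(name, joint_map.get(name, name), 1)
            out.write(line)

rename_bvh_joints("generated_motion.bvh", "renamed_motion.bvh", JOINT_MAP)
```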

As a side note, if you want better-looking gesture motion, I would recommend trying some of our other gesture-generation methods, especially StyleGestures (which was trained on the same data as Gesticulator) or Listen, Denoise, Action (which represents our best-looking motion yet and whose code is scheduled to be released next week).

alexDJ-arch commented 1 year ago

Thank you very much for your reply! Indeed, the .obj file used as the mesh doesn't have a skeleton. I'll try to convert it into the data the framework needs. By the way, does the upper-body motion include facial animation driven by the audio?

ghenter commented 1 year ago

does the upper-body motion include facial animation driven by the audio?

No. The data used to train our system did not capture facial expression, so we could not learn to generate that.

We did generate facial expressions for another paper, called Let's Face It, but the data for replicating that work might not be easy to obtain.

alexDJ-arch commented 1 year ago

Thank you very much !

Svito-zar commented 1 year ago

Thank you @ghenter for clarifying this