Doubiiu / CodeTalker

[CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior

When training on the multiface dataset, the eyes can't close, compared with MeshTalk #11

Closed (xiapengchng closed 1 year ago)

xiapengchng commented 1 year ago

I tried to train on the multiface dataset, which has a lot of detail in the upper face, but the driven results cannot close the eyes while talking, whereas MeshTalk can.

The reconstruction stage works well when closing the eyes, though.

Any suggestions?

Doubiiu commented 1 year ago

Sorry, I didn't train our model on the multiface dataset, so I may not be able to offer useful suggestions. But in principle the model should support eyelid motion: eyelid motion can also be encoded into the discrete motion representations and mapped from the audio signal somehow, although the relationship between the two may be loose. I guess the reason MeshTalk can easily drive the eyelids is its tailored loss-function design?
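
For intuition, here is a minimal sketch of how per-frame motion features (eyelids included) would get snapped to a learned discrete codebook, assuming PyTorch; `codebook`, `motion_feat`, and all dimensions are hypothetical illustrations, not CodeTalker's actual code:

```python
import torch

# Toy vector-quantization step (illustrative only, not the repo's code).
# Assumption: a learned codebook of K discrete motion codes, and per-frame
# motion features that already encode eyelid deformation.
K, D = 256, 64                      # hypothetical codebook size / feature dim
codebook = torch.randn(K, D)        # learned discrete motion representations
motion_feat = torch.randn(30, D)    # 30 frames of encoded motion (eyelids included)

# Nearest-neighbour lookup: each frame is replaced by its closest code, so
# eyelid closure survives quantization only if some codes actually capture it.
dists = torch.cdist(motion_feat, codebook)   # (30, K) pairwise distances
indices = dists.argmin(dim=1)                # discrete code index per frame
quantized = codebook[indices]                # quantized motion features
```

In the second stage, the regressor then only has to predict `indices` from speech; if eyelid closure correlates weakly with audio, the eyelid-heavy codes may rarely be selected even though the codebook can represent them, which matches the "loose relationship" caveat above.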

xiapengchng commented 1 year ago

In the reconstruction stage you use multiple facial components to quantize the code, while in the regression stage the multiple facial components of a frame are predicted simultaneously. MeshTalk, by comparison, predicts the categories for a single frame autoregressively.
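
To make that contrast concrete, a minimal sketch of the two prediction schemes, assuming PyTorch; the module names, shapes, and greedy sampling are hypothetical and not taken from either codebase:

```python
import torch
import torch.nn as nn

B, T, C, K, H = 1, 30, 4, 256, 128  # batch, frames, facial components, codes, hidden dim
audio_feat = torch.randn(B, T, H)   # hypothetical per-frame audio features

# Scheme A (the regression stage as described above): one head predicts the
# codes of all C facial components of a frame simultaneously.
joint_head = nn.Linear(H, C * K)
logits_a = joint_head(audio_feat).view(B, T, C, K)  # all components at once

# Scheme B (MeshTalk-style, per this thread; single component for brevity):
# categories are predicted autoregressively, each step conditioned on the
# previously sampled code.
step_head = nn.Linear(H + K, K)
prev = torch.zeros(B, K)            # one-hot of the previous code (starts empty)
codes_b = []
for t in range(T):
    logits_t = step_head(torch.cat([audio_feat[:, t], prev], dim=-1))
    code_t = logits_t.argmax(dim=-1)                # greedy pick for the sketch
    prev = nn.functional.one_hot(code_t, K).float()
    codes_b.append(code_t)
```

The design trade-off implied here: the simultaneous head commits to all components at once from audio alone, while the autoregressive scheme can condition later predictions on earlier ones, which may help propagate audio-uncorrelated motion such as eye blinks.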