Closed · xiapengchng closed this issue 1 year ago
Sorry, I haven't trained our model on the multiface dataset, so I may not be able to provide useful suggestions. That said, I think the model should support eyelid motions in principle, since eyelid motions can also be encoded into discrete motion representations and mapped from the audio signal in some way, even though the relationship between the two may be loose. I guess the reason MeshTalk can easily drive the eyelids is the tailored design of its loss function?
In the reconstruction stage, you quantize the codes over multiple facial components, while in the regression stage the multiple facial components of a frame are predicted simultaneously. In comparison, MeshTalk autoregressively predicts the categories within a single frame (see the sketch below).
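To make the contrast concrete, here is a minimal PyTorch-style sketch of the two prediction schemes as I understand them from the discussion. All shapes, module names, and the greedy decoding are hypothetical illustrations, not the actual CodeTalker or MeshTalk code.

```python
import torch
import torch.nn as nn

B, T, C, K = 2, 100, 8, 256  # batch, frames, facial components per frame, codebook size
D = 128                      # hypothetical audio feature dimension

# Scheme 1 (this repo, as I understand it): the regression stage predicts the
# discrete codes of all facial components of a frame simultaneously.
class ParallelHead(nn.Module):
    def __init__(self, d_model=D):
        super().__init__()
        self.proj = nn.Linear(d_model, C * K)   # one shot: logits for every component

    def forward(self, audio_feat):               # audio_feat: (B, T, d_model)
        logits = self.proj(audio_feat)           # (B, T, C*K)
        return logits.view(B, T, C, K)           # codes chosen independently per component

# Scheme 2 (MeshTalk-style, per the description above): the categories within a
# frame are predicted autoregressively, each component conditioned on the
# components already chosen for that frame.
class AutoregressiveHead(nn.Module):
    def __init__(self, d_model=D):
        super().__init__()
        self.embed = nn.Embedding(K, d_model)
        self.step = nn.GRUCell(d_model, d_model)
        self.out = nn.Linear(d_model, K)

    def forward(self, audio_feat):               # audio_feat: (B*T, d_model)
        h = audio_feat
        prev = torch.zeros(audio_feat.size(0), dtype=torch.long)
        codes = []
        for _ in range(C):                       # walk over the facial components
            h = self.step(self.embed(prev), h)
            prev = self.out(h).argmax(dim=-1)    # greedy choice, just for the sketch
            codes.append(prev)
        return torch.stack(codes, dim=-1)        # (B*T, C)
```

In the parallel scheme the eyelid component is predicted independently of the mouth codes for the same frame, while in the autoregressive scheme it can be conditioned on them, which might be relevant to the eye-closing behavior discussed here.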
I tried training on the multiface dataset, which has a lot of detail in the upper face, but the driven results cannot close the eyes while talking, whereas MeshTalk can close the eyes while talking.
Does the reconstruction stage work well when the eyes close?
Any suggestions?