Doubiiu / CodeTalker

[CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
MIT License

[Question] Why train in a teacher-forcing scheme? #13

JSHZT closed this issue 1 year ago

JSHZT commented 1 year ago

Hi, your paper mentions that "the autoregressive model is trained in a teacher-forcing scheme", but why? The earlier related work FaceFormer pointed out that this strategy leads to poor results. Could you please share your opinion?

Doubiiu commented 1 year ago

Hi. In our early attempts we tried training both with and without teacher forcing, but found that the benefit of the non-teacher-forcing scheme was limited. So we finally chose teacher forcing for our experiments because of its high efficiency.

JSHZT commented 1 year ago

Hello, the results of my experiment are in. They show that the teacher-forcing training strategy has a very large impact in CodeTalker. I set up training following the original code, using mesh data. After training finished, I tried frame-by-frame inference and the result was very bad. But when I ran inference in the teacher-forcing manner, the result was very good. Does this mean that the influence of each frame on subsequent frames is far greater than the influence of the audio features? Looking forward to your reply.
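The gap between the two inference modes can be pictured with a toy example. The sketch below is not CodeTalker code: `toy_step`, its 0.9/0.1 weights, and the scalar "frames" are all hypothetical, chosen only to mimic a model that leans mostly on the previous frame and only slightly on the audio.

```python
from typing import Callable, List

# Hypothetical one-step predictor: (previous frame, audio feature) -> next frame.
StepFn = Callable[[float, float], float]


def teacher_forced_rollout(step: StepFn, audio: List[float],
                           ground_truth: List[float], start: float = 0.0) -> List[float]:
    """Predict each frame from the ground-truth previous frame."""
    preds = []
    prev = start
    for t, a in enumerate(audio):
        preds.append(step(prev, a))
        prev = ground_truth[t]  # feed the true frame, not the prediction
    return preds


def free_running_rollout(step: StepFn, audio: List[float],
                         start: float = 0.0) -> List[float]:
    """Predict each frame from the model's own previous prediction."""
    preds = []
    prev = start
    for a in audio:
        y = step(prev, a)
        preds.append(y)
        prev = y  # errors made here are carried into later frames
    return preds


def mean_abs_error(pred: List[float], target: List[float]) -> float:
    return sum(abs(p - y) for p, y in zip(pred, target)) / len(target)


def toy_step(prev: float, a: float) -> float:
    # Assumed toy model: mostly copies the previous frame, barely uses audio.
    return 0.9 * prev + 0.1 * a


audio = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ground_truth = list(audio)  # toy target: the motion simply follows the audio

tf_error = mean_abs_error(teacher_forced_rollout(toy_step, audio, ground_truth), ground_truth)
fr_error = mean_abs_error(free_running_rollout(toy_step, audio), ground_truth)
print(f"teacher-forced error: {tf_error:.3f}, free-running error: {fr_error:.3f}")
```

In this toy setting the teacher-forced rollout stays close to the target because the true previous frame carries most of the information, while the free-running rollout drifts as its own errors accumulate.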

Doubiiu commented 1 year ago

I think the model was not properly trained and did not learn the mapping from audio to motion. In that case, the statement that "the influence of each frame on subsequent frames is far greater than the influence of audio features" is correct. This is a problem for this kind of autoregressive task, and it also occurred in my early attempts.
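One rough way to check whether a model has learned the audio-to-motion mapping is to keep the past frames fixed and swap in mismatched audio, then see how much the predictions move. The sketch below is a toy illustration under stated assumptions, not part of the CodeTalker codebase: `audio_sensitivity`, `copy_model`, and `audio_model` are hypothetical names, and frames are plain scalars.

```python
from typing import Callable, List

# Hypothetical one-step predictor: (previous frame, audio feature) -> next frame.
StepFn = Callable[[float, float], float]


def predict_with_true_history(step: StepFn, audio_feats: List[float],
                              true_frames: List[float], start: float = 0.0) -> List[float]:
    """Teacher-forced predictions: every step sees the true previous frame."""
    prev_frames = [start] + list(true_frames[:-1])
    return [step(p, a) for p, a in zip(prev_frames, audio_feats)]


def audio_sensitivity(step: StepFn, audio_feats: List[float],
                      true_frames: List[float], start: float = 0.0) -> float:
    """Mean change in prediction when the audio is replaced by mismatched audio."""
    mismatched = list(reversed(audio_feats))  # same past frames, wrong audio
    base = predict_with_true_history(step, audio_feats, true_frames, start)
    perturbed = predict_with_true_history(step, mismatched, true_frames, start)
    return sum(abs(b - p) for b, p in zip(base, perturbed)) / len(audio_feats)


def copy_model(prev: float, a: float) -> float:
    # Assumed degenerate model: ignores audio and repeats the previous frame.
    return prev


def audio_model(prev: float, a: float) -> float:
    # Assumed audio-driven model: the output depends only on the audio.
    return a


audio_feats = [0.0, 1.0, 2.0, 3.0]
true_frames = list(audio_feats)

copy_score = audio_sensitivity(copy_model, audio_feats, true_frames)
audio_score = audio_sensitivity(audio_model, audio_feats, true_frames)
print(f"copy model: {copy_score}, audio-driven model: {audio_score}")
```

A sensitivity near zero suggests the predictions are driven almost entirely by the previous frames, which matches the failure mode discussed in this thread. A real check would use the trained model and actual audio features rather than these stand-ins.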