EvelynFan / FaceFormer

[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
MIT License

Evaluation Results #14

Closed: yangdaowu closed this issue 2 years ago

yangdaowu commented 2 years ago

Hello, I would like to know how to obtain the results and the corresponding evaluation metrics reported in the paper. Thank you for your answer.

EvelynFan commented 2 years ago

> Hello, I would like to know how to obtain the results and the corresponding evaluation metrics reported in the paper. Thank you for your answer.

Hi, the evaluation results are obtained by comparing the predictions (the vertex coordinates) against the processed 3D face geometry data. Basically, we first select a lip region (a set of vertices) on the face mesh. For each frame, the lip error is defined as the maximal L2 error over all lip vertices; we then average these per-frame values over all frames.
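For reference, a minimal sketch of this metric, assuming predictions and ground truth are stored as `(num_frames, num_vertices, 3)` arrays and that `lip_idx` (a hypothetical index list for the lip-region vertices of the mesh topology) is known:

```python
import numpy as np

def lip_vertex_error(pred, gt, lip_idx):
    """Maximal per-frame L2 error over the lip vertices, averaged over frames.

    pred, gt : arrays of shape (num_frames, num_vertices, 3)
    lip_idx  : indices of the lip-region vertices (assumed known for the topology)
    """
    diff = pred[:, lip_idx, :] - gt[:, lip_idx, :]   # (frames, lip_verts, 3)
    per_vertex = np.linalg.norm(diff, axis=-1)       # L2 distance per lip vertex
    per_frame_max = per_vertex.max(axis=-1)          # maximal error in each frame
    return per_frame_max.mean()                      # average over all frames
```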

yangdaowu commented 2 years ago

Thank you for your answer. I would also like to know how the two metrics, lip sync and realism, are calculated in the paper. I haven't found the relevant code; could you provide it? Thanks again for your answer.

EvelynFan commented 2 years ago

> Thank you for your answer. I would also like to know how the two metrics, lip sync and realism, are calculated in the paper. I haven't found the relevant code; could you provide it? Thanks again for your answer.

The evaluation results in terms of Lip Sync and Realism are based on the user study, which we conduct on Amazon Mechanical Turk (AMT). The Turkers are instructed to judge the videos in terms of lip sync and realism. Their answers are returned by the AMT platform, and the respective percentage values can then be computed directly from the counts.
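Computing those percentages is just tallying votes. A minimal sketch, assuming each returned answer is a label naming the method the Turker preferred (a hypothetical answer format, not the actual AMT export):

```python
from collections import Counter

# Hypothetical answer format: one preferred-method label per A/B comparison.
answers = ["faceformer", "competitor", "faceformer", "faceformer"]

counts = Counter(answers)
total = sum(counts.values())
percentages = {method: 100.0 * n / total for method, n in counts.items()}
print(percentages)  # e.g. {'faceformer': 75.0, 'competitor': 25.0}
```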

yangdaowu commented 2 years ago

Thank you for your answer. I get it.

yangdaowu commented 2 years ago

Hello, there is still a problem: how should I get the maximal L2 error for each frame and compute the average over all frames? I can only see the loss changing during training.

(screenshot of the training loss log)

EvelynFan commented 2 years ago

For training, we use the MSE loss rather than the maximal L2 error. After training finishes, we evaluate the trained model with the maximal L2 error over all lip vertices. Note that the maximal L2 error only measures lip sync; for this task, we actually focus on the overall animation quality, so a qualitative evaluation is more appropriate. Please refer to issue #12
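To make the split between the training loss and the evaluation metric concrete, here is a minimal evaluation sketch. The model interface and batch format are hypothetical (the real FaceFormer forward pass takes additional inputs such as the template mesh); it only assumes the model returns vertices of shape `(num_frames, num_vertices, 3)` and that `lip_idx` indexes the lip region:

```python
import torch

@torch.no_grad()
def evaluate_lip_error(model, test_loader, lip_idx, device="cpu"):
    """Average maximal lip-vertex L2 error over a test set.

    Assumes each batch yields (audio, gt_vertices) and that model(audio)
    returns predicted vertices of shape (frames, num_vertices, 3);
    these names and signatures are hypothetical.
    """
    model.eval()
    errors = []
    for audio, gt in test_loader:
        pred = model(audio.to(device)).cpu()
        # Per-frame maximal L2 distance over the lip region only.
        dist = torch.norm(pred[:, lip_idx, :] - gt[:, lip_idx, :], dim=-1)
        errors.append(dist.max(dim=-1).values)
    return torch.cat(errors).mean().item()
```

During training you would only monitor the MSE loss, as in the screenshot above; this metric is computed once afterwards on held-out sequences.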

yangdaowu commented 2 years ago

Thank you for your answer. I get it.