EvelynFan / FaceFormer

[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
MIT License

Evaluation result on BIWI dataset #45

Open yihu-dev opened 1 year ago

yihu-dev commented 1 year ago

Hi, I cannot reproduce the number reported in the paper for BIWI Test-A, which is 5.3742 × 10⁻⁴ mm.

The following is what I've tried:

  1. Rotate/scale/translate the raw data to align it to the templates in your repo. The resulting scale factors for each subject: { 'F2': 179.7675, 'F3': 185.8210, 'F4': 185.8799, 'M3': 184.9965, 'M4': 186.2286, 'M5': 201.2294, 'F1': 173.5965, 'F5': 182.6764, 'F6': 186.2587, 'F7': 180.6849, 'F8': 180.7115, 'M1': 188.5588, 'M2': 192.8390, 'M6': 189.5069 }

  2. Manually select vertices over the lip area using Blender:

[image: lip vertices selected in Blender]

  3. Run the pretrained model on Test-A and save all sequences of vertices to file, then compute the max lip vertex error with:

```python
import torch

def get_lip_maxl2_err(v_hat, v, lip_inds, scale):
    """Return the max error over the lip area for each frame.
        v: [N, V, 3] ground-truth vertex tensor
        v_hat: [M, V, 3] predicted vertex tensor
    """
    n = min(v.shape[0], v_hat.shape[0])  # align sequence lengths
    lip_err = (v[:n, lip_inds, :] - v_hat[:n, lip_inds, :]) * scale  # scale back to original size
    max_err, _ = (lip_err ** 2).mean(-1).max(-1)  # mean over xyz, then max over lip vertices
    return max_err
```
  4. The rendered result looks quite good, but the lip vertex error I get is 7.0980 on the val set and 8.1337 on the test set. I would like to know whether I'm doing this correctly or have missed something.
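As an aside, the snippet above averages the squared coordinate errors before taking the max; the lip vertex error is often defined instead as the maximal L2 *distance* (sum over x/y/z, then square root) over the lip vertices per frame, which would give different numbers. A NumPy sketch of that variant (function name and synthetic data are illustrative, not from the repo):

```python
import numpy as np

def lip_max_l2_dist(v_hat, v, lip_inds, scale):
    """Max L2 distance over the lip region per frame:
    sqrt of summed squared xyz differences, then max over lip vertices.
    v, v_hat: [frames, vertices, 3] arrays; sequence lengths may differ."""
    n = min(len(v), len(v_hat))
    diff = (v[:n, lip_inds, :] - v_hat[:n, lip_inds, :]) * scale
    return np.sqrt((diff ** 2).sum(-1)).max(-1)

# Tiny sanity check: one lip vertex offset by (3, 4, 0) gives distance 5.
v = np.zeros((2, 10, 3))
v_hat = v.copy()
v_hat[0, 2] = [3.0, 4.0, 0.0]
print(lip_max_l2_dist(v_hat, v, np.arange(10), 1.0))   # [5. 0.]
```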

https://user-images.githubusercontent.com/51768999/196108824-8bf58d23-34ea-4f9c-b018-26c24b5364d3.mp4

EvelynFan commented 1 year ago

Hi, I think there may be two reasons:

  1. The error is calculated by comparing the predictions against the processed 3D face geometry data, so different ways of preprocessing the data may lead to different evaluation results.
  2. Different choices of the lip-vertex region may also lead to different numbers.
yihu-dev commented 1 year ago

Thanks for your reply.

For the first reason, I've compared my preprocessed templates with yours, and the max vertex error is around 10⁻⁹, so I think they can be considered equal within float precision. I will look for other possible mistakes in my evaluation code.
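For what it's worth, that comparison can be made reproducible with a one-liner; the array names here are illustrative stand-ins for the re-aligned and repo-provided templates, and a threshold around 1e-6 is far above the observed ~1e-9 float noise:

```python
import numpy as np

def max_template_diff(mine, theirs):
    """Max absolute per-coordinate difference between two [V, 3] vertex arrays."""
    return np.abs(np.asarray(mine) - np.asarray(theirs)).max()

mine = np.random.default_rng(1).normal(size=(100, 3))   # stand-in template
theirs = mine + 1e-9                                    # simulate ~1e-9 numeric drift
print(max_template_diff(mine, theirs) < 1e-6)           # True
```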

For the second reason, could you share the indices of the lip vertices you used?

songtoy commented 1 year ago

Hello! I'm interested in the alignment of the template data. How are the scale factors computed?
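The repo doesn't document the alignment, but scale factors of the magnitude listed above typically fall out of a least-squares similarity alignment of each subject's mesh to a reference template (Umeyama-style, assuming one-to-one vertex correspondence). A sketch, not the authors' actual preprocessing:

```python
import numpy as np

def similarity_align(src, dst):
    """Least-squares estimate of scale s, rotation R, translation t with
    s * R @ src_i + t ≈ dst_i (Umeyama 1991).
    src, dst: [V, 3] arrays of corresponding vertices."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                    # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                              # avoid reflections
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)            # source variance
    scale = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - scale * R @ mu_s
    return scale, R, t

# Tiny demo: recover a known scale + translation on synthetic vertices.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
dst = 185.0 * src + np.array([1.0, -2.0, 3.0])
s, R, t = similarity_align(src, dst)
print(round(s, 4))   # 185.0
```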