Doubiiu / CodeTalker

[CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
MIT License

Code about calculating LVE #58

Closed WASD4959 closed 1 year ago

WASD4959 commented 1 year ago

Hi, while reading the metric code in cal_metric.py, I ran into something confusing. The definition of LVE in the paper is:

But the code seems to calculate the maximal L2 error over all frames for each lip vertex and then take the average over all lip vertices.

    vertices_gt_all = np.array(vertices_gt_all)
    vertices_pred_all = np.array(vertices_pred_all)

    L2_dis_mouth_max = np.array([np.square(vertices_gt_all[:,v, :]-vertices_pred_all[:,v,:]) for v in mouth_map])
    L2_dis_mouth_max = np.transpose(L2_dis_mouth_max, (1,0,2))
    L2_dis_mouth_max = np.sum(L2_dis_mouth_max,axis=2)
    L2_dis_mouth_max = np.max(L2_dis_mouth_max,axis=1)

    print('Lip Vertex Error: {:.4e}'.format(np.mean(L2_dis_mouth_max)))

This really confuses me. Did I misunderstand LVE? Could you help me clear this up? Thank you!

Doubiiu commented 1 year ago

Hi, I think you misunderstood the code. It is consistent with the description in the paper. I have added some comments here to make it easier to follow.

    # The list holds one (frames, 3) array per mouth vertex; np.array stacks them into shape (mouth_vertices, frames, 3)
    L2_dis_mouth_max = np.array([np.square(vertices_gt_all[:,v, :]-vertices_pred_all[:,v,:]) for v in mouth_map])
    L2_dis_mouth_max = np.transpose(L2_dis_mouth_max, (1,0,2))  # (frames, mouth_vertices, 3)
    L2_dis_mouth_max = np.sum(L2_dis_mouth_max,axis=2)  # (frames, mouth_vertices): squared L2 error per vertex per frame
    L2_dis_mouth_max = np.max(L2_dis_mouth_max,axis=1)  # (frames,): max over mouth vertices within each frame

You can run the code to check if the shape is correct.
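A minimal sketch of such a shape check, run on random data rather than the real test set. The sizes (`frames`, `num_vertices`) and the indices in `mouth_map` are made up for illustration and are not the values used in cal_metric.py. The last lines compute the alternative reading from the question (max over frames per vertex, then mean over vertices) so the two reduction orders can be compared side by side.

```python
import numpy as np

# Hypothetical sizes and lip indices, chosen only for this illustration.
rng = np.random.default_rng(0)
frames, num_vertices = 4, 10
mouth_map = [1, 3, 5]

# Stand-ins for the stacked ground-truth and predicted vertex sequences.
vertices_gt_all = rng.normal(size=(frames, num_vertices, 3))
vertices_pred_all = rng.normal(size=(frames, num_vertices, 3))

# Same steps as the snippet above, recording the shape after each one.
sq_err = np.array([np.square(vertices_gt_all[:, v, :] - vertices_pred_all[:, v, :])
                   for v in mouth_map])
shape_stacked = sq_err.shape            # (mouth_vertices, frames, 3)
sq_err = np.transpose(sq_err, (1, 0, 2))
shape_transposed = sq_err.shape         # (frames, mouth_vertices, 3)
per_vertex = np.sum(sq_err, axis=2)     # (frames, mouth_vertices)
per_frame_max = np.max(per_vertex, axis=1)  # (frames,)
lve = np.mean(per_frame_max)

# The same reduction written with direct indexing, as a cross-check.
diff = vertices_gt_all[:, mouth_map, :] - vertices_pred_all[:, mouth_map, :]
lve_direct = np.square(diff).sum(axis=2).max(axis=1).mean()

# The reading from the question: max over frames (axis=0), then mean over vertices.
per_vertex_max = np.max(per_vertex, axis=0)  # (mouth_vertices,)
lve_misread = np.mean(per_vertex_max)

print(shape_stacked, shape_transposed, per_vertex.shape, per_frame_max.shape)
print('Lip Vertex Error: {:.4e}'.format(lve))
print('Max-over-frames variant: {:.4e}'.format(lve_misread))
```

Since `np.max` is taken over `axis=1` after the transpose, the maximum runs over the mouth vertices within each frame, and `np.mean` then averages over frames.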

WASD4959 commented 1 year ago

Sorry, I misunderstood the code. Thank you for your kind reply! :D