evonneng / learning2listen

Official pytorch implementation for Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion (CVPR 2022)

Mismatch in face features size between paper and the code release #11

Closed: nguyenntt97 closed this issue 1 year ago

nguyenntt97 commented 1 year ago

Thank you for the wonderful study and expressive codebase!

However, when I compared the repo with the paper, the authors seem to have changed the input size of the facial features from $d_\phi + 3 = 50 + 3$ [Evonne, 22] to $d_\phi + d_\alpha \text{ (pose)} + d_{\text{detail}} = 50 + 6 + 128 = 184$.

I am curious about the reason for this change relative to the paper because, in theory, the latent detail code should capture static, person-specific details that are independent of the listener's expression behavior. May I ask why the authors did that? It seems a bit redundant to add a temporally correlated feature to the analysis (I tested it on DECA, and renderings of a subject's face under different expressions were nearly identical with or without the detail code).

p0_list_faces_clean_deca.npy - face features (N x 64 x 184) for when p0 is the listener: N sequences of length 64, with features of size 184, which include the DECA parameter set of expression (50D), pose (6D), and details (128D).
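
To make the question concrete, here is a minimal sketch of how the 184-D features could be split and checked for temporal variation. It assumes the last axis is ordered as the README lists it (expression, then pose, then detail), which is not confirmed here. The helper names `split_deca_features` and `mean_temporal_std` are hypothetical, and a synthetic array stands in for the real file.

```python
import numpy as np

# Assumed layout of the last axis, following the order listed in the README:
# expression (50), pose (6), detail (128). The real ordering is an assumption.
EXP_DIM, POSE_DIM, DETAIL_DIM = 50, 6, 128


def split_deca_features(faces):
    """Split (N, T, 184) features into expression, pose and detail slices."""
    expected = EXP_DIM + POSE_DIM + DETAIL_DIM
    if faces.shape[-1] != expected:
        raise ValueError(f"expected last dim {expected}, got {faces.shape[-1]}")
    exp = faces[..., :EXP_DIM]
    pose = faces[..., EXP_DIM:EXP_DIM + POSE_DIM]
    detail = faces[..., EXP_DIM + POSE_DIM:]
    return exp, pose, detail


def mean_temporal_std(x):
    """Per-channel std across time (axis 1), averaged over sequences and channels."""
    return float(x.std(axis=1).mean())


# Synthetic stand-in for np.load("p0_list_faces_clean_deca.npy"): 4 sequences of length 64.
rng = np.random.default_rng(0)
faces = rng.normal(size=(4, 64, 184))
# Make the detail slice constant over time to mimic a static, person-specific code.
faces[..., EXP_DIM + POSE_DIM:] = faces[:, :1, EXP_DIM + POSE_DIM:].copy()

exp, pose, detail = split_deca_features(faces)
print("expression:", exp.shape, "temporal std:", mean_temporal_std(exp))
print("pose:", pose.shape, "temporal std:", mean_temporal_std(pose))
print("detail:", detail.shape, "temporal std:", mean_temporal_std(detail))
```

On the real data, a detail slice whose temporal standard deviation is close to zero would support the reading that it is a static, person-specific code. A clearly non-zero value would suggest it carries per-frame information.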