Thank you for the wonderful study and expressive codebase!
However, when I checked the repo and code, it seems the input size for facial features was changed from $d_\phi + 3 = 50 + 3$ [Evonne, 22] to $d_\phi + d_\alpha\ (\text{pose}) + d_{\text{detail}} = 50 + 6 + 128$.
I am curious about the reason for this change in the code compared to the paper, because in theory the latent detail code should capture static, person-specific details that are independent of the listener's expression behavior. May I ask why the author did that? It seems a bit redundant to add a temporally correlated feature to the analysis. I tested this on DECA: the rendered facial profile of a subject with different expressions was nearly identical with or without the detail code.
p0_list_faces_clean_deca.npy - face features (N x 64 x 184) for when p0 is listener
N sequences of length 64. Features of size 184, which includes the deca parameter set of expression (50D), pose (6D), and details (128D).
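For reference, here is a minimal sketch of how the 184-D vector could be split into its three blocks, and how one might check whether the detail block actually varies over time. The block ordering (expression, then pose, then detail) is an assumption based on the order in the README description above, the helper names are hypothetical, and the synthetic array only stands in for the real `p0_list_faces_clean_deca.npy` data.

```python
import numpy as np

# Block sizes from the README description: expression, pose, detail.
EXP_DIM, POSE_DIM, DETAIL_DIM = 50, 6, 128


def split_deca_features(feats):
    """Split (N, T, 184) features into expression, pose and detail blocks.

    ASSUMPTION: blocks are stored in the order expression, pose, detail.
    """
    expected = EXP_DIM + POSE_DIM + DETAIL_DIM
    if feats.shape[-1] != expected:
        raise ValueError(f"expected last dim {expected}, got {feats.shape[-1]}")
    exp = feats[..., :EXP_DIM]
    pose = feats[..., EXP_DIM:EXP_DIM + POSE_DIM]
    detail = feats[..., EXP_DIM + POSE_DIM:]
    return exp, pose, detail


def temporal_std(block):
    """Per-dimension std along the time axis, averaged over sequences and dims."""
    return float(block.std(axis=1).mean())


# Synthetic stand-in for the real array: 4 sequences of length 64.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 64, 184))
# Make the detail block constant over time, as a static code would be.
feats[..., EXP_DIM + POSE_DIM:] = feats[:, :1, EXP_DIM + POSE_DIM:].copy()

exp, pose, detail = split_deca_features(feats)
print("expression temporal std:", temporal_std(exp))
print("detail temporal std:", temporal_std(detail))
```

Running the same `temporal_std` check on the real array (loaded with `np.load`) would show whether the stored detail code is effectively static within each sequence.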