Closed by xiapengchng 1 year ago
Did you try using the temporal constraints of video? For example, T consecutive frames should have similar identity (β) and albedo (α), etc., since DECA and EMOCA only use single frames.
In SPECTRE we only predict the expression coefficients and take all other parameters (including identity and albedo) from DECA. Thus, we do not use temporal constraints on the expressions, since they change constantly in videos. However, we do use a window of 5 frames to get the prediction for the current frame.
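To make the 5-frame window concrete, here is a minimal sketch of how per-frame windows could be built from a video. It is not taken from the SPECTRE code: the function name `centered_windows`, the choice to center the window on the current frame, and the edge handling (repeating the first and last frame) are all assumptions made for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def centered_windows(frames: Sequence[T], window: int = 5) -> List[List[T]]:
    """Return one window of `window` frames per input frame.

    Assumptions (not confirmed by the thread): the window is centered on
    the current frame, and frames beyond the start or end of the video are
    replaced by the nearest valid frame (edge replication).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if not frames:
        return []

    half = window // 2
    last = len(frames) - 1
    windows = []
    for t in range(len(frames)):
        # Clamp each neighbor index into [0, last] to replicate the edges.
        idx = [min(max(t + k, 0), last) for k in range(-half, half + 1)]
        windows.append([frames[i] for i in idx])
    return windows


# Usage: integers stand in for frames here; a model would consume each
# window and output the expression coefficients for its middle frame.
video = list(range(6))
for t, w in enumerate(centered_windows(video)):
    print(t, w)
```

Under these assumptions, every frame gets a full-length window, so the number of predictions equals the number of input frames.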
Hi, I'm not really sure what you mean. The method uses as input only a video without audio.