Great work! Did you try temporal images rather than audio?

filby89 / spectre

Official Pytorch Implementation of SPECTRE: Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos

Great work! Did you try temporal images rather than audio? #8

Closed: xiapengchng closed this issue 1 year ago

filby89 commented 1 year ago

Hi, I'm not really sure what you mean. The method takes only a video as input, without audio.

xiapengchng commented 1 year ago

Did you try using the temporal constraints of video, e.g. requiring that T consecutive frames have similar identity b, albedo alpha, etc.? I ask because DECA and EMOCA only use single frames.
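
To make the question concrete, here is a minimal sketch of what such a constraint could look like. It is not taken from SPECTRE, DECA, or EMOCA; the function name `temporal_consistency_loss` and the choice of penalty (mean squared deviation from the clip average) are assumptions for illustration only.

```python
def temporal_consistency_loss(codes):
    """Penalize per-frame codes that drift from their clip average.

    `codes` is a list of T per-frame vectors (e.g. identity or albedo
    coefficients), each a list of floats of equal length.
    Hypothetical helper: returns the mean squared deviation from the mean.
    """
    num_frames = len(codes)
    dim = len(codes[0])
    # Average code over the T frames, one value per coefficient.
    mean = [sum(frame[i] for frame in codes) / num_frames for i in range(dim)]
    # Sum of squared deviations of every frame from that average.
    total = sum(
        (frame[i] - mean[i]) ** 2 for frame in codes for i in range(dim)
    )
    return total / (num_frames * dim)


# Identical codes across frames incur no penalty.
print(temporal_consistency_loss([[1.0, 2.0], [1.0, 2.0]]))  # 0.0
# Codes that differ across frames are penalized.
print(temporal_consistency_loss([[0.0], [2.0]]))  # 1.0
```

A term like this would only make sense for quantities expected to stay fixed within a clip, such as identity and albedo, not for expression.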

filby89 commented 1 year ago

In SPECTRE we only predict the expression coefficients and take all other parameters (including identity and albedo) from DECA. Thus, we do not use temporal constraints (since expressions are constantly changing in videos). However, we do use a window of 5 frames to get the prediction for the current frame.
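
As a rough illustration of the 5-frame window mentioned above, the sketch below lists which frame indices could feed the prediction for each frame. The helper `frame_windows` is hypothetical, and two details are assumptions not stated in this thread: that the window is centered on the current frame, and that clip boundaries are handled by repeating the first or last frame.

```python
def frame_windows(num_frames, window=5):
    """Return, for each frame index, the frame indices in its window.

    Assumptions (not confirmed by the thread): the window is centered
    on the current frame, and indices outside the clip are clamped to
    the first or last frame.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centered")
    half = window // 2
    last = num_frames - 1
    return [
        [min(max(t + offset, 0), last) for offset in range(-half, half + 1)]
        for t in range(num_frames)
    ]


windows = frame_windows(7)
print(windows[0])  # [0, 0, 0, 1, 2]
print(windows[3])  # [1, 2, 3, 4, 5]
print(windows[6])  # [4, 5, 6, 6, 6]
```

Each window would be passed to the expression predictor, with identity and albedo still coming from the per-frame DECA estimate.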