Closed — vicdxxx closed this issue 2 years ago
The originally detected 2D facial landmarks do, of course, contain head pose. We perform 3D facial tracking on each frame (supervised by the landmarks) to obtain a disentangled head-pose sequence and facial-movement-only 3D landmarks. These two parts then serve as separate learning targets for two different models (audio2mouth / audio2headpose).
I see that you disentangle the landmarks from the head pose, but a general landmark detector actually outputs landmarks that still include head pose. Do you obtain the neutral (pose-free) landmarks by applying the inverse of the estimated head pose to those detected landmarks?
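For reference, the inversion the question describes can be sketched in a few lines. This is only an illustrative example, not the repository's actual pipeline: it assumes a rigid pose model where posed landmarks are obtained as X = R @ X_c + t (rotation R, translation t), so the canonical landmarks are recovered as X_c = R.T @ (X - t). The function name and array shapes are my own conventions.

```python
import numpy as np

def remove_head_pose(landmarks_3d, R, t):
    """Map posed 3D landmarks back to a canonical (pose-free) frame.

    Assumes the tracked head pose maps canonical points X_c to observed
    points via X = R @ X_c + t, so the inverse is X_c = R.T @ (X - t).

    landmarks_3d : (N, 3) posed 3D landmarks (one row per point)
    R            : (3, 3) head rotation matrix
    t            : (3,)   head translation vector
    """
    # Row-vector form: (X - t) @ R is equivalent to R.T @ (X - t) per point.
    return (landmarks_3d - t) @ R
```

With the pose removed this way, the residual landmark motion (mouth, eyes, etc.) and the head-pose sequence can be used as independent targets, which matches the two-model split described above.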