EvelynFan / FaceFormer

[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
MIT License

Token Alignment in Wav2Vec2.0 #26

Open jyangliu opened 2 years ago

jyangliu commented 2 years ago

Can the pretrained Wav2Vec 2.0 model facebook/wav2vec2-base-960h guarantee alignment between input audio and output tokens? I noticed that facebook/wav2vec2-base-960h has been fine-tuned with CTC.
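For context on the question, here is a rough sketch of how input samples map to encoder output frames. It assumes the convolutional feature extractor uses the kernel sizes and strides from the published base configuration (kernels 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2) with no padding, and 16 kHz input. The function names are hypothetical and nothing here is taken from the FaceFormer code. Under those assumptions the hidden states sit on a fixed grid of about 20 ms per frame, set by the convolution strides alone. Whether the CTC-predicted characters line up with the audio is a separate matter that this sketch does not address.

```python
# Assumed values: conv kernel sizes and strides of the wav2vec 2.0 base
# feature extractor, and a 16 kHz sampling rate.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000


def num_output_frames(num_samples: int) -> int:
    """Number of encoder frames produced for a waveform of num_samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return max(length, 0)


def total_stride() -> int:
    """Hop, in input samples, between consecutive output frames."""
    hop = 1
    for stride in CONV_STRIDES:
        hop *= stride
    return hop


def frame_start_seconds(frame_index: int) -> float:
    """Start time of the receptive field of a given output frame."""
    return frame_index * total_stride() / SAMPLE_RATE


if __name__ == "__main__":
    print(num_output_frames(SAMPLE_RATE))  # frames for one second of audio
    print(total_stride())                  # hop in samples
    print(frame_start_seconds(1))          # start time of the second frame
```

If the audio features must match a different frame rate (for example, video or mesh frames), some resampling of this fixed grid is still needed.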