EvelynFan / FaceFormer

[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
MIT License

Getting the same hidden-state values from Wav2Vec2 for my dataset #42

Open ujjawalcse opened 1 year ago

ujjawalcse commented 1 year ago

Hey @EvelynFan, I tried to train the model on my custom dataset, but Wav2Vec2 produces the same hidden-state values for every audio frame. Here is the output for reference:

torch.Size([1, 88800])
hidden_states: tensor([[[-0.0847,  0.0599, -0.0042,  ...,  0.1818,  0.0301, -0.0014],
         [-0.0847,  0.0599, -0.0042,  ...,  0.1818,  0.0301, -0.0014],
         [-0.0847,  0.0599, -0.0042,  ...,  0.1818,  0.0301, -0.0014],
         ...,
         [-0.0847,  0.0599, -0.0042,  ...,  0.1818,  0.0301, -0.0014],
         [-0.0847,  0.0599, -0.0042,  ...,  0.1818,  0.0301, -0.0014],
         [-0.0847,  0.0599, -0.0042,  ...,  0.1818,  0.0301, -0.0014]]],
       device='cuda:0')

Can you suggest some way out? Thanks.
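For anyone hitting the same symptom, a quick way to confirm the collapse numerically rather than by eyeballing the printout (the helper name below is illustrative, not from the FaceFormer code; the check assumes a `(batch, frames, dim)` tensor like the one printed above):

```python
import numpy as np

def frames_are_constant(hidden_states, tol=1e-6):
    """Return True if every time frame carries the same feature vector.

    Identical rows across all frames usually point to an upstream problem,
    e.g. pretrained weights that were never actually loaded, or an input
    waveform that is silent/constant after preprocessing.
    """
    first = hidden_states[:, :1, :]                      # (batch, 1, dim)
    return bool(np.abs(hidden_states - first).max() < tol)

# A degenerate tensor shaped like the one in the report above
bad = np.tile(np.array([-0.0847, 0.0599, -0.0042]), (1, 5, 1))  # (1, 5, 3)
rng = np.random.default_rng(0)
good = rng.standard_normal((1, 5, 3))

print(frames_are_constant(bad))   # True  -> collapsed features
print(frames_are_constant(good))  # False -> frames vary as expected
```

If the check returns True on your data, it is worth verifying that the waveform fed to the processor is non-silent and that the Wav2Vec2 checkpoint loads without "weights not initialized" warnings.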

xiaodongyichuan commented 1 year ago

I have the same question.

Shirley-0708 commented 10 months ago

@xiaodongyichuan @ujjawalcse Did anyone fix this problem?