EvelynFan / FaceFormer

[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
MIT License

Issue while training on Vocaset #40

Closed ujjawalcse closed 1 year ago

ujjawalcse commented 1 year ago

Hey @EvelynFan, thanks for this awesome repo. I'm trying out training on the vocaset data, so I followed the data-preparation steps and ran training with the following command:

python main.py --dataset vocaset --vertice_dim 15069 --feature_dim 64 --period 30 --train_subjects "FaceTalk_170728_03272_TA FaceTalk_170904_00128_TA FaceTalk_170725_00137_TA FaceTalk_170915_00223_TA FaceTalk_170811_03274_TA FaceTalk_170913_03279_TA FaceTalk_170904_03276_TA FaceTalk_170912_03278_TA" --val_subjects "FaceTalk_170811_03275_TA FaceTalk_170908_03277_TA" --test_subjects "FaceTalk_170809_00138_TA FaceTalk_170731_00024_TA"

I'm getting the following error:

Some weights of the model checkpoint at facebook/wav2vec2-base-960h were not used when initializing Wav2Vec2Model: ['lm_head.bias', 'lm_head.weight']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
model parameters:  92215197
Loading data...
100%|█████████████████████████████████████████| 475/475 [03:05<00:00,  2.55it/s]
314 40 39
  0%|                                                   | 0/314 [00:00<?, ?it/s]vertice shape: torch.Size([1, 117, 15069])
vertice_input shape: torch.Size([1, 1, 64])
vertice_input shape: torch.Size([1, 1, 64])
tgt_mask: tensor([[[0.]],

        [[0.]],

        [[0.]],

        [[0.]]], device='cuda:0')
memory_mask: tensor([[False,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True]], device='cuda:0')
  0%|                                                   | 0/314 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 151, in <module>
    main()
  File "main.py", line 146, in main
    model = trainer(args, dataset["train"], dataset["valid"],model, optimizer, criterion, epoch=args.max_epoch)
  File "main.py", line 34, in trainer
    loss = model(audio, template,  vertice, one_hot, criterion,teacher_forcing=False)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ujjawal/my_work/object_recon/FaceFormer/faceformer.py", line 135, in forward
    vertice_out = self.transformer_decoder(vertice_input, hidden_states, tgt_mask=tgt_mask, memory_mask=memory_mask)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/transformer.py", line 233, in forward
    memory_key_padding_mask=memory_key_padding_mask)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/transformer.py", line 369, in forward
    key_padding_mask=memory_key_padding_mask)[0]
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/modules/activation.py", line 845, in forward
    attn_mask=attn_mask)
  File "/home/ujjawal/miniconda2/envs/caffe2/lib/python3.7/site-packages/torch/nn/functional.py", line 3873, in multi_head_attention_forward
    raise RuntimeError('The size of the 2D attn_mask is not correct.')

If anyone has run into this type of error while training, please suggest how to resolve it.

brbernardo90 commented 1 year ago

Hey @ujjawalcse, take a look at my Colab; training is running there: https://colab.research.google.com/drive/1BjSd3RGkm8LSZDnxjOCVEB5f4g4PIURy?usp=sharing

I copied it from yours and commented out some things.

ujjawalcse commented 1 year ago

Thanks @brbernardo90. It needs access permission; I've sent you an access request. Please check it.

brbernardo90 commented 1 year ago

@ujjawalcse Oops, done! Thank you, your Colab helped me a lot to get started.

ujjawalcse commented 1 year ago

Yeah @brbernardo90, it's running fine in Google Colab, but not on my local PC. My PC configuration:

- Ubuntu 18.04
- torch 1.5.1+cu101 (also tried torch 1.9 but got the same error)
- transformers 4.6.1
- GPU: 8 GB RTX 2070 Super
- RAM: 32 GB

ujjawalcse commented 1 year ago

Got the training working properly now. It turns out I was using an old clone of the FaceFormer repository from when it was first released on my local PC, so I was missing the batch_first=True parameter in the decoder_layer at line 81 of faceformer.py.
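For reference, here is a minimal sketch (not the actual FaceFormer code) of why that parameter matters. With the default batch_first=False, nn.TransformerDecoder expects (seq, batch, feature) tensors, so feeding it the (batch, seq, feature) tensors shown in the log above makes the internal attn_mask size check fail with exactly the "size of the 2D attn_mask is not correct" error. The dimensions below mirror the log (feature_dim 64, 117 audio frames); note batch_first requires torch >= 1.9.

```python
import torch
import torch.nn as nn

d_model = 64  # matches --feature_dim 64 from the training command

decoder_layer = nn.TransformerDecoderLayer(
    d_model=d_model,
    nhead=4,
    dim_feedforward=2 * d_model,
    batch_first=True,  # the parameter that was missing; added in torch 1.9
)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=1)

# (batch, seq, feature) tensors, as in the shapes printed in the log
tgt = torch.zeros(1, 1, d_model)       # vertice_input: [1, 1, 64]
memory = torch.zeros(1, 117, d_model)  # audio hidden states: 117 frames

# 2D masks of shape (tgt_len, tgt_len) and (tgt_len, src_len)
tgt_mask = torch.zeros(1, 1)
memory_mask = torch.zeros(1, 117)

out = decoder(tgt, memory, tgt_mask=tgt_mask, memory_mask=memory_mask)
print(out.shape)  # torch.Size([1, 1, 64])
```

Without batch_first=True, the same call treats dimension 0 as the sequence length, the mask shapes no longer match, and the RuntimeError above is raised.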

Thanks.