Doubiiu / CodeTalker

[CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
MIT License

An error occurred while processing long audio using the provided pretrained model. #48

Open Owen1234560 opened 1 year ago

Owen1234560 commented 1 year ago

audio duration: 23s
error:
  File "CodeTalker/main/demo.py", line 187, in test
    prediction = model.predict(audio_feature, template, one_hot)
  File "CodeTalker/models/stage2.py", line 133, in predict
    feat_out = self.transformer_decoder(vertice_input, hidden_states, tgt_mask=tgt_mask, memory_mask=memory_mask)
  File "/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/functional.py", line 5016, in multi_head_attention_forward
    raise RuntimeError(f"The shape of the 3D attn_mask is {attn_mask.shape}, but should be {correct_3d_size}.")
RuntimeError: The shape of the 3D attn_mask is torch.Size([4, 600, 600]), but should be (4, 601, 601).

CengizhanYurdakul commented 1 year ago

I encountered the same error.

Doubiiu commented 1 year ago

Hi, I think this error is caused by the predefined max_seq_len in models/utils.py; you can change it to a larger number. I am not sure how well the model performs in this case (longer audio), though. Thanks for sharing your experience here 😃.
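To illustrate why a fixed max_seq_len produces exactly this message, here is a minimal sketch in plain Python. It assumes the attention mask is precomputed once at size (n_head, max_seq_len, max_seq_len) and then cropped to the current sequence length, which is consistent with the shapes in the traceback but is not copied from the repository. The helpers make_mask, crop_mask and shape are hypothetical names used only for this illustration.

```python
# Hypothetical stand-in for a precomputed (n_head, max_seq_len, max_seq_len) mask,
# built from nested lists so the sketch needs no third-party packages.
def make_mask(n_head, max_seq_len):
    return [[[0.0] * max_seq_len for _ in range(max_seq_len)] for _ in range(n_head)]


def crop_mask(mask, seq_len):
    """Crop every head's mask to the first seq_len rows and columns."""
    return [[row[:seq_len] for row in head[:seq_len]] for head in mask]


def shape(mask):
    """Return (n_head, rows, cols) of a nested-list mask."""
    return (len(mask), len(mask[0]), len(mask[0][0]))


mask = make_mask(n_head=4, max_seq_len=600)
print(shape(crop_mask(mask, 600)))  # (4, 600, 600)
print(shape(crop_mask(mask, 601)))  # (4, 600, 600): slicing clamps instead of failing
```

Slicing past the end of the buffer does not raise; it silently returns the largest mask available. The mismatch only surfaces later, when the attention layer compares the 600 x 600 mask against a sequence of length 601, which is the first decoding step beyond max_seq_len.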

Owen1234560 commented 1 year ago

Thanks for your reply. I will give it a try.

aurelianocyp commented 8 months ago

@Owen1234560 Have you solved this problem? My setting is: def __init__(self, d_model, dropout=0.1, period=25, max_seq_len=60000). I then trained both stage 1 (s1) and stage 2 (s2) with this setting. However, the model can still only process 10-second audio and raises the error for audio of 20 seconds or longer.

Doubiiu commented 8 months ago

@aurelianocyp You may also need to modify line 27 of models/stage2.py, self.biased_mask = init_biased_mask(n_head = 4, max_seq_len = 600, period=args.period), and set max_seq_len there accordingly.
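As a rough guide for choosing a value, the sketch below estimates how many frames a clip needs and checks it against a given max_seq_len. The helper names, the headroom_frames parameter and the frame rate of 30 are assumptions made for illustration; use the frame rate of the dataset your model was trained on.

```python
import math


def required_max_seq_len(duration_s, fps, headroom_frames=10):
    """Estimate the max_seq_len (in frames) needed for duration_s seconds of audio.

    fps is the animation frame rate (an assumption here, not read from the repo);
    headroom_frames adds a small safety margin.
    """
    return math.ceil(duration_s * fps) + headroom_frames


def fits(duration_s, fps, max_seq_len):
    """True if a clip of duration_s seconds stays within max_seq_len frames."""
    return math.ceil(duration_s * fps) <= max_seq_len


# Illustrative values only: a 23 s clip at an assumed 30 fps.
print(fits(23, 30, 600))             # False
print(required_max_seq_len(23, 30))  # 700
```

Whatever value you pick has to be applied in both places mentioned in this thread, the max_seq_len default in models/utils.py and the init_biased_mask call in models/stage2.py. Output quality on audio longer than the original limit is not confirmed here.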