[Open] Owen1234560 opened this issue 1 year ago
I encountered the same error
Hi, I think the reason for this error is the pre-defined `max_seq_len` in `models/utils.py`; you may change it to a larger number. I am not sure about the performance in this case (longer audio), though, so thanks for sharing your experience here 😃.
Thanks for your reply. I'll give it a try.
@Owen1234560 have you solved this problem? My setting is like this:

```python
def __init__(self, d_model, dropout=0.1, period=25, max_seq_len=60000):
```

I then trained with this setting in s1 and s2. However, it can still only process audio up to about 10 seconds long, and raises an error for 20 seconds and longer.
@aurelianocyp You may also modify L27 in `models/stage2.py`:

```python
self.biased_mask = init_biased_mask(n_head=4, max_seq_len=600, period=args.period)
```

by setting `max_seq_len` accordingly.
Audio duration: 23 s. Error:

```
File "CodeTalker/main/demo.py", line 187, in test
    prediction = model.predict(audio_feature, template, one_hot)
File "CodeTalker/models/stage2.py", line 133, in predict
    feat_out = self.transformer_decoder(vertice_input, hidden_states, tgt_mask=tgt_mask, memory_mask=memory_mask)
File "/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/functional.py", line 5016, in multi_head_attention_forward
    raise RuntimeError(f"The shape of the 3D attn_mask is {attn_mask.shape}, but should be {correct_3d_size}.")
RuntimeError: The shape of the 3D attn_mask is torch.Size([4, 600, 600]), but should be (4, 601, 601).
```
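The numbers in this traceback match a too-small precomputed mask: the decoder needed a `(4, 601, 601)` mask when generating frame 601, but the default mask is only 600 frames wide. A rough back-of-the-envelope check of the `max_seq_len` a given clip needs (the 30 fps output rate is an assumption; check your dataset config, since e.g. a 25 fps dataset changes the count, and `required_max_seq_len` is a hypothetical helper, not part of CodeTalker):

```python
import math

def required_max_seq_len(duration_s, fps=30, margin=1):
    """Estimate how many motion frames a clip of `duration_s` seconds produces,
    plus a small margin, so the precomputed attention mask covers all of them."""
    return math.ceil(duration_s * fps) + margin

print(required_max_seq_len(23))  # 691 frames needed, well above the default 600
```

So for 23 s of audio, both `max_seq_len` values would need to be raised to at least ~700 (or simply a generous value like 6000) for inference to run.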