EvelynFan / FaceFormer

[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
MIT License

Does the wav have a time limitation? #48

Closed Zeqing-Wang closed 1 year ago

Zeqing-Wang commented 1 year ago

Great work and really clear code. Thanks again for sharing!

I tried some short wavs and FaceFormer works well, but when I input a longer one, I get this error:

RuntimeError: The shape of the 3D attn_mask is torch.Size([4, 600, 600]), but should be (4, 601, 601).

EvelynFan commented 1 year ago


The default value of max_seq_len is 600. If you'd like to use longer audio, e.g., 1–3 minutes, please add the following in demo.py after line 27 (model.load_state_dict(...)):

from faceformer import PeriodicPositionalEncoding, init_biased_mask
model.PPE = PeriodicPositionalEncoding(args.feature_dim, period=args.period, max_seq_len=6000)
model.biased_mask = init_biased_mask(n_head=4, max_seq_len=6000, period=args.period)

Zeqing-Wang commented 1 year ago


It works well! Thank you again for sharing!

youngstu commented 1 year ago

How do you deal with audio longer than 360 seconds? Increasing max_seq_len to a very large value causes unbearable CUDA memory usage.

Zeqing-Wang commented 1 year ago


I think you can use a sliding window to solve this problem.

youngstu commented 1 year ago


Won't there be jitter where the windows are spliced together?

Zeqing-Wang commented 1 year ago


Yes, but you can overlap part of the previous window and average the overlapping frames; that helps.
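(Not from the repo; a minimal sketch of the overlap-and-average idea, assuming each overlapping audio chunk has already been run through FaceFormer separately and produces a (num_frames, num_vertices * 3) tensor, with consecutive chunks sharing `overlap` output frames. The function name and arguments are hypothetical.)

```python
import torch

def blend_windows(windows, overlap):
    """Stitch per-window vertex predictions, cross-fading the shared frames.

    windows: list of tensors shaped (num_frames_i, num_vertices * 3), one per
             overlapping audio chunk run through FaceFormer separately.
    overlap: number of output frames shared by consecutive windows.
    """
    out = windows[0]
    for w in windows[1:]:
        ov = min(overlap, out.shape[0], w.shape[0])
        if ov == 0:
            # Nothing overlaps (e.g. a very short tail chunk): just append.
            out = torch.cat([out, w], dim=0)
            continue
        # Linear cross-fade: the previous window fades out, the new one fades in.
        alpha = torch.linspace(0.0, 1.0, ov).unsqueeze(1)        # (ov, 1)
        blended = (1.0 - alpha) * out[-ov:] + alpha * w[:ov]
        out = torch.cat([out[:-ov], blended, w[ov:]], dim=0)
    return out
```

A plain 0.5/0.5 average over the overlap would also work; the linear cross-fade just avoids a small jump at the edges of the overlap region.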

youngstu commented 1 year ago


vertice_emb = torch.cat((vertice_emb, new_output), 1)

Should the last window's vertice_emb be carried over to the next window?

Zeqing-Wang commented 1 year ago


I believe it would make sense, but I have not tried it. I meant a window of predicted meshes, not the embedding. But I think your method is better.
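(Also not from the repo: a hypothetical driver for the mesh-window approach, reusing blend_windows() from the sketch above. It assumes audio_feature is the (1, num_samples) 16 kHz waveform tensor prepared in demo.py, that model.predict(audio_feature, template, one_hot) returns a (1, num_frames, num_vertices * 3) tensor as in demo.py, and that the output runs at fps frames per second.)

```python
import torch

def predict_long(model, audio_feature, template, one_hot,
                 fps=30, sr=16000, win_sec=20.0, overlap_sec=1.0):
    """Hypothetical sliding-window inference for long audio.

    Splits the waveform into overlapping chunks, runs each chunk through the
    model independently, and cross-fades the shared frames with blend_windows().
    """
    win = int(win_sec * sr)                   # chunk length in samples
    hop = int((win_sec - overlap_sec) * sr)   # step between chunk starts
    overlap_frames = int(overlap_sec * fps)   # shared frames per seam

    windows, start = [], 0
    while start < audio_feature.shape[1]:
        chunk = audio_feature[:, start:start + win]
        with torch.no_grad():
            pred = model.predict(chunk, template, one_hot)  # (1, T, V*3) assumed
        windows.append(pred.squeeze(0).cpu())
        if start + win >= audio_feature.shape[1]:
            break
        start += hop
    return blend_windows(windows, overlap_frames)
```

With win_sec=20 and a 30 fps output, each window stays within the default max_seq_len of 600 frames, so memory stays bounded no matter how long the clip is. Carrying the last window's vertice_emb into the next window, as suggested above, would be a further refinement that (as noted) has not been tried.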

youngstu commented 1 year ago

I will give it a try. Thank you for your help. @Zeqing-Wang

zhaiyuan0217 commented 1 year ago

I also get jitter in the rendered video. Did you solve it?

zhaiyuan0217 commented 1 year ago


What did you mean by transferring vertice_emb to the next window?