Weizhi-Zhong / IP_LAP

Implementation of the CVPR 2023 paper "Identity-Preserving Talking Face Generation With Landmark and Appearance Priors".
Apache License 2.0
637 stars · 72 forks

train landmark generator #8

Open Gpwner opened 1 year ago

Gpwner commented 1 year ago

Hi, when training the landmark generator, do I need to stop it manually? The while loop does not appear to contain any break logic:

https://github.com/Weizhi-Zhong/IP_LAP/blob/main/train_landmarks_generator.py#L303

If so, at how many steps was the released pretrained model stopped, and how long did it train? Thanks.

Weizhi-Zhong commented 1 year ago

Hi~, thanks for your interest. The models are trained until eval_L1_loss no longer decreases (to about 6e-3). Under the default batch size on a single RTX 3090, our model stopped at epoch 1837 (610k iterations) with eval_L1_loss 5.866e-3, taking no more than one day. Training the video renderer is similar: train it until the FID no longer decreases (to about 20).
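The "train until the eval loss no longer decreases" criterion above can be sketched as a simple plateau check. This is an illustrative helper, not code from train_landmarks_generator.py; the function name, `patience`, and `min_delta` are assumptions:

```python
def should_stop(history, patience=5, min_delta=1e-5):
    """Return True once the eval loss has not improved by at least
    min_delta for `patience` consecutive evaluations.
    (Hypothetical plateau check; thresholds are not from the repo.)"""
    if len(history) <= patience:
        return False
    best_before = min(history[:-patience])   # best loss before the window
    recent_best = min(history[-patience:])   # best loss inside the window
    return recent_best > best_before - min_delta
```

Appending eval_L1_loss after each evaluation and breaking out of the training while loop when `should_stop(...)` returns True would give the manual procedure described above an automatic exit.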

yaleimeng commented 1 year ago

My training run is stuck: there has been no progress for a long time (39 hours so far). What could be wrong?

```
Project_name: landmarkT5_d512_fe1024_lay4_head4
init dataset, filtering very short videos.....
100%|████████████| 52/52 [00:00<00:00, 425.18it/s]
complete, with available vids: 51
init dataset, filtering very short videos.....
100%|████████████| 8/8 [00:00<00:00, 433.37it/s]
complete, with available vids: 8
  0%|            | 0/6 [00:00<?, ?it/s]
  0%|            | 0/6 [39:07:36<?, ?it/s]
```

Weizhi-Zhong commented 1 year ago

Hi, thanks for your interest. There are several continue statements in train_landmarks_generator.py; they skip to the next video when the current video's landmarks or audio mel are missing. So the likely cause is an incorrect --pre_audio_root or --landmarks_root argument, which makes every landmark or audio path nonexistent. You can debug this by printing some information before each continue statement.
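The suggested debugging step can be sketched as follows. The file layout, extensions, and function name here are assumptions for illustration, not the repo's actual loader; the point is that a wrong root path makes every sample hit a `continue`, so the progress bar never advances:

```python
import os

def usable_videos(video_ids, pre_audio_root, landmarks_root):
    """Skip videos whose mel or landmark file is missing, and print the
    path that failed so a bad --pre_audio_root / --landmarks_root shows
    up immediately. (Illustrative sketch, not the repo's code.)"""
    kept = []
    for vid in video_ids:
        mel_path = os.path.join(pre_audio_root, vid + ".npy")
        lmk_path = os.path.join(landmarks_root, vid + ".npy")
        if not os.path.isfile(mel_path):
            print("missing mel:", mel_path)
            continue
        if not os.path.isfile(lmk_path):
            print("missing landmarks:", lmk_path)
            continue
        kept.append(vid)
    return kept
```

If this prints "missing ..." for every video, the root arguments are wrong rather than the data.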

Another possible cause is a batch size too large for your machine to load the data. Try decreasing the batch size according to your GPU memory.

Training the video renderer is similar. Hope this helps~

yaleimeng commented 1 year ago

Thanks. It was indeed an error in the audio feature extraction: mel() takes no positional arguments, but was given two, which I had overlooked.
It was fixed after adjusting the dependency package versions.
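This error message is characteristic of the librosa 0.10 API change, where the arguments of `librosa.filters.mel` became keyword-only, so an old positional call like `mel(sr, n_fft)` raises "TypeError: mel() takes 0 positional arguments but 2 were given". The stub below is a stand-in for illustration (not the real librosa function, which returns a mel filter-bank matrix); it shows the mechanism and the two fixes (pin the older librosa version the repo expects, or switch the call to keyword arguments):

```python
def mel(*, sr=22050, n_fft=2048, n_mels=128):
    """Stand-in with the keyword-only signature style that newer
    librosa (0.10+) uses for librosa.filters.mel; it just echoes
    its arguments instead of building a filter bank."""
    return (sr, n_fft, n_mels)

# Old positional style now fails with the TypeError described above:
try:
    mel(16000, 800)
    raised = False
except TypeError:
    raised = True

# Keyword arguments work on both old and new librosa:
result = mel(sr=16000, n_fft=800)
```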

acolasialiuliu commented 6 months ago

> Thanks. It was indeed an error in the audio feature extraction: mel() takes no positional arguments, but was given two, which I had overlooked. It was fixed after adjusting the dependency package versions.

The short-video filtering removed all of my videos. Where do I change that?

yaleimeng commented 6 months ago

You probably need to modify the Dataset class's __init__() method; I saw the relevant statements in there. @acolasialiuliu
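The filtering being discussed can be sketched like this (names and the default threshold are hypothetical; the real check lives in the Dataset `__init__`). Note that the minimum length must stay at least as long as the training window, otherwise too-short clips pass the filter but yield degenerate samples:

```python
def filter_short_videos(frame_counts, min_len=25):
    """Keep only videos with at least min_len frames.
    (Illustrative sketch of the Dataset __init__ filtering; the
    default min_len here is an assumption, not the repo's value.)"""
    kept = [vid for vid, n in frame_counts.items() if n >= min_len]
    dropped = [vid for vid, n in frame_counts.items() if n < min_len]
    return kept, dropped
```

If every video is dropped, the better fix is usually to supply longer clips or lower min_len only down to the training window length, not to 0.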

acolasialiuliu commented 6 months ago

> You probably need to modify the Dataset class's __init__() method; I saw the relevant statements in there. @acolasialiuliu

I changed min_len to 0, and then all the audio and landmarks were read, but the training losses are all 0.
[screenshot]

KelvinHuang66 commented 4 months ago

Check your filelist/train.txt and test.txt.

sunjian2015 commented 4 months ago

I used the LRS2 dataset to train the landmark model, but the loss does not decrease below 0.006. Can anyone tell me why? Also, eval_velocity_loss increases in the later stages of training.