Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0

Training and sampling results are very strange (results from my own training run look wrong) #58

Closed huangjch526 closed 3 months ago

huangjch526 commented 3 months ago

The results from my own training run look very strange. I trained XL/2 from scratch on the UCF dataset for 100,000 steps, then sampled some videos and found them very ugly, with no recognizable structure at all.

https://github.com/Vchitect/Latte/assets/55589802/0313a52a-567b-41f3-9310-a3af30cd5420

huangjch526 commented 3 months ago

I printed the training loss, and it dropped to around 0.03. Is this normal?

maxin-cn commented 3 months ago

This is not a normal result. Did you notice a sudden increase in the gradient norm during training? How many GPUs did you train on? Can you provide your detailed training configuration? Thanks~

huangjch526 commented 3 months ago

Thank you so much, you're so nice. My training configuration is as follows:

I changed the batch size to 1 and trained with DDP on eight V100 32 GB GPUs. All other parameters are unchanged, and sampling videos with the checkpoint you provided works fine. So I suspect the problem is that I changed the batch size?

maxin-cn commented 3 months ago

Check your training log for any sudden gradient increases. I suspect there may be something wrong with the training process.
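One way to scan a log for such jumps is sketched below. It is only an illustration: the function name `find_grad_spikes`, the window size, and the threshold factor are hypothetical choices, and it assumes the gradient norms have already been parsed out of the training log into a list of floats.

```python
from typing import List, Tuple


def find_grad_spikes(
    norms: List[float], window: int = 5, factor: float = 5.0
) -> List[Tuple[int, float]]:
    """Flag entries that exceed `factor` times the mean of the previous `window` entries."""
    spikes = []
    for i in range(window, len(norms)):
        baseline = sum(norms[i - window:i]) / window
        if baseline > 0 and norms[i] > factor * baseline:
            spikes.append((i, norms[i]))
    return spikes


# Hypothetical logged values: a smooth decrease with one sudden jump at index 6.
logged = [1.18, 0.9, 0.7, 0.5, 0.4, 0.35, 6.0, 0.3]
print(find_grad_spikes(logged))  # [(6, 6.0)]
```

A smoothly decreasing sequence returns an empty list, which matches the "no sudden increase" case.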

huangjch526 commented 3 months ago

At step 100 the gradient norm was 1.1843. It decreased gradually, with no sudden jumps. At step 50,000 the gradient norm is 0.03. Is that normal?

maxin-cn commented 3 months ago

It is normal. How long have you been training?

huangjch526 commented 3 months ago

50,000 steps, about 17 hours.

maxin-cn commented 3 months ago

I think it has not converged yet. Please train for a day or two.

huangjch526 commented 3 months ago

How many steps did you train before the sampled video was normal?

maxin-cn commented 3 months ago

Because we use different training equipment, I can't give you an exact number. It takes about 2 days on the training equipment I use. You can also refer to Fig. 8 in our paper.

huangjch526 commented 3 months ago

May I ask which GPUs you were using? I'm training on 8 A100 80 GB GPUs now. Roughly how many steps do I need to train? I see that the model in your paper converged at 150k steps.

maxin-cn commented 3 months ago

I recently confirmed with someone who reproduced Latte on UCF101 using the same training hardware as you: it takes about 100k (10w) iterations to get normal videos.
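For a rough sense of wall-clock time, the step counts in this thread can be converted to hours with a simple proportion. This is only a back-of-the-envelope sketch: it assumes the throughput reported earlier (50,000 steps in about 17 hours) stays constant, which may not hold on different GPUs or with a different batch size. The function name `hours_for_steps` is hypothetical.

```python
def hours_for_steps(
    target_steps: int, measured_steps: int = 50_000, measured_hours: float = 17.0
) -> float:
    """Linearly extrapolate training time from one measured (steps, hours) pair."""
    return target_steps * measured_hours / measured_steps


print(hours_for_steps(100_000))  # 34.0
print(hours_for_steps(150_000))  # 51.0
```

Under that assumption, 100k to 150k steps works out to roughly one and a half to two days.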

huangjch526 commented 3 months ago

Thank you very much, I found the cause. My dataset folder layout differs from the default layout that your dataset code expects, so I ended up training an unconditional model but then used class conditioning at inference.
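A quick sanity check before training can catch this kind of mismatch. The sketch below assumes a layout of `root/<class_name>/<video file>`, which is an assumption for illustration only; the layout that Latte's dataset code actually expects should be checked in the repository. The function name `summarize_class_folders` and the extension set are hypothetical.

```python
from collections import Counter
from pathlib import Path

VIDEO_EXTS = {".avi", ".mp4"}  # assumed extensions, adjust as needed


def summarize_class_folders(root) -> Counter:
    """Count video files per top-level subfolder of `root` (one subfolder per class)."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in VIDEO_EXTS:
            rel = path.relative_to(root)
            # A video sitting directly under root has no class folder.
            label = rel.parts[0] if len(rel.parts) > 1 else "<no class folder>"
            counts[label] += 1
    return counts


# Example usage (hypothetical path):
# counts = summarize_class_folders("/path/to/UCF-101")
# print(len(counts), "class folders found")
```

If the number of class folders is far from the expected number of classes (101 for UCF101), or many videos fall under `<no class folder>`, the labels are probably not being read as intended.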

huangjch526 commented 3 months ago

By the way, where did you download your Taichi dataset from? Everything I downloaded is mp4 files, but your dataset code appears to read image frames?

maxin-cn commented 3 months ago

I used the Taichi dataset after converting the videos into images.

huangjch526 commented 3 months ago

Could you provide your code to convert the videos into images?

huangjch526 commented 3 months ago

https://github.com/universome/stylegan-v/blob/master/src/scripts/convert_videos_to_frames.py

Are you using this code?

maxin-cn commented 3 months ago

You can use it.
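As an alternative illustration, frames can also be dumped by calling the `ffmpeg` command-line tool from Python. This is a minimal sketch, not the script linked above and not necessarily what the Latte authors ran: it assumes `ffmpeg` is installed and on the PATH, and the output naming (`<out_dir>/<video name>/%06d.jpg`) and the helper names are hypothetical, so they may need adjusting to whatever the dataset code expects.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_cmd(video: Path, out_dir: Path) -> List[str]:
    """Build an ffmpeg command that writes every frame of `video` as a numbered JPEG."""
    frame_pattern = out_dir / video.stem / "%06d.jpg"
    # -q:v 2 requests high JPEG quality (lower value means higher quality).
    return ["ffmpeg", "-i", str(video), "-q:v", "2", str(frame_pattern)]


def extract_frames(video: Path, out_dir: Path) -> None:
    """Create the per-video output folder and run ffmpeg on one video."""
    (out_dir / video.stem).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_cmd(video, out_dir), check=True)


# Example usage (hypothetical paths):
# for video in sorted(Path("taichi/videos").glob("*.mp4")):
#     extract_frames(video, Path("taichi/frames"))
```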

MHRosenberg commented 1 month ago

Regarding convert_videos_to_frames.py, is there a significant performance/speed increase associated with that approach over extracting frames via opencv in python?

maxin-cn commented 1 month ago

There should be no noticeable speed or performance gains.

ivylilili commented 1 month ago

Hi maxin~ I noticed that you mentioned "the sudden increase in gradient". I've run into the same problem. Do you know why the gradient explosion happens? Would you be kind enough to tell me how you solved it? Thanks very much!