Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0

Has anyone run into zero gradients? #45

Closed: huangjch526 closed this issue 3 months ago

huangjch526 commented 4 months ago

When I train Latte on UCF101, the gradients of the linear layer are all zero. That seems strange to me. Are zero gradients expected?

maxin-cn commented 4 months ago


Hi, Latte's model initialization uses the widely adopted zero initialization, which can result in relatively small gradients for the corresponding layers.
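To illustrate the mechanism, here is a minimal NumPy sketch. It is not Latte's code. A toy two-layer MLP with a zero-initialized output layer stands in for the real model, and `forward_backward` is a hypothetical helper that does manual backprop. Because the backward signal into the hidden layer is `dy @ W2.T`, a zero `W2` blocks it entirely. At the first step the earlier layer's gradients are therefore exactly zero, while the zero-initialized layer itself still receives a non-zero gradient. The thread does not say which Latte layers are zero-initialized. DiT-style models typically zero-initialize the adaLN modulation layers and the final output layer, and a zero gate blocks gradients to the gated branch in the same way.

```python
import numpy as np

def forward_backward(x, t, W1, b1, W2, b2):
    """Hypothetical toy model: Linear -> ReLU -> Linear, mean-squared-error loss.
    Returns the loss and the gradient of every parameter (manual backprop)."""
    pre = x @ W1 + b1             # hidden pre-activation
    h = np.maximum(pre, 0.0)      # ReLU
    y = h @ W2 + b2               # output of the zero-initialized last layer
    loss = np.mean((y - t) ** 2)

    dy = 2.0 * (y - t) / y.size   # dLoss/dy for a mean over all elements
    dW2 = h.T @ dy                # non-zero even when W2 == 0
    db2 = dy.sum(axis=0)
    dh = dy @ W2.T                # signal flowing back through W2: zero if W2 == 0
    dpre = dh * (pre > 0)         # ReLU derivative
    dW1 = x.T @ dpre
    db1 = dpre.sum(axis=0)
    return loss, {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))       # toy inputs (assumption, not UCF101 data)
t = rng.normal(size=(8, 3))       # toy regression targets

params = {
    "W1": rng.normal(scale=0.5, size=(4, 16)),
    "b1": np.zeros(16),
    "W2": np.zeros((16, 3)),      # zero-initialized output layer
    "b2": np.zeros(3),
}

# Step 0: gradients of the layer *before* the zero-initialized one are exactly zero.
_, grads0 = forward_backward(x, t, **params)
for name, g in grads0.items():
    print(f"step 0  max|grad {name}| = {np.abs(g).max():.3e}")

# One plain SGD step makes W2 non-zero, so gradient can now reach W1.
lr = 0.1
for name in params:
    params[name] = params[name] - lr * grads0[name]

_, grads1 = forward_backward(x, t, **params)
for name, g in grads1.items():
    print(f"step 1  max|grad {name}| = {np.abs(g).max():.3e}")
```

In this toy setting the exact zeros disappear after a single update. Under that assumption, gradients that remain zero after many optimizer steps would point to a cause other than initialization alone.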