When I train Latte on UCF101, the gradients of the linear layer are all zero. I find this strange: zero gradients?
Hi, in the Latte model initialization we adopt the widely used zero initialization, which can leave the gradients of the corresponding layers zero or very small at the start of training.
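To illustrate one way this can happen, here is a minimal NumPy sketch, under the assumption that the zero initialization is a DiT-style gating scheme (adaLN-Zero), where a zero-initialized gate scales a linear layer's output. All variable names here are illustrative, not Latte's actual code: at the first step the gate kills the gradient flowing into the linear weights, while the gate itself still receives a nonzero gradient.

```python
import numpy as np

# Sketch (assumed setup): output of a linear layer is scaled by a
# zero-initialized gate, as in DiT-style adaLN-Zero initialization.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)         # input vector
W = rng.standard_normal((4, 4))    # linear layer weights
gate = np.zeros(4)                 # zero-initialized modulation gate

h = W @ x                  # linear layer output
y = gate * h               # gated output (all zeros at init)

# Manual backward pass for a toy loss L = sum(y), so dL/dy = 1
dL_dy = np.ones(4)
dL_dgate = dL_dy * h               # nonzero: the gate can still learn
dL_dW = np.outer(dL_dy * gate, x)  # exactly zero: gate == 0 blocks it

print(np.abs(dL_dW).max())         # 0.0 at the first step
print(np.abs(dL_dgate).max() > 0)  # True
```

So "all-zero" gradients on such a layer at the very beginning of training are expected; once the gate moves away from zero, the linear layer starts receiving gradient.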