Open 15634960802 opened 1 week ago
simple_weight1 = (t + 1) / t.sqrt()
simple_weight2 = (2 - t).sqrt() / (1 - t + self.eps).sqrt()
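Why are the weights designed this way? simple_weight1 can become very large at small time steps, which could cause exploding gradients. How do the authors handle this?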
We designed the weights following EDM (Elucidating the Design Space of Diffusion-Based Generative Models). In practice, the loss is smallest when t is close to zero, so we apply a large weight there. Please see the EDM paper for details.
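To make the concern concrete, here is a minimal plain-Python sketch that evaluates the two weights from the question at a few time steps. It uses `math` instead of tensors, and the value `EPS = 1e-5` is an assumption, since the thread does not show what `self.eps` is set to in the repository.

```python
import math

EPS = 1e-5  # assumed value; the thread does not show what self.eps is


def simple_weight1(t: float) -> float:
    # (t + 1) / sqrt(t): grows roughly like 1 / sqrt(t) as t approaches 0
    return (t + 1) / math.sqrt(t)


def simple_weight2(t: float, eps: float = EPS) -> float:
    # sqrt(2 - t) / sqrt(1 - t + eps): grows as t approaches 1, bounded by eps
    return math.sqrt(2 - t) / math.sqrt(1 - t + eps)


if __name__ == "__main__":
    for t in (1e-4, 1e-2, 0.5, 0.99, 1.0):
        w1 = simple_weight1(t)
        w2 = simple_weight2(t)
        print(f"t={t:<7g} w1={w1:10.3f} w2={w2:10.3f}")
```

Note that `simple_weight2` is kept finite at t = 1 by the eps term, while `simple_weight1` has no such term and is bounded only if t itself stays away from zero. Whether the repository samples or clamps t with a lower bound is not stated in this thread.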