dropreg / R-Drop


Is the KL loss in the ViT example supposed to be divided by 2? #29

Closed: sieu-n closed this issue 2 years ago

sieu-n commented 2 years ago

https://github.com/dropreg/R-Drop/blob/3d97565595747f3b3d9c4701cb2fb824a9139913/vit_src/models/modeling.py#L298

Isn't L298 supposed to be the following?

loss += self.alpha * (kl_loss + reverse_kl_loss) / 2
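
For reference, here is a minimal sketch of how the full R-Drop training loss can be assembled with the bidirectional KL term divided by 2, as proposed above. This is not the repository's code; the function name r_drop_loss and the arguments logits1, logits2, labels, and alpha are illustrative assumptions.

import torch.nn.functional as F

def r_drop_loss(logits1, logits2, labels, alpha=0.3):
    # Cross-entropy over both forward passes (each pass uses a different dropout mask).
    ce = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)

    # KL divergence in both directions between the two predictive distributions.
    log_p1 = F.log_softmax(logits1, dim=-1)
    log_p2 = F.log_softmax(logits2, dim=-1)
    kl_loss = F.kl_div(log_p1, log_p2.exp(), reduction="batchmean")
    reverse_kl_loss = F.kl_div(log_p2, log_p1.exp(), reduction="batchmean")

    # Average the two KL directions before weighting by alpha.
    return ce + alpha * (kl_loss + reverse_kl_loss) / 2

# Example usage during training: two forward passes of the same model in train mode
# yield different logits because dropout is stochastic.
# loss = r_drop_loss(model(images), model(images), labels)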
apeterswu commented 2 years ago

Hi @krenerd,

Since the hyperparameter self.alpha controls the weight of this loss term, the missing division by 2 has little practical impact. But thanks for the reminder; we will revise accordingly.
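
In other words, the missing factor only rescales the regularization term: self.alpha * (kl_loss + reverse_kl_loss) / 2 is identical to (self.alpha / 2) * (kl_loss + reverse_kl_loss), so halving self.alpha reproduces the divided-by-2 formulation exactly.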