AIH-SGML / mixmil

Code for the paper: Mixed Models with Multiple Instance Learning
https://arxiv.org/abs/2311.02455
Apache License 2.0

Bug in KL scaling #8

Closed · stephandooper closed this issue 5 months ago

stephandooper commented 5 months ago

Hi, sorry for bothering you again, but I think I found a small bug in the KL divergence term of the loss function.

Currently, the KL loss in the code is already weighted by the batch size divided by the total dataset size (kld_w). So far so good.

However, the additional division by y.shape[0] is a division by the batch size, at least when I ran the Camelyon16 example, so the KL loss effectively ends up divided by the full dataset size, which makes it very small.

Potential fix: I guess y.shape[0] was meant to be the number of outputs P? In that case, I think it is a simple fix, changing the line to:

kld_term = kld_w * kld.sum() / y.shape[1]

or

kld_term = kld_w * kld.sum() / self.P

to avoid any confusion about the dimensions.

https://github.com/AIH-SGML/mixmil/blob/bae25eba1d2ece9d30df5d4c79e1676ba1989f19/mixmil/model.py#L103
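
To make the concern concrete, here is a minimal numeric sketch of the two scalings (toy numbers and names are illustrative, not the actual mixmil code):

import torch

# Toy values, purely illustrative.
n_train = 1000       # training-set size
batch_size = 64      # mini-batch size -> y.shape[0]
n_outputs = 1        # number of outputs P -> y.shape[1]

kld = torch.tensor([2.5])        # stand-in for the summed KL term
kld_w = batch_size / n_train     # the weight the training loop passes in

current = kld_w * kld.sum() / batch_size   # == kld.sum() / n_train -> 0.0025
proposed = kld_w * kld.sum() / n_outputs   # == 0.16
print(current.item(), proposed.item())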

jan-engelmann commented 5 months ago

Hi again :D

That scaling is correct. The important thing is that the relative scale of the LL and KL terms is the same as if you were training full-batch. Since we take the mean across the batch for the LL term, the LL is on the scale of a single observation. Therefore, as you point out, we divide the KL by the number of samples in the dataset.
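
For concreteness, a minimal numeric sketch of this bookkeeping (names and numbers are illustrative, not the actual mixmil training loop):

import torch

N, B = 1000, 64                 # dataset size and mini-batch size
kl = torch.tensor(3.0)          # stand-in for KL(q || p), constant w.r.t. the batch
ll_per_sample = torch.randn(N)  # stand-in per-sample log-likelihoods

# Full-batch objective, rescaled to the per-observation level:
full = ll_per_sample.mean() - kl / N

# Mini-batch objective as implemented: mean LL over the batch,
# plus kld_w * kl / batch_size with kld_w = B / N.
idx = torch.randperm(N)[:B]
kld_w = B / N
mini = ll_per_sample[idx].mean() - kld_w * kl / B   # KL term == kl / N, same as full-batch

# The KL contribution is identical in both cases; the batch LL mean is an
# unbiased estimate of the full-batch per-observation LL.
print((kl / N).item(), (kld_w * kl / B).item())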

You can find a different but equivalent formulation of the mini-batched ELBO in this paper (eq 3).

Cheers, Jan