f-dangel / backpack

BackPACK - a backpropagation package built on top of PyTorch which efficiently computes quantities other than the gradient.
https://backpack.pt/
MIT License

Extending `BCEWithLogitsLoss` to non-binary labels #281

Open f-dangel opened 1 year ago

f-dangel commented 1 year ago

BackPACK's extensions that rely on the probabilistic interpretation of a loss function as a negative log likelihood (quantities based on the Fisher, i.e. `BatchDiagGGNMC`, `DiagGGNMC`, `SqrtGGNMC`, `KFAC`) currently support `BCEWithLogitsLoss` only with binary labels.

This issue documents the steps required, and the problems involved, in supporting continuous-valued labels.

Description: Currently, we assume binary labels $y_n \in \{0; 1\}$. In this case, `BCEWithLogitsLoss` corresponds to the negative log likelihood of a Bernoulli distribution $p(y \mid f_n)$ with $f_n \in (0; 1)$ the sigmoid probability.

But `BCEWithLogitsLoss` also supports continuous labels $y_n \in [0; 1]$. In this case, `BCEWithLogitsLoss` corresponds to the negative log likelihood of a continuous Bernoulli distribution $p(y \mid f_n) \propto f_n^{y} (1 - f_n)^{1 - y}$, such that $- \log p(y = y_n \mid f_n) \propto -y_n \log(f_n) - (1 - y_n) \log(1 - f_n)$.
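
This correspondence can be checked numerically: with continuous targets, PyTorch's `BCEWithLogitsLoss` evaluates exactly the two log-likelihood terms above (the continuous Bernoulli's normalization constant is not part of the loss). A minimal sketch:

```python
import torch
from torch.nn import BCEWithLogitsLoss

logits = torch.randn(4, 1)
targets = torch.rand(4, 1)  # continuous labels in [0, 1]

loss = BCEWithLogitsLoss(reduction="sum")(logits, targets)

# Unnormalized continuous-Bernoulli NLL: -y log(f) - (1 - y) log(1 - f)
probs = torch.sigmoid(logits)
manual = -(targets * probs.log() + (1 - targets) * (1 - probs).log()).sum()

assert torch.allclose(loss, manual, atol=1e-6)
```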

Implementation: Depending on the nature of the labels (binary or continuous), a different distribution (Bernoulli or continuous Bernoulli) must be used to compute sampled gradients. At the moment, however, the `_make_distribution` function does not take the labels into account; it only receives the subsampled inputs. The interface must therefore be adapted to support continuous labels in `BCEWithLogitsLoss` (see the sketch below).
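
One possible adaptation, sketched under assumptions: the function name `_make_distribution` comes from the description above, but the extended signature and the binary/continuous dispatch below are illustrative, not BackPACK's current API. The idea is to also pass the subsampled labels and pick the distribution accordingly, e.g. via `torch.distributions.ContinuousBernoulli`:

```python
import torch
from torch.distributions import Bernoulli, ContinuousBernoulli


def _make_distribution(subsampled_input, subsampled_target):
    """Sketch of an adapted interface that also receives the subsampled labels.

    The signature and the binary/continuous check are assumptions for
    illustration, not BackPACK's current implementation.
    """
    # Binary labels -> Bernoulli; continuous labels in [0, 1] -> continuous Bernoulli.
    is_binary = torch.all((subsampled_target == 0) | (subsampled_target == 1))
    if is_binary:
        return Bernoulli(logits=subsampled_input)
    return ContinuousBernoulli(logits=subsampled_input)
```

Both distributions accept `logits=` and provide `.sample()`, so downstream Monte Carlo sampling code could stay unchanged.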

Problems: