Closed cklyne closed 4 months ago
I'll just approve and run the file later. I don't know why we need padding there, to be honest. If something doesn't match, shouldn't it just throw an error?
The padding is a generalization of the log-likelihood loss. It extends the loss to cases where not all values have a predicted uncertainty. Values without one get an effective uncertainty of 1, which reduces their loss term to a simple MSE (unless $\epsilon > 1$).
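To make that concrete, here is a minimal sketch of a Gaussian negative log-likelihood with an $\epsilon$ floor on the uncertainty. The function name and exact form are hypothetical (they are not taken from this PR's code); it just illustrates why an uncertainty of 1 collapses the term to a plain MSE:

```python
import numpy as np

def gaussian_nll(y_true, mean, sigma, eps=1e-6):
    """Hypothetical Gaussian NLL sketch, not the PR's actual implementation.

    sigma is clamped from below by eps for numerical stability; a padded
    sigma of 1 is only affected when eps > 1.
    """
    sigma = np.maximum(sigma, eps)
    return np.mean(0.5 * ((y_true - mean) / sigma) ** 2 + np.log(sigma))

# With sigma == 1 everywhere, log(sigma) == 0 and the loss reduces
# to half the MSE, which is the padding behaviour described above.
y = np.array([1.0, 2.0])
mu = np.array([1.5, 1.0])
assert np.isclose(gaussian_nll(y, mu, np.ones(2)),
                  0.5 * np.mean((y - mu) ** 2))
```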
For instance, if the user provides 3 labels but the model predicts 4 values, the assumption is that the first 3 are predictions corresponding to the labels and the remaining 1 is the uncertainty of the first prediction.
Because this is a feature which BDE probably won't require, but which feels like a natural extension of the loss function, I decided to implement it as part of the loss's splitting method, `_split_pred()`; BDE can then separately enforce the requirement that every prediction has an uncertainty.
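As a rough sketch of the splitting-with-padding idea (a standalone illustration under my own naming; the real `_split_pred()` in the PR may differ): the first `len(y_true)` predicted values are treated as means, any extras as uncertainties for the leading predictions, and missing uncertainties are padded with 1.

```python
import numpy as np

def split_pred(y_true, y_pred):
    """Hypothetical sketch of the splitting logic described above.

    The first n columns of y_pred (n = number of labels) are mean
    predictions; remaining columns are uncertainties for the leading
    predictions. Missing uncertainties are padded with 1, which turns
    their loss term into a simple MSE.
    """
    n = y_true.shape[-1]
    mean = y_pred[..., :n]
    sigma = y_pred[..., n:]
    pad = n - sigma.shape[-1]
    if pad > 0:
        sigma = np.concatenate(
            [sigma, np.ones(sigma.shape[:-1] + (pad,))], axis=-1
        )
    return mean, sigma

# 3 labels, 4 predicted values: the 4th value is the uncertainty of
# the first prediction; the other two get an uncertainty of 1.
y_true = np.array([[1.0, 2.0, 3.0]])
y_pred = np.array([[1.1, 1.9, 3.2, 0.5]])
mean, sigma = split_pred(y_true, y_pred)
# mean  -> [[1.1, 1.9, 3.2]]
# sigma -> [[0.5, 1.0, 1.0]]
```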
Added log likelihood loss + tests.