Martins, André F. T., et al. "Sparse Continuous Distributions and Fenchel-Young Losses." arXiv:2108.01988 (2021).
https://github.com/deep-spin/sparse_continuous_distributions
Blondel, Martins, and Niculae, 2019. Learning with Fenchel-Young Losses.
Over the past decades, numerous loss functions have been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction. Understanding the core principles and theoretical properties underpinning these losses is key to choosing the right loss for the right problem, as well as to creating new losses which combine their strengths. In this paper, we introduce Fenchel-Young losses, a generic way to construct a convex loss function for a regularized prediction function. We provide an in-depth study of their properties in a very broad setting, covering all the aforementioned supervised learning tasks, and revealing new connections between sparsity, generalized entropies, and separation margins. We show that Fenchel-Young losses unify many well-known loss functions and make it easy to create useful new ones. Finally, we derive efficient predictive and training algorithms, making Fenchel-Young losses appealing both in theory and practice.
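For concreteness, here is a minimal numerical sketch of the Fenchel-Young construction (illustrative only, not taken from the paper's code): the loss is $L_\Omega(\theta; y) = \Omega^*(\theta) + \Omega(y) - \langle \theta, y \rangle$, and with $\Omega$ chosen as the negative Shannon entropy on the simplex, $\Omega^*$ is logsumexp and the loss reduces to the usual multinomial logistic (cross-entropy) loss.

```python
import numpy as np
from scipy.special import logsumexp

def neg_shannon_entropy(p):
    """Omega(p) = sum_i p_i * log(p_i), with 0*log(0) taken as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz])))

def fy_loss_shannon(theta, y_onehot):
    """Fenchel-Young loss with Shannon (negative-entropy) regularization:
    Omega*(theta) + Omega(y) - <theta, y>, where Omega* = logsumexp."""
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y_onehot, dtype=float)
    return logsumexp(theta) + neg_shannon_entropy(y) - theta @ y

# Sanity check: for a one-hot target this equals the cross-entropy of softmax(theta).
theta = np.array([2.0, 0.5, -1.0])
y = np.array([1.0, 0.0, 0.0])
cross_entropy = logsumexp(theta) - theta[0]
assert np.isclose(fy_loss_shannon(theta, y), cross_entropy)
```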
On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost
https://proceedings.neurips.cc/paper/2008/file/f5deaeeae1538fb6c45901d524ee2f98-Paper.pdf
The machine learning problem of classifier design is studied from the perspective of probability elicitation, in statistics. This shows that the standard approach of proceeding from the specification of a loss, to the minimization of conditional risk is overly restrictive. It is shown that a better alternative is to start from the specification of a functional form for the minimum conditional risk, and derive the loss function. This has various consequences of practical interest, such as showing that 1) the widely adopted practice of relying on convex loss functions is unnecessary, and 2) many new losses can be derived for classification problems. These points are illustrated by the derivation of a new loss which is not convex, but does not compromise the computational tractability of classifier design, and is robust to the contamination of data with outliers.
https://arxiv.org/abs/1805.07836
https://nips.cc/media/nips-2018/Slides/12761.pdf
I think we can improve on this with our generalized NLL loss function:
$$ -\frac{\log(x)}{\log(\text{base})} $$

or

$$ -\frac{\log(x)}{\log(\text{base})} - \frac{1}{\text{base}} $$
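As a rough sketch of what this could look like in code (the name `generalized_nll`, its signature, and the mean reduction are just illustrative assumptions, not an existing API): the loss is the NLL computed in an arbitrary log base, $-\log_{\text{base}}(p) = -\log(p)/\log(\text{base})$, with an optional $-1/\text{base}$ offset for the second variant.

```python
import math
import torch

def generalized_nll(probs: torch.Tensor,
                    target: torch.Tensor,
                    base: float = math.e,
                    shift: bool = False) -> torch.Tensor:
    """Mean negative log-likelihood in base `base` of the target-class probability.

    probs:  (N, C) predicted class probabilities (rows sum to 1)
    target: (N,) integer class indices
    shift:  if True, subtract 1/base (second variant above)
    """
    p_true = probs.gather(1, target.unsqueeze(1)).squeeze(1).clamp_min(1e-12)
    loss = -torch.log(p_true) / math.log(base)  # change of base: -log_base(p)
    if shift:
        loss = loss - 1.0 / base
    return loss.mean()

# Example: base-2 NLL for a 3-class problem.
probs = torch.tensor([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
target = torch.tensor([0, 1])
print(generalized_nll(probs, target, base=2.0))
```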