Shared foundation for risk minimization

The efforts devoted to this line of work seem to have slowed down over the past several months.

(Regularized) empirical risk minimization stands squarely at the center of machine learning, and the area has been studied extensively. As it is becoming a central topic in my research agenda, I would like to revive the development efforts here.

Despite the numerous methods that have been proposed (primal vs. dual, centralized vs. distributed, deterministic vs. stochastic, etc.), most of them share some basic building blocks:

Computation of loss functions and their gradients (over a single sample or a batch of samples)
Regularizers (squared L2, L1, etc.)
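
For reference, these two building blocks combine in the usual regularized objective. The notation below is chosen here for illustration and is not part of the proposal: theta for the parameters, f_theta for the predictor, ell for the loss, R for the regularizer, lambda for its weight. The second line additionally assumes a linear predictor p_i = theta' * x_i, in which case the chain rule reduces the gradient to the loss derivative w.r.t. the prediction plus the regularizer's gradient.

```latex
\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f_\theta(x_i),\, y_i\bigr) \;+\; \lambda\, R(\theta)

% Assuming a linear predictor p_i = \theta^\top x_i:
\nabla_\theta \Bigl[\tfrac{1}{n}\textstyle\sum_i \ell(p_i, y_i) + \lambda R(\theta)\Bigr]
  \;=\; \frac{1}{n}\sum_{i=1}^{n} \frac{\partial \ell}{\partial p}(p_i, y_i)\, x_i \;+\; \lambda\, \nabla R(\theta)
```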

Hence, I propose to establish a base package (named RiskMinBase?) that provides these building blocks, so that algorithm developers can focus on the higher-level algorithmic design while relying on the base package for the basic computations.

The type hierarchy should more or less follow what we are currently doing in RegERMs and SGDOptim.

abstract Loss

abstract UnivariateLoss <: Loss    # on a real-valued prediction
abstract MultivariateLoss <: Loss  # on a vector-valued prediction (e.g. multinomial logistic regression)

# compute the loss value
function value(loss::UnivariateLoss, p::Real, y)
function value(loss::MultivariateLoss, p::AbstractVector, y)

# compute the derivative w.r.t. the prediction
function deriv(loss::UnivariateLoss, p::Real, y)
function deriv(loss::MultivariateLoss, p::AbstractVector, y)
function deriv!(loss::MultivariateLoss, g::AbstractVector, p::AbstractVector, y)

# compute the value and derivatives simultaneously
function value_and_deriv(loss::UnivariateLoss, p::Real, y)  # --> (v, deriv)
function value_and_deriv(loss::MultivariateLoss, p::AbstractVector, y)  # --> (v, deriv)
function value_and_deriv!(loss::MultivariateLoss, g::AbstractVector, p::AbstractVector, y) # --> (v, g)

The package would provide value, deriv, and value_and_deriv, plus in-place forms (deriv! and value_and_deriv!) for multivariate losses. Algorithm developers can choose whichever form fits their needs.
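
To make the interface concrete, here is a minimal sketch of how two losses might implement it. It is written in current Julia syntax (`abstract type ... end` rather than the older `abstract` keyword used in the listing). The type names `SqrLoss` and `MultiLogisticLoss`, the convention that `y` is a class index for the multivariate case, and the generic fallbacks at the end are all assumptions made for illustration, not a settled design.

```julia
abstract type Loss end
abstract type UnivariateLoss <: Loss end
abstract type MultivariateLoss <: Loss end

# Hypothetical univariate loss: squared error, (p - y)^2 / 2.
struct SqrLoss <: UnivariateLoss end

value(::SqrLoss, p::Real, y::Real) = abs2(p - y) / 2
deriv(::SqrLoss, p::Real, y::Real) = p - y
function value_and_deriv(::SqrLoss, p::Real, y::Real)
    r = p - y                      # residual is shared by value and derivative
    return (abs2(r) / 2, r)
end

# Hypothetical multivariate loss: multinomial logistic loss.
# Assumption: p holds one score per class and y is the index of the true class.
struct MultiLogisticLoss <: MultivariateLoss end

function value_and_deriv!(::MultiLogisticLoss, g::AbstractVector,
                          p::AbstractVector, y::Integer)
    m = maximum(p)                 # shift scores for numerical stability
    s = zero(eltype(g))
    for k in eachindex(g, p)
        g[k] = exp(p[k] - m)
        s += g[k]
    end
    v = log(s) + m - p[y]          # log-sum-exp minus the true-class score
    g ./= s                        # g now holds the softmax probabilities
    g[y] -= 1                      # gradient w.r.t. p is softmax(p) - onehot(y)
    return (v, g)
end

# Generic fallbacks: the other forms are derived from the in-place one.
# They allocate, so a loss may override them with cheaper versions.
value_and_deriv(loss::MultivariateLoss, p::AbstractVector, y) =
    value_and_deriv!(loss, similar(p, float(eltype(p))), p, y)
deriv!(loss::MultivariateLoss, g::AbstractVector, p::AbstractVector, y) =
    value_and_deriv!(loss, g, p, y)[2]
deriv(loss::MultivariateLoss, p::AbstractVector, y) = value_and_deriv(loss, p, y)[2]
value(loss::MultivariateLoss, p::AbstractVector, y) = value_and_deriv(loss, p, y)[1]
```

With this layout a new multivariate loss only has to define `value_and_deriv!`; an optimizer that keeps a preallocated gradient buffer calls the in-place form, while exploratory code can use the allocating ones.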

Likewise, we can have a similar (but probably simpler) type hierarchy for regularizers.
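
One possible shape for the regularizer side, again only as a sketch in current Julia syntax. The names `Regularizer`, `SqrL2Reg`, `L1Reg`, and `addgrad!` are hypothetical, as is the choice to store the coefficient `c` inside the regularizer and to accumulate the (sub)gradient into an existing buffer.

```julia
abstract type Regularizer end

# Hypothetical squared-L2 regularizer: (c / 2) * ||w||^2
struct SqrL2Reg{T<:Real} <: Regularizer
    c::T
end

# Hypothetical L1 regularizer: c * ||w||_1
struct L1Reg{T<:Real} <: Regularizer
    c::T
end

# Regularization value at parameters w
value(r::SqrL2Reg, w::AbstractVector) = r.c / 2 * sum(abs2, w)
value(r::L1Reg, w::AbstractVector) = r.c * sum(abs, w)

# Add the gradient of the regularizer to g in place, so it can be
# accumulated on top of a loss gradient without extra allocation.
function addgrad!(r::SqrL2Reg, g::AbstractVector, w::AbstractVector)
    g .+= r.c .* w
    return g
end

# For L1 this is a subgradient: sign(0) == 0 is used at zero entries.
function addgrad!(r::L1Reg, g::AbstractVector, w::AbstractVector)
    g .+= r.c .* sign.(w)
    return g
end
```

The hierarchy is flat here because a regularizer acts on the parameter vector alone and needs no univariate/multivariate split.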

I am seeking opinions here. Once we agree on a design, I can take the lead on creating that package.

cc @johnmyleswhite @BigCrunsh @dmbates @simonster

JuliaStats / Roadmap.jl

Shared foundation for risk minimization #15