JuliaAI / MLJ.jl

A Julia machine learning framework
https://juliaai.github.io/MLJ.jl/

Infinite Dimensional Hyperparameters #477

Open azev77 opened 4 years ago

azev77 commented 4 years ago

Some models allow infinite-dimensional hyperparameters.

Eg 1: XGBoost allows custom loss functions (as does @xiaodaigh's JLBoost.jl).

Eg 2: @joshday's SparseRegression.jl allows a custom loss & a custom penalty:

s = SModel(x, y, L2DistLoss(), L2Penalty())

Eg 3: @rakeshvar's AnyBoost.jl allows custom Loss/Activation/Constraint

An oft-advertised feature of "doing ML in Julia" is how easy it is to customize traditional models: (Julia Computing) & (Discourse).

  1. Is it possible to make it easy to identify which models in MLJ allow infinite-dimensional HP (custom Loss/Penalty/CEF)? For example, `models(x -> x.is_supervised && x.is_pure_julia && x.custom_penalty)` would list all supervised models written in pure Julia that allow custom penalties...

  2. It would be awesome to automate tuning of infinite-dimensional HP. The Julia Computing example adds a weight parameter w to the logistic loss (image omitted). It would be cool to make it easy to tune w over a grid in MLJ (a grid-tuning sketch follows this list)...

  3. At some point, it would be magical to include a model in MLJ called GenericLearner w/ 3 infinite-dimensional inputs: (L(y,ŷ), f(x), P(f)), where:

     - L(y, ŷ; θ_L): the loss function; in the above example θ_L = w
     - ŷ = f(x; θ_f): the model; in SparseRegression.jl it is always f(x; θ_f) = θ_f*x
     - P(f; θ_P): the penalty; L1/MCP/whatever suits your fancy, as long as the opt problem is nice

     The user then chooses her favorite algorithm to solve

     ŷ = arginf Q := L(y, ŷ; θ_L) + P(f)
     ŷ = arginf Q := L(y, f(x; θ_f); θ_L) + P(θ_f; θ_P)   (substituting ŷ = f(x; θ_f))

     I realize this may be insanely hard to solve unless the user makes judicious choices of (L, f, P).

    @load Generic
    mdl  = Generic(L(), f(), P(), algorithm)
    mach = machine(mdl, X, y)
    fit!(mach, rows=train) 

    A good student will see that this is "just" a constrained optimization problem w/ objective L(y,ŷ) and constraint P(f). Some joke that this is "all of ML in one expression". It's not, but it is a powerful abstraction. UPDATE: EmpiricalRisks.jl by @lindahua does exactly this!
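To make point 3 a bit more concrete, here is a minimal sketch of such a generic learner, assuming the user supplies differentiable L, f, and P and that Zygote is available for the gradients. The names (generic_fit, linear_model, etc.) are made up for illustration; this is not an MLJ or SparseRegression API:

    # Hypothetical sketch: minimize L(y, f(X, θ)) + P(θ) by plain gradient descent.
    using Zygote   # reverse-mode AD

    squared_loss(y, ŷ)     = sum(abs2, y .- ŷ) / length(y)
    linear_model(X, θ)     = X * θ
    l2_penalty(θ; λ = 0.1) = λ * sum(abs2, θ)

    function generic_fit(L, f, P, X, y; θ = zeros(size(X, 2)), η = 0.01, iters = 1_000)
        for _ in 1:iters
            g = Zygote.gradient(t -> L(y, f(X, t)) + P(t), θ)[1]   # ∇_θ [L + P]
            θ = θ .- η .* g                                        # gradient step
        end
        return θ
    end

    # e.g. ridge regression, which is (2.c) in the next comment:
    X = randn(100, 3); y = X * [1.0, -2.0, 0.5] .+ 0.1 .* randn(100)
    θ̂ = generic_fit(squared_loss, linear_model, l2_penalty, X, y)

Swapping in |y - ŷ| or an L1 penalty recovers LAD/lasso-style objectives, modulo the non-smoothness that plain gradient descent only handles crudely; a real implementation would let the user pick the algorithm, as in the pseudocode above.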
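And on point 2: if a model simply exposed the loss weight w as an ordinary hyperparameter field, MLJ's existing tuning machinery could grid over it like any other scalar. The WeightedLogisticClassifier below is hypothetical (no such model exists today); range, Grid, TunedModel, CV, and log_loss are real MLJ API:

    using MLJ

    model = WeightedLogisticClassifier()   # hypothetical model with a scalar field `w`

    r     = range(model, :w, lower = 0.1, upper = 10.0, scale = :log)
    tuned = TunedModel(model = model,
                       range = r,
                       tuning = Grid(resolution = 20),
                       resampling = CV(nfolds = 5),
                       measure = log_loss)

    X, y = make_moons(200)                 # toy classification data shipped with MLJ
    mach = machine(tuned, X, y)
    fit!(mach)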

@rakeshvar put it nicely when discussing JuML.jl:

> One of the main points of Gradient Boosting (and extensions like XGBoost) is its application/generalization to any loss (not just linear regression or logistic regression). This generalization is very easily achieved via multiple dispatch in Julia. In that context, seeing that the package works only for logistic was surprising. Additionally, the code seems to be hard-coded around the logit loss. It would be great to abstract the XGBoost algorithm away from the direct dependence on the loss; instead, the algorithm can get the gradient (and hessian) from the loss type (this is where multiple dispatch is so handy). This makes adding new losses very easy.
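A hedged illustration of the dispatch pattern being described (the names here are made up, not JuML's or XGBoost.jl's API): the boosting routine only ever asks the loss type for its gradient and hessian, so supporting a new loss means adding two one-line methods:

    abstract type Loss end
    struct LogitLoss   <: Loss end
    struct SquaredLoss <: Loss end

    # gradient and hessian of the loss w.r.t. the raw score ŷ
    grad(::LogitLoss,   y, ŷ) = 1 / (1 + exp(-ŷ)) - y
    hess(::LogitLoss,   y, ŷ) = (p = 1 / (1 + exp(-ŷ)); p * (1 - p))
    grad(::SquaredLoss, y, ŷ) = ŷ - y
    hess(::SquaredLoss, y, ŷ) = one(ŷ)

    # the boosting code stays generic in the loss: e.g. the Newton step for a leaf value
    newton_step(loss::Loss, y, ŷ) = -sum(grad.(Ref(loss), y, ŷ)) / sum(hess.(Ref(loss), y, ŷ))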

azev77 commented 4 years ago

Also, a generic learner is a great pedagogical tool b/c it compactly unifies many different ML models. I'd love to co-author a tutorial that gives a non-black-box intro to ML.

First consider constant models: ŷ = f(x; θ_f) = θ_f

(1.a) Mean: (L(y,ŷ) = (y-ŷ)^2, f(x) = θ_f, P(f) = 0)
(1.b) Median: (L(y,ŷ) = |y-ŷ|, f(x) = θ_f, P(f) = 0)
(1.c) Quantile: (L(y,ŷ; θ_L) = Check(), f(x) = θ_f, P(f) = 0), where θ_L is the quantile.
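A quick numerical sanity check of (1.a)-(1.c), assuming Optim.jl for the one-dimensional minimization (fitconst and check are made-up names, purely illustrative):

    using Optim, Statistics

    y = randn(1_000)
    check(u, τ) = u ≥ 0 ? τ * u : (τ - 1) * u   # "check" (quantile) loss on the residual u = y - ŷ

    # fit the constant model ŷ = θ_f by minimizing the summed loss over θ_f
    fitconst(L) = Optim.minimizer(optimize(θ -> sum(L.(y .- θ)), -10.0, 10.0))

    fitconst(u -> u^2)            # ≈ mean(y)            (1.a)
    fitconst(abs)                 # ≈ median(y)          (1.b)
    fitconst(u -> check(u, 0.9))  # ≈ quantile(y, 0.9)   (1.c)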

Next consider linear (in the params) models: ŷ = f(x; θ_f) = θ_f*x

(2.a) OLS: (L(y,ŷ) = (y-ŷ)^2, f(x) = θ_f*x, P(f) = 0)
(2.b) Lasso: (L(y,ŷ) = (y-ŷ)^2, f(x) = θ_f*x, P(f) = L1())
(2.c) Ridge: (L(y,ŷ) = (y-ŷ)^2, f(x) = θ_f*x, P(f) = L2())
(2.d) LAD: (L(y,ŷ) = |y-ŷ|, f(x) = θ_f*x, P(f) = 0)
(2.e) Quantile Reg: (L(y,ŷ) = Check(), f(x) = θ_f*x, P(f) = 0)
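For what it's worth, (2.a)-(2.e) already map almost one-to-one onto SparseRegression.jl's SModel from the first comment, via the JuliaML loss/penalty types (a sketch; exact constructor details may differ from the current SparseRegression release):

    using SparseRegression   # re-exports the LossFunctions / PenaltyFunctions types

    x = randn(100, 3); y = randn(100)   # toy data

    ols   = SModel(x, y, L2DistLoss(),      NoPenalty())   # (2.a) OLS
    lasso = SModel(x, y, L2DistLoss(),      L1Penalty())   # (2.b) Lasso
    ridge = SModel(x, y, L2DistLoss(),      L2Penalty())   # (2.c) Ridge
    lad   = SModel(x, y, L1DistLoss(),      NoPenalty())   # (2.d) LAD
    qreg  = SModel(x, y, QuantileLoss(0.9), NoPenalty())   # (2.e) quantile regression at τ = 0.9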