JuliaAI / MLJLinearModels.jl

Generalized Linear Regression Models (penalized regressions, robust regressions, ...)
MIT License

should scale_penalty_with_samples = true be default? #149

Closed: alex-s-gardner closed this issue 10 months ago

alex-s-gardner commented 10 months ago

Just sinking my teeth into MLJLinearModels and I see that scale_penalty_with_samples = true is the default. Playing around with a number of toy datasets, it seems that scale_penalty_with_samples = true does not produce intuitive results while scale_penalty_with_samples = false does, e.g.:

using MLJLinearModels, Plots   # fit / LinearRegression / HuberRegression from MLJLinearModels, scatter from Plots

# create data: trend + sinusoid + Gaussian noise + occasional large outliers
t = 1:0.001:10;
y = 10 .+ 10 * sin.(t) .+ 5 * t .+ randn(length(t)) * 2 .+
    rand((zeros(round(Int64, length(t) / 5))..., 6, -8, 100, -200, 178, -236, 77, -129, -50, -100, -45, -33, -114, -1010, -1238, -2000), length(t)) * 0.5;
X = hcat(ones(length(t)), sin.(t), t);
scatter(t, y; markerstrokecolor=:match, markerstrokewidth=0, label = "obs", ylim = (0, 70))

# Base LSQ model fit
θ = X \ y;
scatter!(t, X * θ, markerstrokewidth=0, label="Base lsq")

# pass X[:, 2:end] only, since fit adds an intercept term by default
θ = fit(LinearRegression(), X[:, 2:end], y);
scatter!(t, hcat(X[:, 2:end], ones(length(t))) * θ, markerstrokewidth=0, label="linear")

θ = fit(HuberRegression(scale_penalty_with_samples=true), X[:, 2:end], y);
scatter!(t, hcat(X[:, 2:end], ones(length(t))) * θ, markerstrokewidth=0, label="huber")

θ = fit(HuberRegression(scale_penalty_with_samples=false), X[:, 2:end], y);
scatter!(t, hcat(X[:, 2:end], ones(length(t))) * θ, markerstrokewidth=0, label="huber-no_scale_penalty")
[Screenshot 2023-08-31: scatter plot comparing the obs, base lsq, linear, huber, and huber-no_scale_penalty fits]

Given this, should scale_penalty_with_samples = false be made the default or is there a logical reason that it is not?
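
For what it's worth, my reading of the docstring is that scale_penalty_with_samples = true simply multiplies lambda by the number of samples n. If that reading is right (treat it, and the lambda keyword I use below, as assumptions on my part), the two settings should coincide once lambda is rescaled by hand:

# assumed equivalence: lambda with scaling == lambda * n without scaling,
# so dividing lambda by n should roughly reproduce the unscaled fit
n = length(y);
λ = 1.0;   # a nontrivial penalty strength, just for illustration
θ_scaled   = fit(HuberRegression(lambda=λ / n, scale_penalty_with_samples=true),  X[:, 2:end], y);
θ_unscaled = fit(HuberRegression(lambda=λ,     scale_penalty_with_samples=false), X[:, 2:end], y);
maximum(abs, θ_scaled .- θ_unscaled)   # expected ≈ 0 under the assumption above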

tlienart commented 10 months ago

I think this tends to depend on what you use MLJLM for and how you use it.

Looping in @jbrea, since he's the one who introduced the change and had a good rationale for it (I think there's an issue where this was discussed; I'll try to find it).

Yeah here's a previous discussion: https://github.com/JuliaAI/MLJLinearModels.jl/issues/124

alex-s-gardner commented 10 months ago

@tlienart thanks for the link to the previous discussion. To me it seems a smaller default value for lambda should probably have been chosen. New users, if moving quickly, will come away assuming that several of the models simply give very bad predictions, given the defaults. I realize that each use case is different, but it seems to me that a nontrivial lambda would be used in only a minority of use cases.
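
Concretely, the behaviour I think a quick-moving user expects is closer to the snippet below, where a (near-)zero lambda makes the scaling question moot because there is essentially no penalty to scale (the lambda keyword is again an assumption on my part):

# with lambda ≈ 0 the penalty term (and hence scale_penalty_with_samples) barely matters,
# and the Huber fit should track the robust, unpenalized solution
θ = fit(HuberRegression(lambda=1e-10), X[:, 2:end], y);
scatter!(t, hcat(X[:, 2:end], ones(length(t))) * θ, markerstrokewidth=0, label="huber, tiny lambda")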

tlienart commented 10 months ago

Isn't that what the end of the discussion and the relevant commit did?

In any case, if you look at scikit-learn for instance, which is one of the reference implementations out there, they use scaling and an L2 penalty with a nontrivial lambda by default for logistic regression.

I personally don't have a strong opinion on this; to me it's a matter of having correct docs and potentially guiding the user in what they should do (e.g. hyperparameter tuning). If you feel the docs are unclear, please consider opening a PR for it.
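
To sketch what that kind of guidance could look like via MLJ's tuning machinery (purely illustrative; the range bounds, resolution and measure below are placeholders, not recommendations):

using MLJ   # assumes MLJ.jl is available alongside MLJLinearModels

# illustrative sketch: cross-validated search over lambda on a log scale
Huber = @load HuberRegressor pkg=MLJLinearModels verbosity=0
model = Huber()
r = range(model, :lambda, lower=1e-6, upper=1.0, scale=:log)
tuned = TunedModel(model=model, ranges=r, tuning=Grid(resolution=20),
                   resampling=CV(nfolds=5), measure=rms)
mach = machine(tuned, MLJ.table(X[:, 2:end]), y)
fit!(mach)
fitted_params(mach).best_model   # inspect the selected lambda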

alex-s-gardner commented 10 months ago

I think expanding the docs with examples of the different regression options would be a good idea. I'll open a new issue for this with the hope of supplying a PR soon.