JuliaAI / MLJLinearModels.jl

Generalized Linear Regression Models (penalized regressions, robust regressions, ...)
MIT License

Expand docs to include cut-and-paste examples for the different Regressors #150

Open alex-s-gardner opened 10 months ago

alex-s-gardner commented 10 months ago

I think we could reduce the learning curve by including some cut-and-paste examples for the various regressors... it would also be good to include some discussion of when one regressor model might be more appropriate than another.

tlienart commented 10 months ago

I don't disagree with this, but would add that MLJLM was initially meant to be used mainly through MLJ by average users, and the MLJ tutorials already contain examples of the various common regressors.
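
For reference, a minimal sketch of that MLJ route (the data and names below are purely illustrative; the @load/machine workflow is the usual MLJ one):

using MLJ

# load the MLJ wrapper for the Huber regressor provided by MLJLinearModels
Huber = @load HuberRegressor pkg=MLJLinearModels

# toy data: MLJ expects a table for X and a vector for y
X = MLJ.table(randn(100, 3))
y = randn(100)

model = Huber()               # default hyperparameters
mach  = machine(model, X, y)
fit!(mach)
ŷ = predict(mach, X)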

For the latter part (when X is more appropriate than Y) it's pretty tricky and debatable beyond fairly generic advice. I don't think you'll find an opinionated view on whether some model with regularisation is better than some other without, or vice versa; typically people should get a sense of what problem they're facing (e.g. big outliers), then put the models they think might address it through hyperparameter tuning and pick the one they believe generalises best according to some metric.
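
As a sketch of that workflow in MLJ (the range, grid resolution and measure are arbitrary choices for illustration, not recommendations):

using MLJ

Huber = @load HuberRegressor pkg=MLJLinearModels
model = Huber()

# tune the regularisation strength lambda over a log-spaced grid with 5-fold CV
r = range(model, :lambda, lower=1e-3, upper=1e2, scale=:log10)
tuned = TunedModel(model=model, range=r, tuning=Grid(resolution=20),
                   resampling=CV(nfolds=5), measure=rms)

mach = machine(tuned, X, y)   # X a table, y a vector, as in the sketch above
fit!(mach)
report(mach).best_model       # the candidate judged to generalise best under rms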

The current philosophy has been to follow sklearn, with minimal docs explaining the loss function and letting users figure out whether that matches what they need.
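
To make that concrete, a note of that kind might just state the standard definitions (the exact scaling conventions should be checked against the package docstrings): the default squared loss and the Huber loss applied to a residual $r$ are

$$
\ell_2(r) = \tfrac{1}{2} r^2,
\qquad
\rho_\delta(r) =
\begin{cases}
\tfrac{1}{2} r^2 & |r| \le \delta \\
\delta \left( |r| - \tfrac{\delta}{2} \right) & |r| > \delta
\end{cases}
$$

so the Huber loss grows only linearly for large residuals, which is why the Huber-type regressors are less sensitive to big outliers than plain least squares.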

But as always, if someone here would like to edit the docs to make them better for users who want a bit more, PRs are welcome.

alex-s-gardner commented 10 months ago

What about something like this?

using MLJ
using MLJLinearModels
using Plots
using Random

# create data 
t = 1:0.01:10;
n = length(t);
gaussian_noise = randn(n) * 3;
outliers = rand((zeros(round(Int64, n/20))..., 6, -8, 100, -200, 178, -236, 77, -129, -50, -100, -45, -33, -114, -1929, -2000), n);

# measurement y
y = 10 .+ 10 * sin.(t) .+ 5 * t .+ gaussian_noise .+ outliers;

# design matrix 
X = hcat(ones(length(t)), sin.(t), t);

# options shared by the MLJLinearModels regressors below: don't scale the
# penalty with the number of samples, and don't fit a separate intercept
# (the design matrix X already carries a column of ones)
scale_penalty = false
fit_intercept = false

begin
    scatter(t, y; 
        markerstrokecolor=:match, 
        markerstrokewidth=0, 
        label = "observations", 
        ylim = (-70, 70),
        legend = :outerbottom,
        color = :grey,
        size = (700, 900)
    )

    # Base LSQ model fit
    println("Base Julia Linear Least Squares")
    @time θ = X \ y;
    plot!(t, X * θ, label="Base Julia Linear Least Squares", linewidth=2)

    regressor = LinearRegression(fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = HuberRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = RidgeRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = LassoRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = ElasticNetRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = QuantileRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = LADRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = GeneralizedLinearRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)

    regressor = RobustRegression(scale_penalty_with_samples=scale_penalty, fit_intercept=fit_intercept);
    println(typeof(regressor))
    @time θ = fit(regressor, X, y)
    plot!(t, X * θ, label=typeof(regressor), linewidth = 2)
end
Base Julia Linear Least Squares
  0.000168 seconds (36 allocations: 97.016 KiB)
GeneralizedLinearRegression{L2Loss, NoPenalty}
  0.000119 seconds (41 allocations: 118.719 KiB)
GeneralizedLinearRegression{RobustLoss{HuberRho{0.5}}, ScaledPenalty{L2Penalty}}
  0.001772 seconds (525 allocations: 699.094 KiB)
GeneralizedLinearRegression{L2Loss, ScaledPenalty{L2Penalty}}
  0.000100 seconds (8 allocations: 21.984 KiB)
GeneralizedLinearRegression{L2Loss, ScaledPenalty{L1Penalty}}
  0.003497 seconds (2.40 k allocations: 2.931 MiB)
GeneralizedLinearRegression{L2Loss, CompositePenalty}
  0.008676 seconds (4.13 k allocations: 4.338 MiB)
GeneralizedLinearRegression{RobustLoss{QuantileRho{0.5}}, ScaledPenalty{L2Penalty}}
  0.000732 seconds (323 allocations: 240.594 KiB)
GeneralizedLinearRegression{RobustLoss{QuantileRho{0.5}}, ScaledPenalty{L2Penalty}}
  0.000718 seconds (323 allocations: 240.594 KiB)
GeneralizedLinearRegression{L2Loss, NoPenalty}
  0.000143 seconds (41 allocations: 118.719 KiB)
GeneralizedLinearRegression{RobustLoss{HuberRho{0.1}}, ScaledPenalty{L2Penalty}}
  0.001428 seconds (493 allocations: 660.344 KiB)

[Screenshot: scatter of the noisy observations with the fitted lines from each regressor overlaid]

tlienart commented 10 months ago

I think that's very nice :) (some curves don't appear?). If you wanted to add a page of this sort to the docs, that would be great.

Small notes to add would be: (1) 2D data is quite different from nD data, so the intuition you build in 2D may not tell you what works best in nD; better to try several models when in doubt. (2) hyperparameter tuning is essential for most of these models (in fact, a nice small addition would be a visual representation of what happens to a curve, say the L1/Lasso fit, as the strength of the regulariser is increased; see the sketch below).
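
For (2), a possible sketch, re-using t, X and y from the example above (the lambda values are arbitrary, chosen only to make the effect visible):

using MLJLinearModels, Plots

scatter(t, y; color=:grey, markerstrokewidth=0, label="observations", ylim=(-70, 70))
for λ in (0.1, 10.0, 1_000.0)
    # refit the Lasso with an increasing penalty strength and overlay the curve
    reg = LassoRegression(lambda=λ, fit_intercept=false, scale_penalty_with_samples=false)
    θ = fit(reg, X, y)
    plot!(t, X * θ; label="Lasso, λ = $λ", linewidth=2)
end
current()   # display the figure with the three fits overlaid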

Generally speaking though, if this helped you, then no doubt it'll help others, and it should be in the docs :)