Closed ParadaCarleton closed 11 months ago
Unrelated to MLJLinearModels. Please ask on discourse for help or possibly open an issue in MLJ directly. The data that gets passed through is not properly typed. You can see that here:
julia> tuned_machine = machine(config, x[:, Not(1)], x[:, 1]) |> fit!
ERROR: MethodError: no method matching fit(::GeneralizedLinearRegression{L2Loss, NoPenalty}, ::Matrix{Any}, ::Vector{Float64}; solver::Analytical)
It should be a Matrix{<:Real}; this suggests that you might have missed an encoding step.
> The data that gets passed through is not properly typed. You can see that here:
Right, sorry, I was under the impression that MLJ models were expected to accept arbitrary tables as inputs, rather than just accepting Matrix{<:Real}. I'll edit this issue, then.
The issue name is incorrect: MLJ handles tables just fine, and MLJLM handles matrices as it should. The interface between the two is handled by MLJ. The problem here is that you did not encode the categorical features.
julia> using DataFrames, CategoricalArrays, ScientificTypes, MLJModelInterface, MLJBase
julia> X = hcat(DataFrame(randn(10, 5), :auto), DataFrame(CategoricalArray.(eachcol(rand(["1", "2", "3", "4"], 10, 5))), :auto); makeunique=true);
julia> schema(X)
┌───────┬───────────────┬──────────────────────────────────┐
│ names │ scitypes      │ types                            │
├───────┼───────────────┼──────────────────────────────────┤
│ x1    │ Continuous    │ Float64                          │
│ x2    │ Continuous    │ Float64                          │
│ x3    │ Continuous    │ Float64                          │
│ x4    │ Continuous    │ Float64                          │
│ x5    │ Continuous    │ Float64                          │
│ x1_1  │ Multiclass{4} │ CategoricalValue{String, UInt32} │
│ x2_1  │ Multiclass{4} │ CategoricalValue{String, UInt32} │
│ x3_1  │ Multiclass{3} │ CategoricalValue{String, UInt32} │
│ x4_1  │ Multiclass{4} │ CategoricalValue{String, UInt32} │
│ x5_1  │ Multiclass{4} │ CategoricalValue{String, UInt32} │
└───────┴───────────────┴──────────────────────────────────┘
julia> typeof(MLJModelInterface.matrix(X))
Matrix{Any} (alias for Array{Any, 2})
The call MLJModelInterface.matrix(X) is how MLJ takes the training data and passes it over to MLJLinearModels. As you can see, the output is an untyped matrix because it has columns of strings with "1", "2", etc.
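The same promotion can be seen in plain Julia, without any MLJ packages. This is only an analogy for what happens inside the table-to-matrix conversion, not the actual MLJ code path; the variable names are made up for illustration.

```julia
# Two numeric columns share a concrete element type.
numeric = hcat([1.0, 2.0], [3.0, 4.0])

# Numbers and strings have no common concrete type,
# so the element type of the combined matrix falls back to Any.
mixed = hcat([1.0, 2.0], ["1", "2"])

println(typeof(numeric))             # Matrix{Float64}
println(typeof(mixed))               # Matrix{Any}

# A method that requires Matrix{<:Real} cannot dispatch on the mixed matrix,
# which is why a MethodError like the one above is raised.
println(numeric isa Matrix{<:Real})  # true
println(mixed isa Matrix{<:Real})    # false
```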
TL;DR: encode the categorical features first, then pass the result to the linear regressor.
For example:
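A minimal sketch of that approach, assuming MLJ, MLJLinearModels, DataFrames and CategoricalArrays are installed in the active environment. The toy data and column names are invented for illustration. ContinuousEncoder is one possible choice of encoder here; OneHotEncoder would serve a similar purpose.

```julia
using MLJ, DataFrames, CategoricalArrays

# Load the model type (assumes MLJLinearModels is available).
LinearRegressor = @load LinearRegressor pkg=MLJLinearModels verbosity=0

# Toy table with continuous and categorical (Multiclass) columns.
X = DataFrame(
    a = randn(20),
    b = randn(20),
    c = categorical(rand(["1", "2", "3", "4"], 20)),
)
y = randn(20)

# ContinuousEncoder one-hot encodes the Multiclass column, so the regressor
# only receives Continuous features. drop_last=true drops one indicator
# column per feature to avoid collinearity with the intercept.
pipe = ContinuousEncoder(drop_last=true) |> LinearRegressor()

mach = machine(pipe, X, y)
fit!(mach, verbosity=0)

yhat = predict(mach, X)
```

Wrapping the encoder and the regressor in one pipeline means the same encoding is applied again at prediction time, so new tables with categorical columns can be passed to predict directly.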
Similar issue in EvoTrees.jl.