JuliaAI / MLJ.jl

A Julia machine learning framework
https://juliaai.github.io/MLJ.jl/

TunedModel is not fitted with `measure=misclassification_rate` #725

Closed davidbp closed 3 years ago

davidbp commented 3 years ago

Describe the bug A tuned KNN model, for which I want to select K using misclassification_rate as the measure, fails to fit.

To Reproduce

using RDatasets, MLJ, NearestNeighbors, MLJModels, Pkg
iris = dataset("datasets", "iris")
y, X = unpack(iris, ==(:Species), colname -> true)

train_ind, test_ind = partition(Array(1:length(y)), 0.7, shuffle=true)

println(Pkg.status("MLJ"))

knn = @load KNNClassifier verbosity = 0

K_range = range(knn, :K, lower=5, upper=20);

self_tuning_knn = TunedModel(model=knn,
                             resampling = CV(nfolds=5),
                             tuning = Grid(resolution=5),
                             range = K_range,
                             measure=misclassification_rate);

m_self_tuning_knn = machine(self_tuning_knn, X, y)
fit!(m_self_tuning_knn, rows=train_ind, verbosity=0)

Expected behavior I would expect a fitted machine where I can inspect the misclassification_rate in the CV results.

If there are restrictions on the kinds of measures that a TunedModel can use, I could not find them in the documentation. I used a measure from performance_measures, so I expected it to work.

Also a question arises:

Additional context The same code as above with the measure argument removed, that is:

self_tuning_knn = TunedModel(model=knn,
                             resampling = CV(nfolds=5),
                             tuning = Grid(resolution=5),
                             range = K_range);

works as expected

Output of the provided script

julia tuned_model_with_missclassification_rate.jl 
Status `~/.julia/environments/v1.5/Project.toml`
  [add582a8] MLJ v0.15.0
nothing
┌ Error: Problem fitting the machine Machine{Resampler{CV,…}} @747, possibly because an upstream node in a learning network is providing data of incompatible scitype. 
└ @ MLJBase ~/.julia/packages/MLJBase/5TNcr/src/machines.jl:422
[ Info: Running type checks... 
[ Info: Type checks okay. 
┌ Error: Problem fitting the machine Machine{ProbabilisticTunedModel{Grid,…}} @898, possibly because an upstream node in a learning network is providing data of incompatible scitype. 
└ @ MLJBase ~/.julia/packages/MLJBase/5TNcr/src/machines.jl:422
[ Info: Running type checks... 
[ Info: Type checks okay. 
ERROR: LoadError: ArgumentError: 
KNNClassifier @449 <: Probabilistic but prediction_type(MisclassificationRate @021) = :deterministic. 
To override measure checks, set check_measure=false. 
Stacktrace:
 [1] _check_measure(::MisclassificationRate, ::MLJModels.NearestNeighbors_.KNNClassifier, ::CategoricalArrays.CategoricalArray{String,1,UInt8,String,CategoricalArrays.CategoricalValue{String,UInt8},Union{}}, ::Function) at /Users/davidbuchaca1/.julia/packages/MLJBase/5TNcr/src/resampling.jl:375
 [2] #239 at /Users/davidbuchaca1/.julia/packages/MLJBase/5TNcr/src/resampling.jl:392 [inlined]
 [3] _all(::MLJBase.var"#239#240"{MLJModels.NearestNeighbors_.KNNClassifier,CategoricalArrays.CategoricalArray{String,1,UInt8,String,CategoricalArrays.CategoricalValue{String,UInt8},Union{}},typeof(predict)}, ::Array{MisclassificationRate,1}, ::Colon) at ./reduce.jl:828
 [4] all(::Function, ::Array{MisclassificationRate,1}; dims::Function) at ./reducedim.jl:735
 [5] all at ./reducedim.jl:735 [inlined]
 [6] _check_measures at /Users/davidbuchaca1/.julia/packages/MLJBase/5TNcr/src/resampling.jl:391 [inlined]
 [7] _process_weights_measures(::Nothing, ::MisclassificationRate, ::Machine{MLJModels.NearestNeighbors_.KNNClassifier}, ::Function, ::Int64, ::Bool) at /Users/davidbuchaca1/.julia/packages/MLJBase/5TNcr/src/resampling.jl:432
 [8] fit(::Resampler{CV,MLJModels.NearestNeighbors_.KNNClassifier}, ::Int64, ::DataFrame, ::CategoricalArrays.CategoricalArray{String,1,UInt8,String,CategoricalArrays.CategoricalValue{String,UInt8},Union{}}) at /Users/davidbuchaca1/.julia/packages/MLJBase/5TNcr/src/resampling.jl:975
 [9] fit_only!(::Machine{Resampler{CV,MLJModels.NearestNeighbors_.KNNClassifier}}; rows::Nothing, verbosity::Int64, force::Bool) at /Users/davidbuchaca1/.julia/packages/MLJBase/5TNcr/src/machines.jl:420
 [10] #fit!#85 at /Users/davidbuchaca1/.julia/packages/MLJBase/5TNcr/src/machines.jl:478 [inlined]
 [11] event(::MLJModels.NearestNeighbors_.KNNClassifier, ::Machine{Resampler{CV,MLJModels.NearestNeighbors_.KNNClassifier}}, ::Int64, ::Grid, ::Nothing, ::NamedTuple{(:models, :fields, :parameter_scales),Tuple{Array{MLJModels.NearestNeighbors_.KNNClassifier,1},Array{Symbol,1},Array{Symbol,1}}}) at /Users/davidbuchaca1/.julia/packages/MLJTuning/6MZ7C/src/tuned_models.jl:301
 [12] #26 at /Users/davidbuchaca1/.julia/packages/MLJTuning/6MZ7C/src/tuned_models.jl:339 [inlined]
 [13] iterate at ./generator.jl:47 [inlined]
 [14] _collect(::Array{MLJModels.NearestNeighbors_.KNNClassifier,1}, ::Base.Generator{Array{MLJModels.NearestNeighbors_.KNNClassifier,1},MLJTuning.var"#26#27"{Machine{Resampler{CV,MLJModels.NearestNeighbors_.KNNClassifier}},Int64,Grid,Nothing,NamedTuple{(:models, :fields, :parameter_scales),Tuple{Array{MLJModels.NearestNeighbors_.KNNClassifier,1},Array{Symbol,1},Array{Symbol,1}}},ProgressMeter.Progress}}, ::Base.EltypeUnknown, ::Base.HasShape{1}) at ./array.jl:699
 [15] collect_similar at ./array.jl:628 [inlined]
 [16] map at ./abstractarray.jl:2162 [inlined]
 [17] assemble_events(::Array{MLJModels.NearestNeighbors_.KNNClassifier,1}, ::Machine{Resampler{CV,MLJModels.NearestNeighbors_.KNNClassifier}}, ::Int64, ::Grid, ::Nothing, ::NamedTuple{(:models, :fields, :parameter_scales),Tuple{Array{MLJModels.NearestNeighbors_.KNNClassifier,1},Array{Symbol,1},Array{Symbol,1}}}, ::CPU1{Nothing}) at /Users/davidbuchaca1/.julia/packages/MLJTuning/6MZ7C/src/tuned_models.jl:338
 [18] build(::Nothing, ::Int64, ::Grid, ::MLJModels.NearestNeighbors_.KNNClassifier, ::NamedTuple{(:models, :fields, :parameter_scales),Tuple{Array{MLJModels.NearestNeighbors_.KNNClassifier,1},Array{Symbol,1},Array{Symbol,1}}}, ::Int64, ::CPU1{Nothing}, ::Machine{Resampler{CV,MLJModels.NearestNeighbors_.KNNClassifier}}) at /Users/davidbuchaca1/.julia/packages/MLJTuning/6MZ7C/src/tuned_models.jl:502
 [19] fit(::MLJTuning.ProbabilisticTunedModel{Grid,MLJModels.NearestNeighbors_.KNNClassifier}, ::Int64, ::DataFrame, ::CategoricalArrays.CategoricalArray{String,1,UInt8,String,CategoricalArrays.CategoricalValue{String,UInt8},Union{}}) at /Users/davidbuchaca1/.julia/packages/MLJTuning/6MZ7C/src/tuned_models.jl:571
 [20] fit_only!(::Machine{MLJTuning.ProbabilisticTunedModel{Grid,MLJModels.NearestNeighbors_.KNNClassifier}}; rows::Array{Int64,1}, verbosity::Int64, force::Bool) at /Users/davidbuchaca1/.julia/packages/MLJBase/5TNcr/src/machines.jl:420
 [21] #fit!#85 at /Users/davidbuchaca1/.julia/packages/MLJBase/5TNcr/src/machines.jl:478 [inlined]
 [22] top-level scope at /Users/davidbuchaca1/Documents/git_stuff/julia_tutorials/packages/MLJ/tuned_model_with_missclassification_rate.jl:21
 [23] include(::Function, ::Module, ::String) at ./Base.jl:380
 [24] include(::Module, ::String) at ./Base.jl:368
 [25] exec_options(::Base.JLOptions) at ./client.jl:296
 [26] _start() at ./client.jl:506
in expression starting at /Users/davidbuchaca1/Documents/git_stuff/julia_tutorials/packages/MLJ/tuned_model_with_missclassification_rate.jl:21
ablaom commented 3 years ago

Thanks for reporting.

According to the error message:

KNNClassifier @449 <: Probabilistic but prediction_type(MisclassificationRate @021) = :deterministic. 

Your model predicts probability distributions but misclassification_rate is for deterministic predictions:

julia> info(KNNClassifier).prediction_type
:probabilistic

julia> info(misclassification_rate).prediction_type
:deterministic
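To make the mismatch concrete, here is a minimal sketch of the two kinds of prediction, assuming the MLJ v0.15 API used in this issue (where @load returns a model instance; later releases may require calling the loaded type instead). The variable names are illustrative only.

```julia
using RDatasets, MLJ

iris = dataset("datasets", "iris")
y, X = unpack(iris, ==(:Species), colname -> true)

# Assumption: MLJ v0.15 behaviour, where @load returns a model instance.
knn = @load KNNClassifier verbosity=0

mach = machine(knn, X, y)
fit!(mach, verbosity=0)

# predict gives one probability distribution over the classes per row ...
yhat = predict(mach, X)

# ... while predict_mode gives one class label per row.
labels = predict_mode(mach, X)

# misclassification_rate compares labels with labels, so it needs the
# point predictions, not the distributions.
rate = misclassification_rate(labels, y)
```

Here the rate is computed on the training rows, so it only illustrates the call and is not an estimate of out-of-sample error.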

Your choices are: use a probabilistic measure, such as BrierScore() or LogLoss(); or add the option operation=predict_mode to the TunedModel constructor (the default is predict, which gives the probabilistic predictions).

You could also put your KNN model in a @pipeline with mode at the end; something like pipe = @pipeline KNNClassifier mode.
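For reference, a sketch of the original script with the operation option added, assuming the MLJ v0.15 API shown in the report (later releases changed @load and may choose the operation automatically, so treat this as version-specific). predict_mode is used because the target is categorical, so the point prediction is the most probable class.

```julia
using RDatasets, MLJ

iris = dataset("datasets", "iris")
y, X = unpack(iris, ==(:Species), colname -> true)
train_ind, test_ind = partition(collect(eachindex(y)), 0.7, shuffle=true)

# Assumption: MLJ v0.15 behaviour, where @load returns a model instance.
knn = @load KNNClassifier verbosity=0
K_range = range(knn, :K, lower=5, upper=20)

# Same TunedModel as in the report, except that each candidate model is
# scored on predict_mode output (class labels) instead of predict output
# (distributions), which is what misclassification_rate expects.
self_tuning_knn = TunedModel(model=knn,
                             resampling=CV(nfolds=5),
                             tuning=Grid(resolution=5),
                             range=K_range,
                             operation=predict_mode,
                             measure=misclassification_rate)

m_self_tuning_knn = machine(self_tuning_knn, X, y)
fit!(m_self_tuning_knn, rows=train_ind, verbosity=0)

# The K selected by the grid search.
best_K = fitted_params(m_self_tuning_knn).best_model.K

# Label predictions on the held-out rows and their error rate.
test_labels = predict_mode(m_self_tuning_knn, rows=test_ind)
test_error = misclassification_rate(test_labels, y[test_ind])
```

Because the partition is shuffled without a fixed seed, best_K and test_error will vary from run to run.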

davidbp commented 3 years ago

Makes a lot of sense, thank you.

I think a classification example like this one would greatly enhance the documentation.

ablaom commented 3 years ago

Yeah, well, there's https://alan-turing-institute.github.io/DataScienceTutorials.jl/end-to-end/crabs-xgb/ and https://github.com/ablaom/MachineLearningInJulia2020/blob/master/tutorials.md#part-4---tuning-hyper-parameters but nothing on tuning a classifier in the main documentation.

I guess one could add an example to the "TuningModels" section https://github.com/alan-turing-institute/MLJ.jl/blob/dev/docs/src/tuning_models.md . PR welcome.

davidbp commented 3 years ago

Pull request created: #726. Hopefully, if you accept the update, there will be fewer issues like this one (or #126).