JuliaAI / MLJTuning.jl

Hyperparameter optimization algorithms for use in the MLJ machine learning framework
MIT License

Machines wrapping `TunedModel` instances should never cache data #171

Closed ablaom closed 2 years ago

ablaom commented 2 years ago

Since `TunedModel` is just a wrapper, caching the training data in the outer machine creates an unnecessary copy: the wrapped atomic model's machine already caches its own (model-specific) representation of the same data.

Here `tmodel::TunedModel` wraps an `EvoTreeClassifier` model and `X` is a `DataFrame`.
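For context, a setup along the following lines reproduces the situation. The particular range, tuning strategy, measure, and data are illustrative assumptions, not taken from the issue:

```julia
using MLJ, DataFrames

# Load the atomic model type (requires EvoTrees.jl in the environment):
EvoTreeClassifier = @load EvoTreeClassifier pkg=EvoTrees

model = EvoTreeClassifier()

# Hypothetical tuning setup: a one-parameter grid search.
r = range(model, :max_depth, lower=2, upper=6)
tmodel = TunedModel(model=model, range=r, tuning=Grid(), measure=log_loss)

# Small illustrative dataset matching the shapes shown below:
X = DataFrame(a=[1.0, 2.0, 3.0], b=rand(3))
y = coerce(['a', 'a', 'a'], Multiclass)
```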

```julia
julia> mach = machine(tmodel, X, y) |> fit!

julia> mach.data # "outer" unnecessary cached data
(3×2 DataFrame
 Row │ a        b
     │ Float64  Float64
─────┼─────────────────────
   1 │     1.0  0.136725
   2 │     2.0  0.00546956
   3 │     3.0  0.947711, CategoricalArrays.CategoricalValue{Char, UInt32}['a', 'a', 'a'])

julia> mach.cache[end].fitresult.machine.data # atomic model-specific cached data
((matrix = [1.0 0.13672511011651545; 2.0 0.005469560151032837; 3.0 0.9477113320687569], names = [:a, :b]), CategoricalArrays.CategoricalValue{Char, UInt32}['a', 'a', 'a'])
```

The matrix is `Tables.matrix(X)` and so is a copy of the data, not a view.
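To see why this doubles the memory footprint, note that `Tables.matrix` materializes a fresh matrix from the table's columns, so the atomic machine's cache and the outer machine's cache do not share storage. A minimal check, independent of MLJ:

```julia
using DataFrames, Tables

X = DataFrame(a=[1.0, 2.0, 3.0], b=[0.1, 0.2, 0.3])

M = Tables.matrix(X)  # allocates a new Matrix{Float64}
M[1, 1] = 99.0        # mutating the copy...
X.a[1] == 1.0         # ...leaves the original column untouched
```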

**Remedy:** Declare

```julia
MLJBase.caches_data_by_default(::Type{<:TunedModel}) = false
```
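Until such a default is declared, the same effect can be had per machine. This is a sketch assuming MLJBase's `cache` keyword for machine construction:

```julia
# Opt out of outer data caching explicitly at construction time
# (assumes the `cache` keyword of `machine` in MLJBase):
mach = machine(tmodel, X, y; cache=false) |> fit!
```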