JuliaAI / MLJ.jl

A Julia machine learning framework
https://juliaai.github.io/MLJ.jl/

models(matching(X, y)) for Images #593

Open azev77 opened 4 years ago

azev77 commented 4 years ago

Currently: models(matching(X, y)) doesn't return relevant models when X has images

using MLJ
import Flux
X, y = Flux.Data.MNIST.images(), Flux.Data.MNIST.labels()
typeof(X), typeof(y)
models(matching(X, y))

I'm not sure if this is outside the scope of models(matching(X, y)). In principle, it could also return all time-series models, etc.: models(matching(X, y), x -> x.TS == true)
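For illustration, MLJ's model registry entries are NamedTuples of traits, and `models` accepts extra predicate filters. The `:TS` trait above is hypothetical, so here is a toy sketch of the same filtering idea over made-up NamedTuples (the `toy_models` data and the `is_timeseries` field are invented for the example):

```julia
# Toy stand-ins for registry entries; real MLJ metadata NamedTuples carry
# trait fields such as :is_pure_julia and :prediction_type.
toy_models = [
    (name = "ARModel",   is_timeseries = true),
    (name = "TreeModel", is_timeseries = false),
]

# Filtering with a predicate, as models(matching(X, y), pred) would:
ts_models = filter(m -> m.is_timeseries, toy_models)
# → 1-element result containing (name = "ARModel", ...)
```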

azev77 commented 4 years ago

My favorite application of multiple Julia classifiers (XGBoost.jl, Flux, NaiveBayes.jl etc) on image data is: @oxinabox's https://white.ucc.asn.au/2017/12/18/7-Binary-Classifier-Libraries-in-Julia.html

ablaom commented 4 years ago

In MLJ you can't use integers to encode categorical data:

scitype(y)
AbstractArray{Count,1}

Fix:

y = coerce(y, Multiclass);

Now you can find the MLJFlux model:

julia> models(matching(X, y))
1-element Array{NamedTuple{(:name, :package_name, :is_supervised, :docstring, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :is_pure_julia, :is_wrapper, :load_path, :package_license, :package_url, :package_uuid, :prediction_type, :supports_online, :supports_weights, :input_scitype, :target_scitype, :output_scitype),T} where T<:Tuple,1}:
 (name = ImageClassifier, package_name = MLJFlux, ... )

You don't find the tree boosters because their current MLJ implementations expect tabular data.

julia> info("XGBoostClassifier").input_scitype
Table{_s23} where _s23<:(AbstractArray{_s25,1} where _s25<:Continuous)

while

julia> scitype(X)
AbstractArray{GrayImage{28,28},1}

You can still use them, but (currently) you need to pre-process the data into tabular form. It might be useful to have a transformer to do this kind of thing, but I have not looked into it.
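One way to do that pre-processing is to flatten each image into a row of pixel features. A minimal sketch in plain Julia, using dummy data in place of MNIST (the `flatten_images` helper is invented for this example; in practice you would wrap the resulting matrix as a table, e.g. with MLJ.table, before fitting XGBoostClassifier):

```julia
# Flatten a vector of H×W image matrices into a feature matrix with one
# row per image and one column per pixel.
flatten_images(images) = permutedims(reduce(hcat, vec.(images)))

# Dummy stand-in for the MNIST image vector:
images = [rand(Float32, 28, 28) for _ in 1:5]

X_flat = flatten_images(images)
size(X_flat)  # (5, 784)
```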