JuliaAI / MLJ.jl

A Julia machine learning framework
https://juliaai.github.io/MLJ.jl/
Other
1.78k stars 157 forks source link

Doc fix following MLJBase upgrade #225

Closed tlienart closed 5 years ago

tlienart commented 5 years ago

In https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/ there are still references to target_scitype_union which are deprecated unless I'm mistaken.

@ablaom I can provide a short PR fixing this, can you think of anything else that may need fixing following the recent changes?

juliohm commented 5 years ago

Could you please comment on the updates? I updated my environment and my projects were broken. Where I can find the breaking changes regarding scitypes? I see two types now target_scitype and output_scitype in MLJBase.jl

My Base package is broken here for example: https://github.com/juliohm/GeoStatsBase.jl/blob/master/src/learning/traits.jl#L13

tlienart commented 5 years ago

Hello @juliohm , I believe this is the PR: https://github.com/alan-turing-institute/MLJBase.jl/pull/28 listing what happened; I don't think it explains the output_scitype one but in your case I don't believe you need it (though @ablaom the output_type should also be documented, I've added it to the list above).

In your case @juliohm this

MLJBase.target_scitype_union(model) == MLJBase.Continuous
MLJBase.target_scitype_union(model) == MLJBase.Finite

should simply become

MLJBase.target_scitype(model) == AbstractVector{MLJBase.Continuous}
MLJBase.target_scitype(model) == AbstractVector{MLJBase.UnivariateFinite}

You can have a look at the GLM interface (and more generally at interfaces in MLJModels which are up to date with these changes). Apologies for the inconvenience and if I can help with your interfacing with MLJ, let me know.

PS: more generally, the task interface will likely be deprecated soon but I'll let anthony comment on this.

juliohm commented 5 years ago

Thank you @tlienart , that is really helpful. I will commit the changes proposed.

What is the different between output_scitype and target_scitype?

I am glad that the task interface will change. Will it exist in a different form or it will be eliminated? I personally find the concept of tasks very important to design generic workflows. Currently at GeoStatsBase.jl I have a collection of task types that I plan to expand in the spatial domain.

juliohm commented 5 years ago

The proposed changes didn't work. I had to do:

MLJBase.target_scitype(model) == AbstractVector{<:MLJBase.Finite} # for classification
MLJBase.target_scitype(model) == AbstractVector{<:MLJBase.Continuous} # for regression

After fixing it, now I get another error training the model. There was any change to the fit interface? This is how I am using it: https://github.com/juliohm/GeoStatsBase.jl/blob/master/src/learning/models.jl#L21-L28

And what about the predict interface, any change? This is how I am using it: https://github.com/juliohm/GeoStatsBase.jl/blob/master/src/learning/models.jl#L48

tlienart commented 5 years ago

Thanks for bringing that up, we should clarify that...

Could you open another issue with the error(s) you get training the model? thanks! I don't believe there were big changes to either fit or predict.

ablaom commented 5 years ago

@tlienart Good catch! Will fix.

@juliohm

Quoting above:

MLJBase.target_scitype(model) == AbstractVector{<:MLJBase.Finite} # for classification
MLJBase.target_scitype(model) == AbstractVector{<:MLJBase.Continuous} # for regression

Perfect.

No changes to the model fit and predict methods.

ablaom commented 5 years ago

@tlienart

Regarding your suggestion,

MLJBase.target_scitype(model) == AbstractVector{MLJBase.UnivariateFinite}

This would not be appropriate. The target_scitype refers to the scitype of the training target not the target prediction. If probabilistic predictions are made, this is indicated using types not traits, by declaring struct MyProbabilisticClassifier <: Probabilistic.

juliohm commented 5 years ago

What is the rationale for AbstractVector? Couldn't it be Vector? That way we could do Vector{Continuous} directly without the Vector{<:Continous}, I think.

Could you please elaborate the difference between target_scitype and output_scitype?

ablaom commented 5 years ago

Could you please elaborate the difference between target_scitype and output_scitype?

Generally, the first is a trait for subtypes of Supervised <: Model, the second for subtypes of Unsupervised <: Model. Reasons for difference:

(1) users find target, as applied to transformers confusing

(2) potentially a supervised model could have a transform method (whose output is described by output_scitype) in addition to a predict method (whose "output", the target, is described by target_scitype). For example, a neural network classifier might learn feature embeddings for its categorical inputs, which one may want to use in another classifier that only handles continuous inputs.

ablaom commented 5 years ago

Issue resolved by above commit.