Closed tlienart closed 5 years ago
Could you please comment on the updates? I updated my environment and my projects were broken. Where I can find the breaking changes regarding scitypes? I see two types now target_scitype
and output_scitype
in MLJBase.jl
My Base package is broken here for example: https://github.com/juliohm/GeoStatsBase.jl/blob/master/src/learning/traits.jl#L13
Hello @juliohm , I believe this is the PR: https://github.com/alan-turing-institute/MLJBase.jl/pull/28 listing what happened; I don't think it explains the output_scitype
one but in your case I don't believe you need it (though @ablaom the output_type
should also be documented, I've added it to the list above).
In your case @juliohm this
MLJBase.target_scitype_union(model) == MLJBase.Continuous
MLJBase.target_scitype_union(model) == MLJBase.Finite
should simply become
MLJBase.target_scitype(model) == AbstractVector{MLJBase.Continuous}
MLJBase.target_scitype(model) == AbstractVector{MLJBase.UnivariateFinite}
You can have a look at the GLM interface (and more generally at interfaces in MLJModels which are up to date with these changes). Apologies for the inconvenience and if I can help with your interfacing with MLJ, let me know.
PS: more generally, the task
interface will likely be deprecated soon but I'll let anthony comment on this.
Thank you @tlienart , that is really helpful. I will commit the changes proposed.
What is the different between output_scitype and target_scitype?
I am glad that the task interface will change. Will it exist in a different form or it will be eliminated? I personally find the concept of tasks very important to design generic workflows. Currently at GeoStatsBase.jl I have a collection of task types that I plan to expand in the spatial domain.
The proposed changes didn't work. I had to do:
MLJBase.target_scitype(model) == AbstractVector{<:MLJBase.Finite} # for classification
MLJBase.target_scitype(model) == AbstractVector{<:MLJBase.Continuous} # for regression
After fixing it, now I get another error training the model. There was any change to the fit
interface? This is how I am using it: https://github.com/juliohm/GeoStatsBase.jl/blob/master/src/learning/models.jl#L21-L28
And what about the predict
interface, any change? This is how I am using it: https://github.com/juliohm/GeoStatsBase.jl/blob/master/src/learning/models.jl#L48
Thanks for bringing that up, we should clarify that...
Could you open another issue with the error(s) you get training the model? thanks! I don't believe there were big changes to either fit
or predict
.
@tlienart Good catch! Will fix.
@juliohm
Quoting above:
MLJBase.target_scitype(model) == AbstractVector{<:MLJBase.Finite} # for classification
MLJBase.target_scitype(model) == AbstractVector{<:MLJBase.Continuous} # for regression
Perfect.
No changes to the model fit and predict methods.
@tlienart
Regarding your suggestion,
MLJBase.target_scitype(model) == AbstractVector{MLJBase.UnivariateFinite}
This would not be appropriate. The target_scitype refers to the scitype of the training target not the target prediction. If probabilistic predictions are made, this is indicated using types not traits, by declaring struct MyProbabilisticClassifier <: Probabilistic
.
What is the rationale for AbstractVector
? Couldn't it be Vector
? That way we could do Vector{Continuous}
directly without the Vector{<:Continous}
, I think.
Could you please elaborate the difference between target_scitype
and output_scitype
?
Could you please elaborate the difference between
target_scitype
andoutput_scitype
?
Generally, the first is a trait for subtypes of Supervised <: Model
, the second for subtypes of Unsupervised <: Model
. Reasons for difference:
(1) users find target
, as applied to transformers confusing
(2) potentially a supervised model could have a transform
method (whose output is described by output_scitype
) in addition to a predict
method (whose "output", the target, is described by target_scitype
). For example, a neural network classifier might learn feature embeddings for its categorical inputs, which one may want to use in another classifier that only handles continuous inputs.
Issue resolved by above commit.
In https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/ there are still references to
target_scitype_union
which are deprecated unless I'm mistaken.@ablaom I can provide a short PR fixing this, can you think of anything else that may need fixing following the recent changes?
*_union
output_type