is_probabilistic=true in @pipeline syntax is clunky

JuliaAI / MLJ.jl

A Julia machine learning framework

https://juliaai.github.io/MLJ.jl/

is_probabilistic=true in @pipeline syntax is clunky #305

Closed ablaom closed 4 years ago

ablaom commented 5 years ago

From a slack thread:

Would it be possible to remove is_probabilistic=true and guess it from the last step instead (i.e. either the last model is probabilistic, or there’s a specific operation at the end)? It feels pretty clunky to have to specify it.

ablaom commented 5 years ago

Originally it was this way. The tricky part is that it is not easy to reliably predict what the final output is. After a probabilistic classifier, for example, there might be a function that just computes the mode (or some threshold-based point prediction), or a final function may just transform the predicted pdfs. So we really don’t know.

We could have the macro make a best guess of the prediction type (assume it is the same as that of the single supervised model in the pipeline) and keep the keyword for overriding the default behaviour.
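A rough, language-agnostic sketch of that proposal, written in Python only for illustration. None of these names are MLJ API: `Supervised`, `Transformer`, and `guess_prediction_type` are hypothetical stand-ins, and the sketch assumes the pipeline contains exactly one supervised model.

```python
from typing import Optional, Sequence


class Transformer:
    """Hypothetical unsupervised step; it has no prediction type."""


class Supervised:
    """Hypothetical supervised model carrying its own prediction type."""

    def __init__(self, prediction_type: str):
        self.prediction_type = prediction_type  # e.g. "probabilistic" or "deterministic"


def guess_prediction_type(steps: Sequence[object], override: Optional[str] = None) -> str:
    # An explicit keyword always wins over the guess.
    if override is not None:
        return override
    supervised = [s for s in steps if isinstance(s, Supervised)]
    # Assumption of this sketch: exactly one supervised model per pipeline.
    if len(supervised) != 1:
        raise ValueError("expected exactly one supervised model in the pipeline")
    # Best guess: the pipeline predicts whatever its supervised model predicts.
    return supervised[0].prediction_type
```

With this shape, the keyword is needed only when a later step changes the kind of output, for instance a function that turns predicted distributions into point predictions.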

Thoughts?

Incidentally, for uniformity with changes to traits in MLJBase, the keyword should really be "prediction_type = :probabilistic" (even more to write!). Perhaps just ":probabilistic" is enough.

cc: @tlienart

tlienart commented 5 years ago

Hmm, yes, I understand. On the other hand, a pipeline is reasonably simple in that it's just a tube with operations in order, so we can inspect whatever is at the end, right? To follow your line of thinking, whatever is at the end is one of:

it's a probabilistic model
it's an operation
(something else?)

In the "it's an operation" case, exported functions like predict_mean, predict_mode or predict_thresh (yet to be defined) should be recognised and marked as deterministic.

If we allow arbitrary operations at the end (?), I think it would be fair to make a guess based on the last step we recognise, and otherwise warn the user that they should specify is_probabilistic.
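The last-step inspection described above could look roughly like the following Python sketch. This is an illustration under assumptions, not MLJ code: steps are modelled as hypothetical `(kind, name)` tuples, `infer_is_probabilistic` is an invented helper, and `predict_thresh` appears only because it is mentioned above as yet to be defined.

```python
import warnings

# Operations assumed to yield point predictions (names taken from the comment above).
POINT_OPERATIONS = {"predict_mean", "predict_mode", "predict_thresh"}


def infer_is_probabilistic(steps, is_probabilistic=None):
    """Guess whether a pipeline is probabilistic from its last recognised step.

    `steps` is a list of (kind, name) tuples, a hypothetical representation.
    """
    # An explicit keyword overrides any guess.
    if is_probabilistic is not None:
        return is_probabilistic
    skipped = False
    # Walk back from the end of the pipe to the last step we recognise.
    for kind, name in reversed(steps):
        if kind == "probabilistic_model":
            guess = True
        elif kind == "deterministic_model":
            guess = False
        elif kind == "operation" and name in POINT_OPERATIONS:
            guess = False
        else:
            skipped = True  # unrecognised step: keep looking further back
            continue
        if skipped:
            warnings.warn(
                "unrecognised step(s) at the end of the pipeline; "
                "guessing from an earlier step, consider specifying is_probabilistic"
            )
        return guess
    raise ValueError("no recognised step; specify is_probabilistic explicitly")
```

Under these assumptions, a pipeline ending in a probabilistic classifier is inferred as probabilistic, one ending in predict_mode as deterministic, and one ending in an arbitrary function falls back to the classifier's type with a warning.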

So I guess the line of thinking is similar, except that you don't seem to include "recognising" operations when that's what's at the end of the pipe.

ablaom commented 4 years ago

Partly addressed. Closing in favour of https://github.com/alan-turing-institute/MLJBase.jl/issues/267