fkiraly opened 5 years ago
I think this is a good point. There are two choices for exposing extra functionality at present:
(i) `fit` may return additional information in its `report` dictionary (this could include functions/closures, but that was not the original intention);
(ii) one implements methods beyond `transform`, dispatched on the fit-result. This presently requires adding ("registering") the method name to MLJBase.
@ablaom, I think the `report` dictionary returned by `fit` should, at most, contain diagnostic reports of the fitting itself, and not be abused for parameter inference or reporting.
I'd personally introduce a single method for all models, e.g., `fitted_params`, which could return a dictionary of model parameters and diagnostics. These would be different for each model - for example, for ordinary least squares regression, it might return coefficients, CI, R-squared, and t/F test results.
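For illustration, such an accessor might look like this in Julia; the `OLSFit` type, its fields, and the `fitted_params` function here are hypothetical stand-ins, not part of any existing interface:

```julia
# Hypothetical sketch only: a single accessor returning a dictionary of
# learned parameters plus cheap diagnostics for an OLS-style fit.
struct OLSFit
    coefs::Vector{Float64}   # fitted coefficients
    r_squared::Float64       # in-sample R-squared
end

fitted_params(ols::OLSFit) = Dict(:coefs => ols.coefs, :r_squared => ols.r_squared)

ols = OLSFit([1.5, -0.3], 0.92)
fitted_params(ols)[:r_squared]  # 0.92
```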
What we may want to be careful about is the interaction with the parameter interface. I usually like to distinguish hyper-parameters = set externally, not changed by fit, and model parameters = no external access, set by fit.
Two issues here:
1. Type of information to be accessed after a `fit` call. I suppose we can classify these into "parameter inference" and "other". It's not clear to me how "other" can be unambiguously divided further, but help me out here if you can.
2. Method of access: dictionary or method. The original idea of the dictionary was that it would be a persistent kind of thing, or even some kind of log/history. A dictionary has the added convenience that one adds keys according to circumstance (e.g., if I set a hyperparameter requesting `fit` to rank features, then `:feature_rankings` is a key of the `report` dictionary; otherwise it is not). Actually, `report` isn't currently used to maintain a running log (by the corresponding `machine`), but it could be. A method has the advantage that extra computation required to produce the information wanted can be avoided until the user calls for it. Now that I think of it, method and dictionary could be combined - the method computes a dictionary that it returns.
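A minimal sketch of that combination - a method that lazily computes and returns a dictionary, paying for expensive items only on request. All names here are illustrative:

```julia
# Illustrative only: `report` computes a dictionary on demand; the expensive
# item (:feature_rankings) is computed only when explicitly requested.
function report(coefs::Vector{Float64}; rank_features::Bool=false)
    d = Dict{Symbol,Any}(:n_coefs => length(coefs))
    if rank_features
        # pay this cost only when the user asks for it
        d[:feature_rankings] = sortperm(abs.(coefs), rev=true)
    end
    return d
end

report([0.1, -2.0, 0.5]; rank_features=true)[:feature_rankings]  # [2, 3, 1]
```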
I like the simplicity of returning a single object to report all information of possible interest, computed after every fit, whether it be fitted parameters or whatever. What is less clear to me is whether information that requires extra computation should be accessed:
(i) by requesting the computation through an "instruction" hyperparameter and returning the result in the same `report` object; or
(ii) by having a dedicated method dispatched on the fit-result, like `predict`.
Your thoughts?
> What we may want to be careful about is the interaction with the parameter interface. I usually like to distinguish hyper-parameters = set externally, not changed by fit, and model parameters = no external access, set by fit.

Agreed!
Some thoughts (after a longer time of thinking):
I think it would be a good idea to have a dedicated interface for fitted parameters, just as we have for hyperparameters, i.e., dictionary-style, and following exactly the same structure, nesting and accessor conventions for the fitting result as we have for the models.
What is automatically returned in this extension of the fitresult are "standard model parameters that are easy to compute", i.e., it can be more than what `predict` needs but shouldn't add a lot of computational overhead. It should also be data-agnostic model structure parameters (e.g., model coefficients), or easy-to-obtain intermediate results for diagnostics (e.g., R-squared?).
Separate from this should be operations on the model that require significant computational overhead over fit/predict (e.g., variable importance), or that are data-dependent (e.g., F-test in-sample).
The standard stuff - i.e., standard methodology for diagnostics and parameter inference (e.g., for OLS: t-tests, CI, F-test, R-squared, diagnostic plots) - I'd put in fixed dispatch methods `diagnose` (returns a pretty-printable dict-like of summaries) or `diagnose_visualize` (produces plots/visualizations).
Advanced and non-standard diagnostics (e.g., specialized diagnostics or non-canonical visualizations) should be external, but these will be facilitated through the standardized model parameter interface once it exists.
Thoughts?
@fkiraly I have come around to accepting your suggestion for a dedicated method to retrieve fitted parameters, separate from the `report` field of a machine. I also agree that `params` and `fitted_params` (which will have "nested" values for composite models) should return the same kind of object. I think a Julia `NamedTuple` (like a dict but with ordered keys and type parameters for each value) is the way to go. This will also be the form of the (possibly nested) `report` field, and `report` will get an accessor function, so that `params`, `fitted_params`, and `report` are all methods that can be called on a (fitted) machine to return a named tuple.
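For reference, a quick illustration of why a `NamedTuple` fits this role - ordered keys and a concrete type per value:

```julia
# A NamedTuple preserves key order and carries a concrete type per value:
fp = (coefs = [1.5, -0.3], bias = 0.7)

fp.coefs      # access by name, like a struct
keys(fp)      # (:coefs, :bias) - insertion order preserved
typeof(fp)    # NamedTuple{(:coefs, :bias), Tuple{Vector{Float64}, Float64}}
```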
I am working on implementing these various things simultaneously.
> I think a Julia NamedTuple (like a dict but with ordered keys and type parameters for each value) is the way to go
A noteworthy difference being that a NamedTuple is immutable, could that cause a problem here?
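To make the concern concrete: a `NamedTuple` cannot be mutated in place, but a fresh one with updated fields can be built with `merge`:

```julia
fp = (coefs = [1.5, -0.3], bias = 0.7)

# fp.bias = 0.9                  # ERROR: NamedTuples are immutable
fp2 = merge(fp, (bias = 0.9,))   # rebuild with the field replaced

fp2.bias   # 0.9 - and fp itself is untouched
```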
@ablaom, I'm on board with a `NamedTuple` or dictionary returned by a method. The method should be able to return abstract structs in its fields, and the result should be able to change with each run of `fit`.
Regarding user interface: I'd make it a method (by dispatch), and call it "inspect" unless you have a better idea.
On a side note, I think this would also help greatly with the issue highlighted in the visualization issue #85 , the "report" being possibly arcane and non-standardized.
Further to this, I think computationally expensive diagnostics such as "interpretable machine learning" style meta-methods should not be bundled with "inspect", but rather with external "interpretability meta-methods" (to be dealt with at a much later point). The "inspect" interface point should be reserved for parameters or properties which do not add substantial computational overhead over "fit" - this could, for example, be defined as only constant (or log(# training data pts) ) added computational effort above "fit".
Hm, maybe another two default interface points - "print" and "plot" - would be great? These are default interface points in R.
"print" gives back a written summary, for example
```
Call:
lm(formula = weight ~ group - 1)

Residuals:
    Min      1Q  Median      3Q     Max
-1.0710 -0.4938  0.0685  0.2462  1.3690

Coefficients:
         Estimate Std. Error t value Pr(>|t|)
groupCtl   5.0320     0.2202   22.85 9.55e-15 ***
groupTrt   4.6610     0.2202   21.16 3.62e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared:  0.9818, Adjusted R-squared:  0.9798
F-statistic: 485.1 on 2 and 18 DF,  p-value: < 2.2e-16
```
"plot" produces a series of standard diagnostic plots, which may differ by model type and/or task. I would conjecture there are some that you always want for a task (e.g., cross-plot and residual plot for deterministic supervised regression; calibration curves for probabilistic classification), and some that you only want for a specific model class (e.g., learning curves for SGD-based methods, heatmaps for tuning methods).
Interesting question: where would "cross-plots out-of-sample" sit? Probably only available in the evaluation/validation phase, i.e., with the benchmark orchestrator.
Actually, I notice you already made a suggestion for a name: `fitted_params`. Also fine with me - though I wonder: should this include easy-to-compute stuff such as the F-statistic and in-sample R-squared as well? Or should that be left to (a separate interface point!) "inspect"? Thoughts?
Also I realize, I've already said some of these things, albeit slightly differently, on Feb 4. So greetings, @fkiraly from the past, I reserve the right to not fully agree with you.
To clarify the existing design, we have these methods (dispatched on machines; `params` also on models):

- `params` to retrieve possibly nested hyperparameters
- `fitted_params` to retrieve possibly nested learned parameters
- `report` to retrieve most everything else (could be nested), including computationally expensive stuff

As laid out in the guide (see below): whether or not a computationally expensive item is actually computed is controlled by an "instruction" hyperparameter of the model. If a default value is not overridden, the item is empty (but the key is still there), a clue to the user that more is available. I prefer this to a separate method, to avoid method-name proliferation.
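A sketch of the "instruction hyperparameter" pattern, with hypothetical model and field names (not MLJ's actual implementation):

```julia
# Hypothetical model type; `rank_features` is the "instruction" hyperparameter.
Base.@kwdef mutable struct SomeRegressor
    rank_features::Bool = false
end

function fit(model::SomeRegressor, X::Matrix{Float64}, y::Vector{Float64})
    coefs = X \ y   # least-squares fitresult
    # the report key is always present; the value is empty unless requested
    rankings = model.rank_features ? sortperm(abs.(coefs), rev=true) : Int[]
    return coefs, (feature_rankings = rankings,)
end
```

An empty `feature_rankings` in the returned report then hints to the user that more is available by flipping the hyperparameter.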
I think the above covers MLR's "print" method. But we could overload `Base.show` for named tuples to make them more user-friendly. Don't like the name "print". Print what? Just about every command prints something. (Edit: but you could say the same about "report" - aarrgh! Maybe "extras"??)
Not so keen on changing name of "report" as this is breaking.
@tlienart I think every item of `report` should be regenerated at every call to `fit` (or `update`) so that the information there is synchronised with the hyperparameter values attached to the machine's current model. So immutability is not an issue. So far, the `params` method is just a convenience method for the user; tuning is carried out using other methods.
From the guide:
> `report` is a (possibly empty) `NamedTuple`, for example, `report=(deviance=..., dof_residual=..., stderror=..., vcov=...)`.
>
> Any training-related statistics, such as internal estimates of the generalization error, and feature rankings, should be returned in the `report` tuple. How, or if, these are generated should be controlled by hyperparameters (the fields of `model`). Fitted parameters, such as the coefficients of a linear model, do not go in the report as they will be extractable from `fitresult` (and accessible to MLJ through the `fitted_params` method, see below)....
> A `fitted_params` method may be optionally overloaded. Its purpose is to provide MLJ access to a user-friendly representation of the learned parameters of the model (as opposed to the hyperparameters). They must be extractable from `fitresult`.
>
> ```julia
> MLJBase.fitted_params(model::SomeSupervisedModelType, fitresult) -> friendly_fitresult::NamedTuple
> ```
>
> For a linear model, for example, one might declare something like `friendly_fitresult=(coefs=[...], bias=...)`.
>
> The fallback is to return `(fitresult=fitresult,)`.
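A self-contained sketch of such an overload, together with the fallback, using stand-in types rather than MLJBase itself:

```julia
# Stand-in model type, not MLJBase's own:
struct SomeSupervisedModelType end

# user-friendly representation of the learned parameters
function fitted_params(model::SomeSupervisedModelType, fitresult)
    coefs, bias = fitresult
    return (coefs = coefs, bias = bias)
end

# the documented fallback, for models that don't overload the method
fitted_params(model, fitresult) = (fitresult = fitresult,)
```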
Very sensible. Maybe you'd want to make `plot` a specified/uniform interface point as well, along the lines of your suggestion in #85 (and/or mine above)?
Small detail regarding your reference to "mlr's print": mlr doesn't have a particularly good interface for pretty-printing or plotting.
It is actually the R language itself (i.e., base R) which has "print" and "plot" as designated interface points. Agreed with "print" being a strange choice of name though for pretty-printed reports - when I first saw this long long ago, I thought it might mean saving to a file, or calling an actual printer.
"report" could be "inspect" the next time we write an MLJ, but let's not change a working system.
At the moment the Plots.jl package's `plot` function is just about the "standard" Julia interface point for plotting, although the future is not clear to me and others may have a better crystal ball.
Plots.jl is a front end for plotting and, at present, most of the backends are still wrapped C/Python/Java code. It is a notorious nuisance to load and execute the first time. However, there is a "PlotsBase" (called PlotRecipes) which allows you to import the `plot` function you overload in your application without loading Plots or a backend (until you need it).
... we could factor it out into an MLJplots module, thus solving the dependency issue? I am starting to appreciate how Julia's dispatch philosophy makes this easy (though its package management functionality could be improved).
No, no. This is not necessary. We only need PlotsBase (lightweight) as a dependency. The user does need to manually load Plots.jl if they want to plot, but I don't think that's a big deal. The backends get lazy-loaded (i.e., as needed).
@fkiraly and others. Returning to your original comment opening this thread, where should one-class classification fit into our scheme? Unsupervised, yes?
In terms of taxonomy, I'd consider that something completely different, i.e., neither supervised nor unsupervised.
I'd consider one-class classifiers (including one-class kernel SVM) as an instance of outlier detectors, or anomaly detectors (if also on-line).
Even in the case where labelled outliers/artefacts/anomalies are provided in the training set, it's different from the (semi-)supervised task, since there is a designated "normal" class.
It's also different from unsupervised, since unsupervised methods have no interface point to feed back "this is an anomaly".
I.e., naturally, the one-class-SVM would have a task-specific fit/detect interface (or similar, I'm not too insistent on naming here).
One could also consider it sitting in the wider class of "annotator" tasks.
Does this mean the type hierarchy is not granular enough? Maybe it should be traits?
@datnamer, that's an interesting question for @ablaom - where do we draw the distinction between type and trait?
If I recall an earlier discussion correctly, whenever we need to dispatch or inherit differently?
It's just a feeling, but I think anomaly detectors and (un)supervised learners should be different - you can use the latter to do the former, so it feels more like a wrapper/reduction rather than trait variation.
Some coarse distinctions are realised in a type hierarchy. From the docs:

> The ultimate supertype of all models is `MLJBase.Model`, which has two abstract subtypes:
>
> ```julia
> abstract type Supervised <: Model end
> abstract type Unsupervised <: Model end
> ```
>
> `Supervised` models are further divided according to whether they are able to furnish probabilistic predictions of the target (which they then do by default) or directly predict "point" estimates, for each new input pattern:
>
> ```julia
> abstract type Probabilistic <: Supervised end
> abstract type Deterministic <: Supervised end
> ```

All further distinctions are realised with traits, some of which take values in the scitype hierarchy or in types derived from them. An example of such a trait is `target_scitype_union`.
So, I suppose we create a new abstract subtype of `MLJ.Model`, called `AnomalyDetection`? With a `predict` method that only predicts `Bool`? Or only predicts objects of scitype `Finite{2}` (a `CategoricalValue{Bool}`)? With the same traits delineating input scitypes that we have for `Unsupervised` models, yes?
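A rough sketch of what that subtype might look like, using stand-alone stand-ins for the MLJBase hierarchy; the concrete model and its thresholding "fitresult" are purely illustrative:

```julia
# Stand-ins for the MLJBase hierarchy:
abstract type Model end
abstract type Supervised <: Model end
abstract type Unsupervised <: Model end
abstract type AnomalyDetection <: Model end   # the proposed new subtype

struct OneClassSVM <: AnomalyDetection end    # hypothetical concrete model

# a detect/predict returning one Bool per observation; the "fitresult" here
# is just a scalar threshold, purely for illustration
detect(model::OneClassSVM, fitresult::Float64, Xnew::Vector{Float64}) =
    [x > fitresult for x in Xnew]

detect(OneClassSVM(), 0.5, [0.1, 0.9])  # [false, true]
```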
Obviously this is not a priority right now, but it did recently come up.
@ablaom regarding `AnomalyDetection`: agreed, though I'd just call it `detect` rather than `predict`.
Regarding unsupervised learners: have we made progress on the distinction between (i) and (ii), at least, from the first post? For #161 especially, a "transformer" type (or sub-type? aspect?) as in (i) would be necessary.
Update: actually, I think we will be fine with (i), i.e., transformer style behaviour only for ManifoldLearning.jl in #161.
Regarding unsupervised models such as PCA, kmeans, etc discussed in #44.
I know these are commonly encapsulated within the transformer formalism, but that would do the methodology behind them an injustice, as feature extraction is only one of the major use cases of unsupervised models. More precisely, there are, as far as I can see, three use cases:
(i) feature extraction. For clusterers, create a column with cluster assignment. For continuous dimension reducers, create multiple continuous columns.
(ii) model structure inference - essentially, inspection of the fitted parameters, e.g., PCA components and loadings, cluster separation metrics, etc. These may be of interest in isolation, or used as a (hyper-parameter) input of other atomic models in a learning pipeline.
(iii) full probabilistic modelling aka density estimation. This behaves as a probabilistic multivariate regressor/classifier on the input variables.
To start, it makes sense to implement only "transformer" functionality, but it is maybe good to keep in mind for implementation that eventually one may like to expose the other outputs via interfaces - e.g., the estimated multivariate density in a fully probabilistic implementation of k-means.