Closed aldro61 closed 7 years ago
This concludes the implementation of minimum cost complexity pruning.
Where is the code that implements the CV evaluation metric? (the number you optimize in order to select the hyper-parameters) Can it be specified at training time?
If we can specify it at training time, then it would be nice to compare the models in terms of test error on the benchmark data sets. E.g., is it more accurate to select the hyper-parameters that minimize the number of incorrectly predicted intervals, or the hinge loss, or AUC, etc.?
If the "scoring" argument is not specified, it uses the score function of the estimator. If scoring is specified, it uses this function instead. In this case, any function scoring(X_test, y_test) -> R can be used, so yes, we could evaluate many different metrics.
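To make the comparison concrete, the alternative metrics mentioned above could be written as simple functions of the predictions and the interval targets. These are sketches only: the function names, the `(n, 2)` interval encoding, and the `margin` parameter are hypothetical, and the actual mmit `scoring(X_test, y_test)` plumbing would wrap something like them.

```python
import numpy as np

def zero_one_interval_error(y_pred, y_intervals):
    """Fraction of predictions falling outside their target interval
    (the "number of incorrectly predicted intervals", normalized).
    y_intervals: array of shape (n, 2) with columns (lower, upper);
    open ends can be encoded as -inf / +inf. Names are hypothetical."""
    y_pred = np.asarray(y_pred, dtype=float)
    lower, upper = y_intervals[:, 0], y_intervals[:, 1]
    return float(np.mean((y_pred < lower) | (y_pred > upper)))

def hinge_interval_loss(y_pred, y_intervals, margin=0.0):
    """Mean linear hinge loss with respect to the interval bounds,
    optionally shrunk by a margin (hypothetical signature)."""
    y_pred = np.asarray(y_pred, dtype=float)
    lower, upper = y_intervals[:, 0], y_intervals[:, 1]
    below = np.maximum(lower + margin - y_pred, 0.0)  # predicted too low
    above = np.maximum(y_pred - (upper - margin), 0.0)  # predicted too high
    return float(np.mean(below + above))
```

Any of these could then be passed as the `scoring` argument and compared on the benchmark data sets.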
The AUC is not implemented yet. Right now, the score function of MMIT returns the mean squared error with respect to the closest bound if the predicted value isn't in the target interval.
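The default score described above (zero error inside the target interval, squared distance to the closest bound outside it) can be sketched like this; the function name and the `(n, 2)` interval encoding are assumptions, not the actual mmit API:

```python
import numpy as np

def interval_mse(y_pred, y_intervals):
    """Mean squared error with respect to the closest interval bound.

    y_intervals: array of shape (n, 2) with columns (lower, upper);
    open ends can be encoded as -inf / +inf. A prediction inside its
    target interval contributes zero error. (Hypothetical names,
    mirroring the default MMIT score described above.)"""
    y_pred = np.asarray(y_pred, dtype=float)
    lower, upper = y_intervals[:, 0], y_intervals[:, 1]
    # distance to the violated bound; zero when inside the interval
    below = np.maximum(lower - y_pred, 0.0)
    above = np.maximum(y_pred - upper, 0.0)
    return float(np.mean((below + above) ** 2))
```

Note that, unlike the hinge loss used at training time, this error does not depend on the margin hyper-parameter.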
That sounds like a reasonable default CV metric. I like how it is independent of margin size.
@tdhock This is what I explained to you today. Ok to merge?
With minimum cost-complexity pruning built-in.
Had to write a custom class, since it was not possible to integrate minimum cost-complexity pruning into scikit-learn's GridSearchCV. Eventually, we could write a more general class that accepts any estimator (e.g., a sklearn pipeline) and calls the pruning functions internally if required. We don't need this for now.
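The more general class could look roughly like the sketch below: a plain grid search that, after fitting each candidate, also scores every tree on the estimator's pruning path when such a path is available. Everything here is hypothetical (the class name, the `prune_path()` hook, the `scoring(model, X, y)` signature); it only illustrates why this can't be expressed as a stock GridSearchCV.

```python
import itertools
import numpy as np

class PrunedGridSearch:
    """Minimal sketch: grid search that also evaluates the models on an
    estimator's cost-complexity pruning path, if it exposes one.
    All names are hypothetical, not the actual mmit API."""

    def __init__(self, estimator_factory, param_grid, scoring):
        self.estimator_factory = estimator_factory
        self.param_grid = param_grid  # dict: param name -> list of values
        self.scoring = scoring        # scoring(model, X, y) -> float, higher is better

    def fit(self, X_train, y_train, X_val, y_val):
        keys = sorted(self.param_grid)
        best_score, best_model = -np.inf, None
        for values in itertools.product(*(self.param_grid[k] for k in keys)):
            params = dict(zip(keys, values))
            model = self.estimator_factory(**params).fit(X_train, y_train)
            # If the estimator supports cost-complexity pruning, consider
            # every tree on its pruning path, not just the unpruned one.
            if hasattr(model, "prune_path"):
                candidates = model.prune_path()
            else:
                candidates = [model]
            for cand in candidates:
                score = self.scoring(cand, X_val, y_val)
                if score > best_score:
                    best_score, best_model = score, cand
        self.best_model_, self.best_score_ = best_model, best_score
        return self
```

The inner loop over the pruning path is exactly the part that doesn't fit GridSearchCV's one-fit-per-parameter-setting model.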