aldro61 / mmit

Regression trees for interval censored output data
https://aldro61.github.io/mmit/
GNU General Public License v3.0

A scikit-learn-inspired class for grid search cross-validation #7

Closed: aldro61 closed this issue 7 years ago

aldro61 commented 7 years ago

With minimum cost-complexity pruning built-in.

Had to write a custom class, since it was not possible to integrate minimum cost-complexity pruning into scikit-learn's GridSearchCV. Eventually, we could write a more general class that accepts any estimator (e.g., a sklearn pipeline) and calls the pruning functions internally if required, but we don't need this for now.
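A minimal sketch of that idea, under assumed names (the class name, the `min_cost_complexity_pruning()` method returning `(alpha, pruned_tree)` pairs, and the `scoring(estimator, X, y)` signature are all illustrative, not the actual mmit API): for each hyper-parameter setting, fit a tree on each training fold, compute its pruning path, and score the pruned subtrees on the validation fold, so alpha is selected jointly with the other hyper-parameters.

```python
from itertools import product

import numpy as np


class PruningGridSearchCV:
    """Hypothetical sketch, not the class implemented in this PR."""

    def __init__(self, estimator_cls, param_grid, cv=3, scoring=None):
        self.estimator_cls = estimator_cls  # estimator class, instantiated per setting
        self.param_grid = param_grid        # dict: parameter name -> list of values
        self.cv = cv
        self.scoring = scoring              # callable(estimator, X, y) -> float, or None

    def _score(self, estimator, X, y):
        # Default to the estimator's own score function when no "scoring"
        # argument is given (higher is assumed to be better).
        if self.scoring is not None:
            return self.scoring(estimator, X, y)
        return estimator.score(X, y)

    def fit(self, X, y):
        names = sorted(self.param_grid)
        settings = [dict(zip(names, values))
                    for values in product(*(self.param_grid[n] for n in names))]
        folds = np.array_split(np.arange(len(X)), self.cv)

        self.best_score_, self.best_params_ = -np.inf, None
        for params in settings:
            fold_scores = []
            for k in range(self.cv):
                val = folds[k]
                train = np.concatenate([f for i, f in enumerate(folds) if i != k])
                tree = self.estimator_cls(**params).fit(X[train], y[train])
                # Hypothetical pruning API: the full cost-complexity pruning
                # path as a list of (alpha, pruned_tree) pairs. Picking the
                # best alpha per fold is a simplification; a real
                # implementation may align alpha values across folds.
                path = tree.min_cost_complexity_pruning()
                fold_scores.append(max(self._score(t, X[val], y[val])
                                       for _, t in path))
            mean_score = float(np.mean(fold_scores))
            if mean_score > self.best_score_:
                self.best_score_, self.best_params_ = mean_score, params

        # Refit on all the data with the selected hyper-parameters.
        self.best_estimator_ = self.estimator_cls(**self.best_params_).fit(X, y)
        return self
```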

aldro61 commented 7 years ago

This concludes the implementation of minimum cost-complexity pruning.

tdhock commented 7 years ago

Where is the code that implements the CV evaluation metric (the number you optimize in order to select the hyper-parameters)? Can it be specified at training time?

tdhock commented 7 years ago

If we can specify it at training time, then it would be nice to compare the models in terms of test error on the benchmark data sets. E.g., is it more accurate to select the hyper-parameters that minimize the number of incorrectly predicted intervals, the hinge loss, the AUC, etc.?

aldro61 commented 7 years ago

If the "scoring" argument is not specified, it uses the score function of the estimator. If scoring is specified, it uses this function instead. In this case, any function scoring(X_test, y_test) -> R can be used, so yes, we could evaluate many different metrics.

The AUC is not implemented yet. Right now, the score function of MMIT returns the mean squared error with respect to the closest bound if the predicted value isn't in the target interval (and zero error if it is).
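A sketch of that default metric as just described (illustrative only; the actual MMIT score function may negate the value so that higher is better, and may use a different target format):

```python
import numpy as np


def interval_mse(pred, y):
    # Squared distance to the closest bound when the prediction falls
    # outside [lower, upper], and zero when it falls inside; averaged
    # over the test set. y is assumed to be an (n, 2) array of bounds.
    lower, upper = y[:, 0], y[:, 1]
    residual = np.where(pred < lower, pred - lower,
                        np.where(pred > upper, pred - upper, 0.0))
    return float(np.mean(residual ** 2))
```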

tdhock commented 7 years ago

That sounds like a reasonable default CV metric. I like how it is independent of the margin size.

aldro61 commented 7 years ago

@tdhock This is what I explained to you today. Ok to merge?