csinva / imodels

Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
https://csinva.io/imodels
MIT License
1.36k stars, 121 forks

Sample Weight Support? #89

Status: Open. kmedved opened this issue 2 years ago

kmedved commented 2 years ago

Hello, thanks all for the very interesting-looking package. The hierarchical shrinkage wrapper seems especially interesting and novel. I'm interested in whether it would be possible to add sample weight support to this package. For background, sample weights are supported by many scikit-learn estimators (e.g., RandomForestRegressor and HistGradientBoostingRegressor) and are passed via the fit call, e.g., model.fit(X_train, y_train, sample_weight=w_train).
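A minimal sketch of the fit-time convention described above, using scikit-learn's DecisionTreeRegressor on made-up toy data (the data and weights are illustrative assumptions, not from this thread). A depth-1 tree makes it easy to see that the weights change both where the split goes and the leaf values, which become weighted means.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: one feature, four rows (illustrative only).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 10.0, 20.0])
# Hypothetical weights: the last row counts three times as much as the others.
w = np.array([1.0, 1.0, 1.0, 3.0])

# A single split (max_depth=1) keeps the effect of the weights easy to read.
unweighted = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)
weighted = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y, sample_weight=w)

X_query = np.array([[2.0], [3.0]])
# Unweighted: the best split is at x = 1.5, so x = 2 and x = 3 share a leaf (mean 15).
print(unweighted.predict(X_query))
# Weighted: the best split moves to x = 2.5; the left leaf is the weighted mean of
# rows 0..2 (10 / 3) and the right leaf holds only the heavily weighted row (20).
print(weighted.predict(X_query))
```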

The purpose of sample weights is to increase the weight of certain rows/observations based on some external criterion, typically related to how the training data was gathered. For example, if your data comes from sensors of varying sensitivity, you may increase the sample weight of readings from certain sensors. Alternatively, if your data is aggregated in some form, you can set the weights based on the aggregation (e.g., weekly data with a weight of 7, daily data with a weight of 1, etc.).
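To illustrate the aggregation case with made-up numbers: if a weekly row stores the mean of seven daily readings, giving it a weight of 7 makes the weighted mean of the aggregated rows equal the plain mean of the underlying daily data. This sketch only checks that identity with NumPy, and it assumes the weekly value is a mean of daily values.

```python
import numpy as np

# Hypothetical raw daily readings: one full week plus two extra days.
week = np.array([3.0, 5.0, 4.0, 6.0, 2.0, 7.0, 8.0])
extra_days = np.array([10.0, 1.0])

# Aggregated training rows: the week collapsed to its mean, the two days kept as-is.
y_agg = np.concatenate([[week.mean()], extra_days])
# Weight each row by the number of days it summarizes.
w_agg = np.array([7.0, 1.0, 1.0])

weighted_mean = np.average(y_agg, weights=w_agg)   # respects the aggregation
raw_mean = np.concatenate([week, extra_days]).mean()  # mean of the daily data
naive_mean = y_agg.mean()                           # ignores the aggregation
print(weighted_mean, raw_mean, naive_mean)
```

Without weights, the single weekly row counts as much as one day, so the naive mean over-represents the two extra days.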

In terms of implementation, it's typically as simple as multiplying each row's loss by its sample weight, so that heavily weighted rows have more influence on the fit, although I'm not sure whether the novel hierarchical shrinkage capabilities of this package would present complications.
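A minimal sketch of per-row loss weighting for squared error, using a hypothetical weighted_mse helper and toy numbers. Dividing by the total weight only rescales the loss and does not change which model minimizes it. The sketch also checks that integer weights behave like duplicated rows, and that scikit-learn's mean_squared_error gives the same value through its sample_weight argument.

```python
import numpy as np
from sklearn.metrics import mean_squared_error


def weighted_mse(y_true, y_pred, sample_weight):
    """Hypothetical helper: each row's squared residual is scaled by its weight."""
    residuals = (y_true - y_pred) ** 2
    return np.sum(sample_weight * residuals) / np.sum(sample_weight)


y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([1.5, 2.0, 3.0])
w = np.array([1, 2, 3])  # integer weights for illustration

loss = weighted_mse(y_true, y_pred, w)

# Integer weights act like duplicated rows: repeat row i w[i] times, then average.
loss_repeated = mean_squared_error(np.repeat(y_true, w), np.repeat(y_pred, w))

# scikit-learn's metric accepts the same weights directly.
loss_sklearn = mean_squared_error(y_true, y_pred, sample_weight=w)
print(loss, loss_repeated, loss_sklearn)
```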

Thanks again for the very interesting-looking package. I look forward to testing and using it.

csinva commented 2 years ago

Hi @kmedved 👋, thanks for your interest in the package! Indeed, supporting sample weights seems like it would be useful, and especially interesting for hierarchical shrinkage. We'll add it very soon :)

csinva commented 2 years ago

An update: some of the models (but not all) now support sample_weight, including FIGS, TAO, SLIM, CART, BoostedRules, SLIPPER, and SkopeRules. Still working on the others...
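Since only some models accept sample_weight, one defensive pattern is to check the fit() signature before passing weights, so that unsupported models fail loudly instead of silently ignoring them. This sketch uses scikit-learn's has_fit_parameter. The fit_with_weights helper is hypothetical, and it is demonstrated on scikit-learn estimators rather than on the imodels classes listed above (KNeighborsRegressor.fit takes no sample_weight). Applying it to an imodels model assumes that model declares sample_weight explicitly in fit().

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils.validation import has_fit_parameter


def fit_with_weights(model, X, y, sample_weight):
    """Hypothetical helper: fit with sample_weight, or raise if fit() does not declare it."""
    if not has_fit_parameter(model, "sample_weight"):
        # Raise instead of silently dropping the weights.
        raise TypeError(f"{type(model).__name__}.fit() does not accept sample_weight")
    return model.fit(X, y, sample_weight=sample_weight)


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
w = np.array([1.0, 1.0, 2.0, 2.0])

# DecisionTreeRegressor.fit declares sample_weight, so this succeeds.
tree = fit_with_weights(DecisionTreeRegressor(random_state=0), X, y, w)

# KNeighborsRegressor.fit does not, so the helper refuses to fit it.
try:
    fit_with_weights(KNeighborsRegressor(n_neighbors=1), X, y, w)
    knn_rejected = False
except TypeError:
    knn_rejected = True
```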

mepland commented 1 year ago

Some parts of FIGS do not support sample_weight, including the extract_sklearn_tree_from_figs() function.

kmedved commented 1 year ago

Thanks for the work on this @csinva. Any update on getting sample weight support added for hierarchical shrinkage?

csinva commented 1 year ago

@aagarwal1996 @yanshuotan Can someone add in sample-weight support for HS?

yanshuotan commented 1 year ago

Actually HS already supports sample weights. sample_weight is fed into self.estimator_.fit() as an element of kwargs. For instance, see the following snippet:

[Screenshot of the code snippet: Screen Shot 2023-01-01 at 1.58.23 PM]
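Since the screenshotted code is not available here, the following is a hedged sketch of the general pattern described: a wrapper whose fit() forwards extra keyword arguments, including sample_weight, to an inner scikit-learn tree. KwargsForwardingWrapper is hypothetical and is not the imodels HS class. The check at the end reads the root node's weighted_n_node_samples, which equals the total weight only if the weights actually reached the inner estimator.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeRegressor


class KwargsForwardingWrapper:
    """Hypothetical wrapper (not the imodels class) that forwards extra fit arguments."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, *args, **kwargs):
        # sample_weight, if given, travels inside **kwargs to the wrapped estimator.
        self.estimator_ = clone(self.estimator).fit(X, y, *args, **kwargs)
        return self


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 1.0, 3.0])
w = np.array([1.0, 2.0, 3.0, 4.0])

wrapper = KwargsForwardingWrapper(DecisionTreeRegressor(random_state=0))
wrapper.fit(X, y, sample_weight=w)

# The root's weighted count is the sum of all weights, so they reached the inner tree.
root_weight = wrapper.estimator_.tree_.weighted_n_node_samples[0]
print(root_weight)
```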

Furthermore, line 84 of the code uses weighted_n_node_samples to do the shrinkage. When the original tree estimator is fit, it stores in this array the weighted number of training samples in each node, i.e., the sum of the sample weights of the rows that reach that node.
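A sketch of how weighted node counts can enter the shrinkage, assuming the node-based form of hierarchical shrinkage in which each child's deviation from its parent's mean is divided by 1 + reg_param / N(parent), with N taken from weighted_n_node_samples. The helpers hs_node_values and hs_predict are hypothetical and are not the imodels implementation (the code at line 84 is not reproduced here). With reg_param = 0 the shrunk predictions reduce to the original tree's predictions, and a very large reg_param pulls every prediction toward the root's weighted mean.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def hs_node_values(tree, reg_param):
    """Hypothetical sketch: shrink each node's value toward its ancestors' values.

    shrunk[child] = shrunk[parent] + (raw[child] - raw[parent]) / (1 + reg_param / N[parent]),
    where N is weighted_n_node_samples, so sample weights enter through N.
    """
    t = tree.tree_
    raw = t.value[:, 0, 0]          # (weighted) mean response in each node
    n = t.weighted_n_node_samples   # sum of sample weights reaching each node
    shrunk = np.empty_like(raw)
    shrunk[0] = raw[0]              # the root keeps its weighted mean
    stack = [0]
    while stack:
        parent = stack.pop()
        for child in (t.children_left[parent], t.children_right[parent]):
            if child == -1:         # -1 means no child: parent is a leaf
                continue
            shrunk[child] = shrunk[parent] + (raw[child] - raw[parent]) / (
                1.0 + reg_param / n[parent]
            )
            stack.append(child)
    return shrunk


def hs_predict(tree, X, reg_param):
    # Route each row to its leaf, then read that leaf's shrunk value.
    return hs_node_values(tree, reg_param)[tree.apply(X)]


rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=200)
w = rng.uniform(0.5, 2.0, size=200)  # made-up sample weights

tree = DecisionTreeRegressor(max_leaf_nodes=8, random_state=0).fit(X, y, sample_weight=w)
pred_no_shrink = hs_predict(tree, X, reg_param=0.0)
pred_heavy = hs_predict(tree, X, reg_param=1e12)
```

Because larger weighted counts N(parent) make reg_param / N(parent) smaller, deviations below heavily weighted nodes are shrunk less under this form.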

I do agree that it may be beneficial to make sample_weight an explicit (optional) argument to fit. @csinva what do you think?
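One concrete benefit of an explicit argument is that signature-based introspection can see it. scikit-learn's has_fit_parameter inspects the parameters of fit(), so weights hidden in **kwargs are still forwarded at call time but are invisible to such checks. Both wrapper classes below are hypothetical, with fit() bodies reduced to the signature.

```python
from sklearn.utils.validation import has_fit_parameter


class KwargsOnlyWrapper:
    """Hypothetical wrapper whose fit() hides sample_weight inside **kwargs."""

    def fit(self, X, y, **kwargs):
        return self


class ExplicitWeightWrapper:
    """Hypothetical wrapper whose fit() names sample_weight as an optional argument."""

    def fit(self, X, y, sample_weight=None):
        return self


# has_fit_parameter only sees parameter names in the fit() signature.
kwargs_visible = has_fit_parameter(KwargsOnlyWrapper(), "sample_weight")
explicit_visible = has_fit_parameter(ExplicitWeightWrapper(), "sample_weight")
print(kwargs_visible, explicit_visible)
```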

csinva commented 1 year ago

Agreed, thanks Yan Shuo for adding sample_weight as an explicit argument to HS in https://github.com/csinva/imodels/pull/156.

Should work now @kmedved!