kmedved opened 2 years ago
Hi @kmedved, thanks for your interest in the package! Supporting sample weights does seem useful, and especially interesting for hierarchical shrinkage. We'll add it very soon :)
An update: some of the models (but not all) now support `sample_weight`, including FIGS, TAO, SLIM, CART, BoostedRules, SLIPPER, and SkopeRules. Still working on the others...
Some parts of FIGS do not support `sample_weight`, including the `extract_sklearn_tree_from_figs()` function.
Thanks for the work on this, @csinva. Any update on getting sample weight support added for hierarchical shrinkage?
@aagarwal1996 @yanshuotan Can someone add in sample-weight support for HS?
Actually, HS already supports sample weights: `sample_weight` is fed into `self.estimator_.fit()` as an element of `kwargs`.
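A minimal sketch of this kwargs pass-through pattern, assuming a wrapper that keeps its fitted inner model in `estimator_`. `ToyEstimator` and `ShrinkageWrapper` are hypothetical stand-ins written for illustration, not imodels classes.

```python
class ToyEstimator:
    """Hypothetical inner estimator that just records the weights it receives."""

    def fit(self, X, y, sample_weight=None):
        self.received_weight_ = sample_weight
        return self


class ShrinkageWrapper:
    """Hypothetical wrapper: forwards any extra fit() keyword arguments,
    such as sample_weight, to the wrapped estimator unchanged."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, **kwargs):
        self.estimator_ = self.estimator
        # sample_weight travels inside kwargs; the wrapper never inspects it.
        self.estimator_.fit(X, y, **kwargs)
        return self


model = ShrinkageWrapper(ToyEstimator())
model.fit([[0.0], [1.0]], [0.0, 1.0], sample_weight=[7.0, 1.0])
print(model.estimator_.received_weight_)  # [7.0, 1.0]
```

Because the wrapper never names `sample_weight`, any keyword the inner `fit` accepts gets through, but nothing validates it along the way.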
Furthermore, line 84 of the code uses `weighted_n_node_samples` to do the shrinkage. When the original tree estimator is fit, it stores in this array the weighted number of training samples that reach each node.
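To see where the weighted counts enter, here is a sketch of hierarchical shrinkage along a single root-to-leaf path. It assumes the update rule from the HS paper: start from the root mean, then divide each parent-to-child change in node mean by 1 + λ/N(parent), where N(parent) is taken here to be the parent's weighted sample count (the quantity `weighted_n_node_samples` holds). The function name and the list-based path representation are hypothetical.

```python
def shrunk_prediction(path_means, path_weighted_counts, reg_param):
    """Hierarchically shrunk prediction along one root-to-leaf path.

    path_means[l]: mean of y in the l-th node on the path (0 = root).
    path_weighted_counts[l]: weighted number of samples in that node,
        analogous to tree_.weighted_n_node_samples.
    reg_param: shrinkage strength lambda (>= 0); 0 recovers the leaf mean.
    """
    pred = path_means[0]  # start from the root mean
    for l in range(1, len(path_means)):
        parent_count = path_weighted_counts[l - 1]
        # A parent carrying less total weight damps its child's change more.
        pred += (path_means[l] - path_means[l - 1]) / (1.0 + reg_param / parent_count)
    return pred


# Root mean 0.0 with weight 10, leaf mean 1.0 with weight 4, lambda = 10.
print(shrunk_prediction([0.0, 1.0], [10.0, 4.0], 10.0))  # 0.5
```

Upweighting rows raises the parent's weighted count, so the same split is shrunk less; with a root weight of 30 instead of 10, the prediction above moves from 0.5 to 0.75.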
I do agree that it may be beneficial to make `sample_weight` an explicit (optional) argument to `fit`. @csinva, what do you think?
Agreed, thanks Yan Shuo for adding `sample_weight` to HS as an explicit argument in https://github.com/csinva/imodels/pull/156.
Should work now, @kmedved!
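For comparison with passing weights through `**kwargs`, a sketch of what an explicit, optional `sample_weight` argument can look like. `ToyEstimator` and `ExplicitWeightWrapper` are hypothetical illustration classes; this is not the code from the linked PR.

```python
class ToyEstimator:
    """Hypothetical inner estimator that records the weights it receives."""

    def fit(self, X, y, sample_weight=None):
        self.received_weight_ = sample_weight
        return self


class ExplicitWeightWrapper:
    """Hypothetical wrapper exposing sample_weight as an explicit, optional fit() argument."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, sample_weight=None):
        # An explicit parameter lets the wrapper validate weights before fitting.
        if sample_weight is not None and len(sample_weight) != len(y):
            raise ValueError("sample_weight must have one entry per sample")
        self.estimator_ = self.estimator
        # None is forwarded unchanged, leaving the default weighting to the inner estimator.
        self.estimator_.fit(X, y, sample_weight=sample_weight)
        return self


model = ExplicitWeightWrapper(ToyEstimator())
model.fit([[0.0], [1.0]], [0.0, 1.0], sample_weight=[7.0, 1.0])
print(model.estimator_.received_weight_)  # [7.0, 1.0]
```

The explicit form shows up in the signature and docstring, and gives the wrapper a natural place to check the weights; the kwargs form is shorter but silent about what it accepts.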
Hello, and thanks, all, for the very interesting-looking package. The hierarchical shrinkage wrapper seems especially interesting and novel. Would it be possible to add sample weight support to this package? For background, sample weights are a fairly typical part of many scikit-learn estimators (e.g., `RandomForestRegressor`, `HistGradientBoostingRegressor`, etc.), and are passed via the fit call, e.g., `model.fit(X_train, y_train, sample_weight=w_train)`.
The purpose of sample weights is to increase the weighting of rows/observations based on some external criterion, typically related to how the training data was gathered. For example, if your data comes from sensors of varying sensitivity, you may increase the weights of rows from certain sensors. Alternatively, if your data is aggregated in some form, you can set the weights based on the aggregation (e.g., weekly data with a weight of 7, daily data with a weight of 1, etc.).
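To make the aggregation example concrete: for a weighted mean (which is what a squared-error regression tree predicts in a leaf), an integer weight of 7 on a weekly row gives the same result as repeating that row seven times. The values below are made up for illustration, and `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(values, weights):
    """Weighted average: each value counts in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data: one weekly aggregate (weight 7) and one daily row (weight 1).
values = [2.0, 10.0]
weights = [7.0, 1.0]

# An integer weight of 7 behaves like repeating that row 7 times.
repeated = [2.0] * 7 + [10.0]
assert weighted_mean(values, weights) == sum(repeated) / len(repeated)
print(weighted_mean(values, weights))  # 3.0
```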
In terms of implementation, it's typically as simple as multiplying each row's loss by its sample weight, which makes the model more sensitive to heavily weighted rows, although I'm not sure whether the novel hierarchical shrinkage capabilities of this package would present complications.
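A minimal sketch of "multiply each row's loss by its weight" for squared error. Dividing by the total weight is one common convention and an assumption here; some implementations sum the weighted losses instead, which changes the scale but not which predictions minimize the loss. `weighted_squared_error` is a hypothetical name.

```python
def weighted_squared_error(y_true, y_pred, sample_weight):
    """Squared-error loss with each row's error scaled by its weight,
    normalized by the total weight (an assumed convention)."""
    total = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, sample_weight))
    return total / sum(sample_weight)


y_true = [1.0, 3.0]
y_pred = [1.0, 1.0]
print(weighted_squared_error(y_true, y_pred, [1.0, 1.0]))  # 2.0 (unweighted)
print(weighted_squared_error(y_true, y_pred, [1.0, 3.0]))  # 3.0 (second row counts 3x)
```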
Thanks again for the very interesting looking package. I look forward to testing and using it.