Closed lbventura closed 1 year ago
Yeah, we should not do the calculation in each iteration.
As quick implementation, you can go with your proposed alternative, it's not too expensive.
But in fact, we might also do a refactoring, all the unbinding and re-binding in the middle of the fitting is not needed, I think.
Cool, I'll will follow your suggestion then and try to remove the feature.unbind_data
which is currently preventing the computation outside of the loop.
Just removed unbind_data
and refactored.
It would be nice to include the estimation of feature importances for one of the integration tests, just as sanity check to see if the method produces reasonable results.
It would be nice to include the estimation of feature importances for one of the integration tests, just as sanity check to see if the method produces reasonable results.
I am not 100% sure if this is what you are referring to: https://github.com/Blue-Yonder-OSS/cyclic-boosting/pull/41/files#diff-d8c7322bad71404522a489cb47c789fd4e6ec02b09f93563b51c1d599d4d9707R7
See screenshot below with how the results of the test look like: one can identify the features through the FeatureID.feature_group
attribute, but this is not super clear as it does not provide information about the original feature name (say, day_of_week
). Maybe we can edit FeatureID
so that it includes the original feature name?
I mean to use the data in the integration test, because it is a quite realistic simulation and we expect the most informative features (which we know for this) to have highest feature importance. To your issue with the feature name, I think you already have access to this. There is just no feature names for the simple test you used. In the integration tests there should be names.
I mean to use the data in the integration test, because it is a quite realistic simulation and we expect the most informative features (which we know for this) to have highest feature importance. To your issue with the feature name, I think you already have access to this. There is just no feature names for the simple test you used. In the integration tests there should be names.
Both issues have been addressed, I believe
By including
set_feature_bin_weighted_average
in CyclicBoostingBasecb_features
. I think this is a rather inefficient way of doing this as the feature importances get computed in each iteration of_fit_main
(https://github.com/Blue-Yonder-OSS/cyclic-boosting/blob/main/cyclic_boosting/base.py#L691), which is redundant as only the final feature importances are required. It was done because:set_feature_bin_weighted_average
has to be called beforefeature.unbind_data
feature.bind_data(X, weights, False)
Alternatively, one could iterate over the features and 1)
bind_data
, 2) compute the weighted average, and then 3)unbind_data
, for example, here https://github.com/Blue-Yonder-OSS/cyclic-boosting/blob/main/cyclic_boosting/base.py#L724