Closed fredrikluo closed 3 years ago
Changes Missing Coverage | Covered Lines | Changed/Added Lines | % | ||
---|---|---|---|---|---|
misc.go | 0 | 6 | 0.0% | ||
xgblinear.go | 0 | 6 | 0.0% | ||
xgensemble_io.go | 10 | 16 | 62.5% | ||
lgensemble.go | 13 | 20 | 65.0% | ||
transformation/leaf_index.go | 6 | 16 | 37.5% | ||
leaves.go | 19 | 33 | 57.58% | ||
xgensemble.go | 9 | 26 | 34.62% | ||
<!-- | Total: | 63 | 129 | 48.84% | --> |
Files with Coverage Reduction | New Missed Lines | % | ||
---|---|---|---|---|
xgensemble.go | 1 | 58.49% | ||
<!-- | Total: | 1 | --> |
Totals | |
---|---|
Change from base Build 159: | -0.5% |
Covered Lines: | 1862 |
Relevant Lines: | 2731 |
cc: @dmitryikh
Hi! Thanks for your contribution and good job!
This feature definitely should be landed in leaves
. But I have few concerns:
func (e *Ensemble) Predict(fvals []float64, nEstimators int, predictions []float64, predleaf bool) ([][]uint32, error) {
Signature becomes error prone. The rule "don't pay for what you don't use" is violated here.
func (e *Ensemble) Predict(fvals []float64, nEstimators int, predictions []float64) error {
But when transformation = LeafIndexes, then predictions
array populated with leaf indices. I know there is type problem: index is not float64, but float64 seems like needed compromise here. User code can convert it back to int32 without any loss in precision.
What do you think about it?
Yes, to use transformation seems to be a better idea.
The only thing that the client code needs to be aware of is that the length of predictions and length of predictLeafIndices are different, the latter needs to be dimension * nEstimator. But we can check it in our code, so shouldn't be a big deal.
for 1. I think I have checked, however, it has been for a while, I can double-check again.
@fredrikluo , if you don't mind I will try to fix this PR based on points above. In particular I want to try to use transformation mechanism to support leaf indices.
Absolutely, please go ahead
@fredrikluo, can you please check the checkbox "Allow edits by maintainers" on the right. Thanks!
Checked
@fredrikluo , could you please check that current implementation is suitable for your case? If ok - i will merge.
The enter point to work with leaf indices:
// EnsembleWithLeafPredictions returns ensemble instance with TransformLeafIndex
// (return trees indices instead of numerical values)
func (e *Ensemble) EnsembleWithLeafPredictions() *Ensemble {
// each predictions will produce NRawOutputGroups() * NEstimators() values
return &Ensemble{e, &transformation.TransformLeafIndex{e.NRawOutputGroups() * e.NEstimators()}}
}
Yes, I tested in my code base, everything works fine, this looks awesome!
Thank you for your contribution! Merged.
Thanks to the popular paper https://research.fb.com/wp-content/uploads/2016/11/practical-lessons-from-predicting-clicks-on-ads-at-facebook.pdf
Many people use GBDT to extract features from a dataset and then instead of predicting results directly. The extracted features are the leaf indices from each estimator which makes the decision.
This is achieved by setting predleaf with lightGBM https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html#lightgbm.Booster.predict
I am adding the same support in this pull request. Basically, the user can set this parameter and then predict function will return the leaf indices.
The test data are generated by lightGBM to make sure that the indices are generated in the same way.