juliuscoburger opened this issue 2 months ago
It looks like the floating-point error starts to creep in from two sources: the `base_scores` offset and the order in which the leaf values are summed.
The check passes if you relax the required tolerance:

```python
np.testing.assert_almost_equal(
    treelite.gtil.predict(tl_model, data=X).squeeze(),
    bst.predict(dtrain),
    decimal=2,
)
```
The check is just there to showcase that the scores are not equal. I was under the impression that GTIL always returns the same scores.
> Can it be that this value is not being translated?

I double checked and `base_scores` is being properly translated and handled, so the error is not due to a logic error.
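For anyone who wants to check this themselves, one way is to dump the translated model as JSON and inspect the stored base scores. This is a minimal sketch, assuming `tl_model` is the converted `treelite.Model` and that the offset is stored under a `base_scores` field (the field name may vary across treelite versions):

```python
import json

# tl_model: the treelite.Model converted from the XGBoost booster
dump = json.loads(tl_model.dump_as_json())
# Field name is an assumption; inspect the full dump if it differs
print(dump.get("base_scores"))
```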
> I was under the impression that GTIL always returns the same scores.
GTIL may evaluate trees and leaf nodes in a different order than XGBoost. Addition of floating-point values is not associative (`a + (b + c) != (a + b) + c` in general), and error can accumulate, especially when some values in the sum are much larger than the others, as in this example.
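To make that concrete, here is a small illustration (not from the thread; the numbers are made up) in which a large offset, standing in for `base_score`, absorbs small leaf values differently depending on the summation order:

```python
import numpy as np

# The same three float32 values, summed in two different orders.
# The large value plays the role of base_score; the small ones
# play the role of per-tree leaf outputs.
base = np.float32(1e8)
leaf1 = np.float32(3.0)
leaf2 = np.float32(3.0)

order1 = (base + leaf1) + leaf2  # offset first: each 3.0 is rounded away
order2 = base + (leaf1 + leaf2)  # leaves first: their sum of 6.0 survives
print(order1, order2, order1 == order2)  # the two orders disagree
```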
To minimize error due to floating-point arithmetic, consider scaling the target by using `StandardScaler`.
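As a sketch of that mitigation (the data and parameters below are illustrative, not from the original report), the target is scaled before training and the predictions are transformed back afterwards:

```python
import numpy as np
import xgboost as xgb
from sklearn.preprocessing import StandardScaler

# Illustrative data: a regression target with a large offset
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = 1e6 + rng.normal(size=1000)

# Scale the target to zero mean and unit variance
scaler = StandardScaler()
y_scaled = scaler.fit_transform(y.reshape(-1, 1)).ravel()

dtrain = xgb.DMatrix(X, label=y_scaled)
bst = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=10)

# Undo the scaling on the predictions
pred = scaler.inverse_transform(bst.predict(dtrain).reshape(-1, 1)).ravel()
```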
I'll probably have to add a note to the GTIL documentation about the possibility of floating-point error and how to mitigate it.
I noticed that my model returns different scores than the original model. I was able to boil the issue down to using a `base_score` during training. Can it be that this value is not being translated?

Code to replicate the issue:
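The original snippet is not preserved above; the following is a hedged reconstruction of the kind of script that shows the mismatch, assuming a regression objective, a non-default `base_score`, and the treelite 3.x-style converter (the converter call may differ in newer treelite versions):

```python
import numpy as np
import treelite
import treelite.gtil
import xgboost as xgb

# Synthetic stand-in for the original data
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5)).astype(np.float32)
y = rng.normal(size=500).astype(np.float32)

# A non-default base_score is the key ingredient
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train(
    {"objective": "reg:squarederror", "base_score": 100.0},
    dtrain,
    num_boost_round=50,
)

# Converter call assumes the treelite 3.x API
tl_model = treelite.Model.from_xgboost(bst)

# Reportedly fails at the default tolerance; passes with decimal=2
np.testing.assert_almost_equal(
    treelite.gtil.predict(tl_model, data=X).squeeze(),
    bst.predict(dtrain),
)
```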