BayesWitnesses / m2cgen

Transform ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies
MIT License

Flaky e2e test for the XGBoost model with the 'gblinear' booster #205

Open izeigerman opened 4 years ago

izeigerman commented 4 years ago

Context from @StrikerRUS:

Now Go is failing (refer to https://github.com/BayesWitnesses/m2cgen/pull/200#issuecomment-624063683):


=================================== FAILURES ===================================
_ test_e2e[xgboost_XGBClassifier - go_lang - train_model_classification_binary2] _
estimator = XGBClassifier(base_score=0.6, booster='gblinear', colsample_bylevel=None,
colsample_bynode=None, colsamp...ambda=0, scale_pos_weight=1, subsample=None,
tree_method=None, validate_parameters=False, verbosity=None)
executor_cls = <class 'tests.e2e.executors.go.GoExecutor'>

...

expected=[0.04761511 0.9523849 ], actual=[0.047615, 0.952385]
expected=[0.06296992 0.9370301 ], actual=[0.06297, 0.93703]
expected=[0.12447995 0.87552005], actual=[0.124479, 0.875521]
expected=[0.0757848  0.9242152 ], actual=[0.075784, 0.924216]
expected=[0.8092151  0.19078489], actual=[0.809212, 0.190788]


BTW, in attempts to check my guess from https://github.com/BayesWitnesses/m2cgen/pull/200#issuecomment-624067250, I found that coefs in gblinear are also float32:
https://github.com/dmlc/xgboost/blob/67d267f9da3b15a6e5a8393afae9be921a4e224b/src/gbm/gblinear_model.h#L110

https://github.com/dmlc/xgboost/blob/67d267f9da3b15a6e5a8393afae9be921a4e224b/src/gbm/gblinear_model.h#L120

https://github.com/dmlc/xgboost/blob/67d267f9da3b15a6e5a8393afae9be921a4e224b/src/gbm/gblinear_model.h#L82

https://github.com/dmlc/xgboost/blob/67d267f9da3b15a6e5a8393afae9be921a4e224b/src/gbm/gblinear_model.h#L91

and from #188 (comment) we know that bst_float is actually float
https://github.com/dmlc/xgboost/blob/8d06878bf9b778db68ae98f68d99a3557c7ea885/include/xgboost/base.h#L110-L111

Created https://github.com/dmlc/xgboost/issues/5634.
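The magnitude of the diffs above is consistent with float32 storage: round-tripping a float64 value through float32 perturbs it past roughly the 7th significant digit. A minimal sketch of the effect (the coefficient value is made up for illustration):

```python
import numpy as np

coef = 0.9523849                  # a float64 literal, as the generated code emits today
coef32 = float(np.float32(coef))  # the value gblinear actually stores (24-bit mantissa)

print(coef, coef32)
print(abs(coef - coef32))  # tiny, but enough to flip the last rounded digits
```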
izeigerman commented 4 years ago

Duplicating my question here: @StrikerRUS excellent observation 👍 Do you think we can pull off the same trick as we did for thresholds? Basically cast weights and coefs to float32 on our end. Or is it different this time?
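The "same trick as for thresholds" could look roughly like this on the m2cgen side: round-trip the learned weights through float32 before emitting literals, so the generated code reproduces XGBoost's single-precision values. A sketch with a hypothetical helper name (not the actual m2cgen API):

```python
import numpy as np

def to_single_precision(values):
    """Round-trip float64 coefficients through float32 so the emitted
    literals match what gblinear stores internally."""
    return np.asarray(values, dtype=np.float32).astype(np.float64).tolist()

weights = [0.04761511, 0.9523849]  # illustrative values
print(to_single_precision(weights))
```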

StrikerRUS commented 4 years ago

I think we can wait for a while for a reply in https://github.com/dmlc/xgboost/issues/5634. Let's say two or three days, and only then take any action.

StrikerRUS commented 4 years ago

For the record. Two new cases (both PowerShell):

expected=[0.04631048 0.9536895 ], actual=[0.0463101577391968, 0.953689842260803]
expected=[0.06417668 0.9358233 ], actual=[0.064176231780856, 0.935823768219144]
expected=[0.12548977 0.8745102 ], actual=[0.125489171473375, 0.874510828526625]
expected=[0.07494283 0.9250572 ], actual=[0.0749422009903284, 0.925057799009672]
expected=[0.80788165 0.19211833], actual=[0.807878724641367, 0.192121275358633]
expected=[0.04594815 0.95405185], actual=[0.0459477380325012, 0.954052261967499]
expected=[0.06591994 0.93408006], actual=[0.0659193724496036, 0.934080627550396]
expected=[0.13377541 0.8662246 ], actual=[0.133774447739148, 0.866225552260852]
expected=[0.07744116 0.92255884], actual=[0.0774401457484807, 0.922559854251519]
expected=[0.79560804 0.20439194], actual=[0.795604680327838, 0.204395319672162]

Side question: I wonder why the values are so non-deterministic. Did we forget to set random_state somewhere?
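One way to check that hypothesis: anything seeded through a fixed random_state is bit-reproducible run-to-run, while unseeded draws are not. A quick numpy-only sanity check of the principle (the actual e2e fixtures may seed in a different place):

```python
import numpy as np

# Same seed -> bit-identical draws on every run.
a = np.random.RandomState(42).rand(5)
b = np.random.RandomState(42).rand(5)
assert np.array_equal(a, b)

# A different (or missing) seed -> different training data, hence
# different fitted coefficients and different expected/actual values.
c = np.random.RandomState(7).rand(5)
print(np.array_equal(a, c))  # False
```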

izeigerman commented 4 years ago

Hm, good question. I briefly looked into the code and couldn't spot any obvious sources of non-determinism.

StrikerRUS commented 4 years ago

Probably we've encountered this issue https://github.com/dmlc/xgboost/issues/5298#issuecomment-589528460 and #188 was not enough to fix issues like the ones we have in the generated examples:
https://github.com/BayesWitnesses/m2cgen/blob/4869e002373fa53f96b62f92eee13b914586fc13/generated_code_examples/python/regression/lightgbm.py#L8
https://github.com/BayesWitnesses/m2cgen/blob/4869e002373fa53f96b62f92eee13b914586fc13/generated_code_examples/python/regression/lightgbm.py#L13
https://github.com/BayesWitnesses/m2cgen/blob/4869e002373fa53f96b62f92eee13b914586fc13/generated_code_examples/haskell/regression/random_forest.hs#L15

But anyway, we should get a reliable repro first.

StrikerRUS commented 4 years ago

Have no time to read this right now, but it looks related to our issue: https://github.com/numpy/numpy/pull/9941.
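For reference, that numpy PR introduced the shortest round-trip repr for floats. The long literals in the generated examples linked above are what you get when a float32 value is converted to a Python (64-bit) float before printing: the same bits, but many more decimal digits are needed to round-trip it through float64:

```python
import numpy as np

w = np.float32(0.1)  # stored with a 24-bit mantissa
print(float(w))      # 0.10000000149011612: the same value, printed
                     # via float64's shortest round-trip repr
print(np.float32(float(w)) == w)  # True: no information was lost
```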