Peter-Devine / test_repo_0

A repo purely for testing Github API functions

Parity between C++ xgboost and xgboost-predictor-java #9

Open Peter-Devine opened 4 years ago

Peter-Devine commented 4 years ago

Hi -

Have you done any parity tests between the scored output of the C++ models and the Java models?

I ask because I'm seeing large differences (greater than 1) in the regression scores when the features are passed in as double-precision values.

Using these training parameters:

    "eta" -> 0.3,
    "max_depth" -> 2,
    "objective" -> "reg:linear",
    "early_stopping_rounds" -> 2,
    "num_round" -> 15,
    "nworkers" -> 2

When I cast the doubles in the FVec to floats first, the results are much closer, within a 0.0001 tolerance.
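
For reference, the cast looks roughly like this using xgboost-predictor-java's FVec. This is only a sketch; the model path and feature values here are made up:

    import java.io.FileInputStream;

    import biz.k11i.xgboost.Predictor;
    import biz.k11i.xgboost.util.FVec;

    public class FloatCastPredict {
        public static void main(String[] args) throws Exception {
            Predictor predictor;
            try (FileInputStream in = new FileInputStream("model.bin")) {
                predictor = new Predictor(in);
            }

            // Original double-precision features (hypothetical values).
            double[] doubles = {0.3, 1.7, 42.0};

            // Cast each value down to float before building the FVec,
            // mirroring the 32-bit precision used at training time.
            float[] floats = new float[doubles.length];
            for (int i = 0; i < doubles.length; i++) {
                floats[i] = (float) doubles[i];
            }

            FVec features = FVec.Transformer.fromArray(floats, /* treatsZeroAsNA */ false);
            double[] prediction = predictor.predict(features);
            System.out.println(prediction[0]);
        }
    }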

Peter-Devine commented 4 years ago

Hi,

We have also observed this tolerance. Is there any way to improve it?

Thank you!!!

Peter-Devine commented 4 years ago

I would recommend always using features as floats. XGBoost is explicit that it treats values as 32-bit floats for performance reasons (see e.g. https://github.com/dmlc/xgboost/issues/1410). If a model has been trained with XGBoost, its split values are stored as floats, so passing it doubles can produce inaccurate predictions when a feature value lands on the wrong side of a split threshold.
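
To make this concrete, here is a small self-contained illustration of how the precision mismatch can flip a split decision. The 0.3 threshold is an arbitrary example, not taken from a real model:

    public class SplitPrecision {
        public static void main(String[] args) {
            // A split threshold as stored in the model: a 32-bit float.
            float split = 0.3f;

            // The same nominal value held as a double.
            double featureAsDouble = 0.3;
            float featureAsFloat = (float) featureAsDouble;

            // 0.3 is not exactly representable in binary: the nearest
            // double is slightly below 0.3, while the nearest float is
            // slightly above it, so the two comparisons disagree.
            System.out.println(featureAsDouble < split); // true
            System.out.println(featureAsFloat < split);  // false
        }
    }

The double sits just below the float threshold while the float cast equals it exactly, so the same nominal value is routed down different branches of the tree, and those differences compound across trees.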