SelfExplainML / PiML-Toolbox

PiML (Python Interpretable Machine Learning) toolbox for model development & diagnostics
https://selfexplainml.github.io/PiML-Toolbox
Apache License 2.0

XGB2Regressor vs XGBRegressor #51

Closed: srbPhy closed this issue 2 months ago

srbPhy commented 2 months ago

Hi, a model trained with XGB2Regressor seems to differ slightly from one trained with a plain XGBRegressor. For instance, when I run the following code, I get slightly different predictions on the test data. I am sure I am missing something, but I can't figure out what. Could you please help?

from piml import Experiment
from piml.models import XGB2Regressor
from xgboost import XGBRegressor

exp = Experiment(highcode_only=True)
exp.data_loader(data='BikeSharing', silent=True)
exp.data_prepare(target='cnt', task_type='regression', test_ratio=0.2, random_state=0, silent=True)

model1 = XGB2Regressor()
exp.model_train(model=model1, name='XGB2')

model2 = XGBRegressor(max_depth=2)
exp.model_train(model=model2, name='XGB2-default')

print(model1.predict(exp.get_data(test=True)[0]))
print(model2.predict(exp.get_data(test=True)[0]))
[-0.04393188  0.03837352  0.4268577  ...  0.02106261 -0.00260242
  0.34881094]
[-0.03740007  0.03996139  0.42402536 ...  0.02290548  0.0015662
  0.3511871 ]
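To put a number on "slightly different", a small helper can compare two prediction arrays element-wise. This is a sketch using NumPy only; `prediction_gap` is a hypothetical name, not part of PiML or xgboost, and the sample values are just the six entries visible in the truncated output above.

```python
import numpy as np


def prediction_gap(p1, p2, rtol=1e-5, atol=1e-8):
    """Summarize how far apart two prediction vectors are (hypothetical helper)."""
    a = np.asarray(p1, dtype=float)
    b = np.asarray(p2, dtype=float)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    abs_diff = np.abs(a - b)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "mean_abs_diff": float(abs_diff.mean()),
        # np.allclose checks |a - b| <= atol + rtol * |b| for every element
        "allclose": bool(np.allclose(a, b, rtol=rtol, atol=atol)),
    }


# Only the head and tail values shown in the printed output are used here.
xgb2 = [-0.04393188, 0.03837352, 0.4268577, 0.02106261, -0.00260242, 0.34881094]
xgb_default = [-0.03740007, 0.03996139, 0.42402536, 0.02290548, 0.0015662, 0.3511871]
print(prediction_gap(xgb2, xgb_default))
```

On the full test set, the same helper could be called on the two arrays returned by `model1.predict(...)` and `model2.predict(...)`.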
yodiaditya commented 2 months ago

Confirmed; I get the same result:

[-0.04393188  0.03837352  0.4268577  ...  0.02106261 -0.00260242
  0.34881094]
[-0.03740007  0.03996139  0.42402536 ...  0.02290548  0.0015662
  0.3511871 ]
ZebinYang commented 2 months ago

Hi @yodiaditya and @srbPhy

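The difference in results comes from different default hyperparameters: XGB2Regressor and XGBRegressor do not share the same defaults, so setting max_depth=2 on XGBRegressor alone is not enough to reproduce the XGB2 model.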
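One way to see exactly which defaults differ is to diff the two parameter dictionaries returned by `get_params()`. The sketch below is plain Python; `diff_params` is a hypothetical helper (not part of PiML or xgboost), and the two dicts in the demo are made up for illustration, not the real defaults of either class. It treats two NaN values as equal because xgboost's `missing` parameter defaults to NaN, and NaN never compares equal to itself.

```python
import math

MISSING = "<missing>"  # placeholder for a key present in only one dict


def _same(a, b):
    # NaN != NaN, so treat two float NaNs as the same setting.
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b


def diff_params(params_a, params_b):
    """Return {name: (value_a, value_b)} for every scalar parameter that differs."""
    diffs = {}
    for key in sorted(set(params_a) | set(params_b)):
        a = params_a.get(key, MISSING)
        b = params_b.get(key, MISSING)
        if not _same(a, b):
            diffs[key] = (a, b)
    return diffs


# Hypothetical parameter dicts, for illustration only.
hypo_a = {"max_depth": 2, "n_estimators": 100, "learning_rate": 0.3, "missing": float("nan")}
hypo_b = {"max_depth": 2, "n_estimators": 500, "learning_rate": None,
          "missing": float("nan"), "gamma": 0}
print(diff_params(hypo_a, hypo_b))

# With PiML (not executed here), the real comparison would be something like:
# diff_params(exp.get_model("XGB2").estimator.estimator_.get_params(),
#             XGBRegressor(max_depth=2).get_params())
```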

You will get identical results by copying the XGB2 model's parameters into XGBRegressor, as in the following code:

from piml import Experiment
from piml.models import XGB2Regressor
from xgboost import XGBRegressor

exp = Experiment(highcode_only=True)
exp.data_loader(data='BikeSharing', silent=True)
exp.data_prepare(target='cnt', task_type='regression', test_ratio=0.2, random_state=0, silent=True)

model1 = XGB2Regressor()
exp.model_train(model=model1, name='XGB2')

# Copy every hyperparameter of the xgboost estimator fitted inside the XGB2 model
params = exp.get_model("XGB2").estimator.estimator_.get_params()
model2 = XGBRegressor(**params)
exp.model_train(model=model2, name='XGB2-default')

print(model1.predict(exp.get_data(test=True)[0]))
print(model2.predict(exp.get_data(test=True)[0]))
srbPhy commented 2 months ago

Thank you very much for your quick response. That makes sense.