kumarsameer opened 1 year ago
I also compiled against the XGBoost C API; its margin output matches correctly.
I have verified the column names in the XGBoost dump against Treelite's main.c file, and the column names match correctly.
Any idea about what the issue could be is greatly appreciated.
@hcho3 I have looked at a similar issue you commented on and tried everything. Could you shed some light here?
I am using the following package versions:
treelite==3.1.0.dev0
treelite-runtime==3.1.0
xgboost==1.7.3
@kumarsameer Can you post your XGBoost model here? I'll try to debug the issue
Please find the replication example and model file (zip) attached.
import xgboost as xgb
import treelite
import treelite_runtime
import numpy as np
test_data = [-1.62e+02, 3.63e+01, 1.00e+01, 6.00e+00, 1.10e+01, 7.00e+00,
1.00e+00, 3.00e+00, 0.00e+00, 2.00e+00, -1.00e+00, 3.30e+01,
7.70e+01, 3.90e+01, 1.11e+01, 1.30e+01, -9.00e+00, 0.00e+00,
-6.80e+01, 5.40e+01, 1.09e+02, -7.00e-01, -7.00e-01, -3.00e+00,
9.10e+01, -9.00e+00, -2.00e+00, 1.10e+01, 9.00e+00, 1.50e+01,
-1.20e+01, -1.80e+01, -6.20e+01, 6.55e+01, 4.50e+01, 5.90e+01,
1.00e-01, -2.00e+00, 3.00e+00, 1.80e+01, -6.00e+00, -1.80e+01,
-1.00e+00, 1.00e-01, -2.40e+00, 0.00e+00, 0.00e+00, 0.00e+00,
0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00,
0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00,
0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00, 0.00e+00]
modelfile = 'model9.bin'
model = treelite.Model.load(modelfile, model_format='xgboost')
model.export_lib('gcc', 'compiled.dylib', params={'parallel_comp': model.num_tree}, verbose=False)
predictor = treelite_runtime.Predictor(libpath='./compiled.dylib')
dmat = treelite_runtime.DMatrix(test_data)
print(f'Treelite: {predictor.predict(dmat,pred_margin=True)}')
bst = xgb.Booster()
bst.load_model('model9.bin')
dtrain = xgb.DMatrix(data=np.expand_dims(test_data,0))
print("bin :",bst.predict(dtrain,output_margin=True))
This is the output I get on my machine:
Treelite: -0.3277367353439331
bin : [0.5290272]
The outputs match if
test_data=list(np.ones(65))
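One thing worth double-checking (my suggestion, not something confirmed in this thread) is that both runtimes receive the test row in the same 1 x n_features shape, since a flat Python list could be interpreted differently by the two DMatrix constructors:

```python
import numpy as np

# Hypothetical shape check: reshape the flat feature list into an explicit
# 1 x n_features matrix before handing it to either runtime, so neither
# library can interpret it as a column of single-feature rows.
test_data = [1.0, 2.0, 3.0]  # placeholder values, not the row from above
row = np.asarray(test_data, dtype=np.float32).reshape(1, -1)
print(row.shape)
```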
@hcho3 Let me know if you need any help replicating the issue.
Not sure if it helps, but one of the things I noticed is that the C++ XGBoost API output also matched only after I set the missing value to std::numeric_limits<double>::quiet_NaN().
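For anyone hitting the same thing from Python, a minimal sketch of the equivalent fix (assuming your pipeline encodes missing values with a sentinel; the sentinel value below is made up for illustration):

```python
import numpy as np

# Convert an assumed sentinel (-999.0, hypothetical) to NaN before
# prediction: NaN is what XGBoost, and hence a model converted with
# Treelite, treats as "missing" by default.
SENTINEL = -999.0
row = np.array([[1.0, SENTINEL, 3.0]], dtype=np.float32)
row[row == SENTINEL] = np.nan
print(row)
```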
Hi y'all,
Seeing the same issue here. Attaching a CSV of data and labels. The predictions for the first twenty rows are below - XGB at left, Treelite at right. I'm running Treelite version 3.1.0. I attached the XGBoost JSON and generated C code in the zip file.
models.zip train_labels.csv train_data.csv
0.0 0.15592413
0.0 0.7200409
0.0 0.7200409
0.0 0.9868025
1.0 0.9695979
0.0 0.9695979
0.0 0.90683216
1.0 0.90683216
1.0 0.25082284
1.0 0.6561131
1.0 0.6561131
1.0 0.33845404
1.0 0.33845404
1.0 0.33845404
1.0 0.7200409
1.0 0.0045186426
The model was built in XGBoost with:
param = {'max_depth': 6,
'eta': 0.3,
'tree_method': 'hist',
'objective': 'binary:hinge',
'eval_metric': ['logloss', 'error']}
And here's the full replication code:
import pandas as pd, numpy as np, treelite, treelite_runtime, xgboost as xgb
from importlib import reload
reload(treelite_runtime)
reload(treelite)
X = pd.read_csv('train_data.csv').to_numpy()[:,1:]
y = pd.read_csv('train_labels.csv').to_numpy()[:,1]
dtrain = xgb.DMatrix(X, label=y)
param = {'max_depth': 6, 'eta': 0.3, 'tree_method': 'hist', 'objective': 'binary:hinge', 'eval_metric':['logloss', 'error']}
bst = xgb.train(param, dtrain, 10, [(dtrain, 'train')])
model = treelite.Model.from_xgboost(bst)
model.export_lib(toolchain='gcc', libpath='./mymodel.so', verbose=True)
preds = bst.predict(xgb.DMatrix(X[0:20,:]))
predictor = treelite_runtime.Predictor('./mymodel.so')
# these should match
for i in range(20): print(preds[i], predictor.predict(treelite_runtime.DMatrix(X[i:i+1,:])))
I'm facing the same issue. Did you find any workaround?
Output:
Every alternate margin output is matching.
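To pin down an "every alternate output matches" pattern, here is a small hypothetical helper (the function name and tolerance are mine, not from the thread) that reports which row indices diverge between two prediction arrays:

```python
import numpy as np

def mismatch_indices(a, b, atol=1e-5):
    """Return the indices where two prediction arrays disagree beyond atol."""
    a, b = np.asarray(a).ravel(), np.asarray(b).ravel()
    return np.flatnonzero(~np.isclose(a, b, atol=atol))

# Toy example: only index 1 differs between the two arrays.
print(mismatch_indices([0.5, 1.2, 0.5], [0.5, 9.9, 0.5]))
```

If the mismatching indices follow a regular stride, that might point at a batching or row-layout problem rather than at the model itself.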