dmlc / tl2cgen

TL2cgen (TreeLite 2 C GENerator) is a model compiler for decision tree models
https://tl2cgen.readthedocs.io/en/latest/
Apache License 2.0

[FEA] Recognize pandas_categorical field in LightGBM model #10

Open zyxue opened 2 years ago

zyxue commented 2 years ago

Reproduce

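import lightgbm as lgb
import numpy as np
import pandas as pd
import treelite
import treelite_runtime

X = (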
    pd.DataFrame(
        np.array(30 * [["a", 1]] + 30 * [["b", 2]] + 30 * [["c", 3]]),
        columns=["x1", "x2"],
    )
    .assign(x1=lambda df: df["x1"].astype("category"))
    .assign(x2=lambda df: df["x2"].astype(float))
)
y = np.array(60 * [5] + 30 * [10])

X.dtypes

bst = lgb.LGBMRegressor(n_estimators=5)
bst.fit(X, y)

bst.predict(X.head(2))
bst.booster_.save_model("model.txt")

model = treelite.Model.load('model.txt', model_format='lightgbm')
model.export_lib(toolchain="gcc", libpath='./mymodel.so', verbose=True)
predictor = treelite_runtime.Predictor('./mymodel.so', verbose=True)

All of the above works, but I am not sure how to use the predictor, as

dmat = treelite_runtime.DMatrix(X.head(2).to_numpy())
predictor.predict(dmat) 

leads to an error like:

     41         return 'float64'
     42     else:
---> 43         raise ValueError(f'Unrecognized NumPy type: {type_info}')
     44 
     45 

ValueError: Unrecognized NumPy type: object

The model is saved to model.txt, which ends with:

...

end of parameters

pandas_categorical:[["a", "b", "c"]]

My guess is that treelite doesn't make use of the pandas_categorical line yet, so I am trying to confirm.
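For reference, the pandas_categorical line looks like a JSON array holding one list of category values per categorical column. Below is a minimal sketch of reading it back from the model text, assuming that JSON layout; read_pandas_categorical is a hypothetical helper, not part of treelite or lightgbm.

```python
import json

PREFIX = "pandas_categorical:"


def read_pandas_categorical(model_text):
    """Return the category lists from a LightGBM text model, or None if absent.

    Hypothetical helper: assumes the payload after the prefix is valid JSON.
    """
    # The field sits at the end of the file, so scan from the bottom up.
    for line in reversed(model_text.splitlines()):
        line = line.strip()
        if line.startswith(PREFIX):
            return json.loads(line[len(PREFIX):])
    return None


# Tail of the model file quoted above.
model_tail = 'end of parameters\n\npandas_categorical:[["a", "b", "c"]]\n'
categories = read_pandas_categorical(model_tail)
print(categories)  # [['a', 'b', 'c']]
```

With a real file, the same helper could be applied to the contents of model.txt read as a string.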

Versions:

lightgbm==3.1.1
treelite==2.2.0
treelite-runtime==2.2.0

hcho3 commented 2 years ago

My guess is that treelite doesn't make use of the pandas_categorical line yet

You are right, Treelite does not yet recognize the pandas_categorical field. I will mark this as a feature request.
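Until the field is recognized, one possible workaround is to encode the categorical column by hand before building the DMatrix, so that the array no longer has object dtype. The sketch below assumes that each category's code is its position in the pandas_categorical list (which is how pandas assigns category codes) and that unseen values should become NaN. encode_rows is a hypothetical helper, and it has not been verified here that Treelite's predictions on the encoded data match LightGBM's.

```python
import math


def encode_rows(rows, categorical_columns):
    """Replace category values with float codes; unseen values become NaN.

    categorical_columns maps a column index to its ordered category list,
    e.g. the lists stored in the pandas_categorical field (assumption).
    """
    lookups = {
        col: {value: float(code) for code, value in enumerate(cats)}
        for col, cats in categorical_columns.items()
    }
    encoded = []
    for row in rows:
        encoded.append([
            lookups[i].get(value, math.nan) if i in lookups else float(value)
            for i, value in enumerate(row)
        ])
    return encoded


# First two rows of X from the report: x1 is categorical, x2 is numeric.
rows = [["a", 1.0], ["a", 1.0]]
encoded = encode_rows(rows, {0: ["a", "b", "c"]})
print(encoded)  # [[0.0, 1.0], [0.0, 1.0]]
```

The resulting nested list is purely numeric, so it could be converted with np.asarray(encoded, dtype="float32") and passed to treelite_runtime.DMatrix as in the report.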