ragrawal closed this issue 4 years ago
I was able to fix the issue using the following code. However, I'm not sure whether this is the right approach.
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from baikal import Input, Model, make_step, Step
from baikal.plot import plot_model
from baikal.steps import Stack
from catboost import CatBoostClassifier
# Load the Pima Indians diabetes dataset (8 feature columns, 1 label column)
df = pd.read_csv(
    'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv',
    header=None)
dataset = df.values

# Wrap CatBoostClassifier as a baikal Step
class CatBoostClassifierStep(Step, CatBoostClassifier):
    def __init__(self, *args, name=None, n_outputs=1, **kwargs):
        super().__init__(*args, name=name, n_outputs=n_outputs, **kwargs)

    # Workaround: make the step hashable by hashing its name
    def __hash__(self):
        return hash(super().name)

x = Input()
y = Input()
y_pred = CatBoostClassifierStep()(x, y)
model = Model(x, y_pred, y)
model.fit(dataset[:, 0:8], dataset[:, 8])
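For readers wondering why adding `__hash__` helps: the thread does not state the root cause, so the following is only a sketch under the assumption that the wrapped estimator defines `__eq__` without `__hash__`, which in Python makes instances unhashable. The classes `ThirdPartyEstimator` and `NamedStep` are hypothetical stand-ins, not the real CatBoost or baikal classes.

```python
class ThirdPartyEstimator:
    """Hypothetical estimator that defines __eq__ but not __hash__."""

    def __init__(self, depth=6):
        self.depth = depth

    def __eq__(self, other):
        return isinstance(other, ThirdPartyEstimator) and self.depth == other.depth
    # Defining __eq__ alone makes Python set __hash__ to None on this class.


class NamedStep:
    """Hypothetical pipeline step that carries a unique name."""

    def __init__(self, *args, name=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name


class UnhashableStep(NamedStep, ThirdPartyEstimator):
    """Inherits __hash__ = None from ThirdPartyEstimator via the MRO."""


class HashableStep(NamedStep, ThirdPartyEstimator):
    """Restores hashability by hashing the step name, as in the workaround."""

    def __hash__(self):
        return hash(self.name)


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True
```

One caveat with this pattern: two steps that compare equal but have different names will hash differently, which bends the usual rule that equal objects share a hash. That is probably acceptable when steps are only looked up by identity, but it is worth keeping in mind.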
@ragrawal Thank you for the bug report!
Indeed, that's a bug in Model.fit. I'll see what I can do about it; I think I can release a fix for it in 0.4.2. In the meantime, please use the workaround you pasted, which, though a bit cumbersome, is valid and seems to be the most sensible approach.
Hi Alegonz, I just found out that the above solution doesn't work well with serialization. If I serialize my trained model and then try to read it back, I get the following error: 'CatBoostClassifierStep' object has no attribute '_nodes'
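A possible explanation, offered only as an assumption since the thread does not confirm it: if the wrapped estimator defines its own `__getstate__`/`__setstate__`, pickling would save only the estimator's state and drop attributes added by the step mixin, such as `_nodes`. The sketch below reproduces that effect with hypothetical stand-in classes (none of these are the real CatBoost or baikal classes) and shows one way a subclass could carry the extra attributes along.

```python
import pickle


class ThirdPartyEstimator:
    """Hypothetical estimator that pickles only its own state."""

    def __init__(self, depth=6):
        self.depth = depth

    def __getstate__(self):
        return {"depth": self.depth}

    def __setstate__(self, state):
        self.depth = state["depth"]


class NaiveStep(ThirdPartyEstimator):
    """Adds step bookkeeping that the inherited pickling hooks ignore."""

    def __init__(self, name=None, **kwargs):
        super().__init__(**kwargs)
        self._nodes = []
        self.name = name


class PicklableStep(NaiveStep):
    """Extends the pickling hooks so the step attributes survive."""

    _extra_attrs = ("_nodes", "name")  # hypothetical list of attributes to keep

    def __getstate__(self):
        return {
            "base": super().__getstate__(),
            "extra": {attr: getattr(self, attr) for attr in self._extra_attrs},
        }

    def __setstate__(self, state):
        super().__setstate__(state["base"])
        for attr, value in state["extra"].items():
            setattr(self, attr, value)


def roundtrip(obj):
    """Serialize and deserialize an object with pickle."""
    return pickle.loads(pickle.dumps(obj))
```

With these stand-ins, `roundtrip(NaiveStep(name="clf"))` comes back without `_nodes`, while `roundtrip(PicklableStep(name="clf"))` keeps it. Whether the same override is sufficient for the real `CatBoostClassifierStep` would need to be checked against the actual classes.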