AhmetZamanis closed this issue 8 months ago
Hi, I'm not sure how I can help with this error. XGBoost simply checks DataFrame.dtypes as it is. If some outer libraries or procedures are creating invalid types, there's not much we can do inside XGBoost.
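For anyone debugging a similar error, here is a minimal sketch of a dtype check you can run yourself before calling fit. It only mimics the idea of inspecting DataFrame.dtypes and is not XGBoost's actual implementation. The helper name non_numeric_columns and the example frame are hypothetical. Run it on every object you pass to the model, including the target and the eval_set.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def non_numeric_columns(df: pd.DataFrame) -> list:
    """Return the names of columns whose dtype is not numeric, bool or category."""
    offenders = []
    for name, dtype in df.dtypes.items():
        accepted = (
            is_numeric_dtype(dtype)
            or is_bool_dtype(dtype)
            or isinstance(dtype, pd.CategoricalDtype)
        )
        if not accepted:
            offenders.append(name)
    return offenders


# Hypothetical example data: "city" holds raw strings and should be flagged.
frame = pd.DataFrame({"price": [1.0, 2.5], "count": [3, 4], "city": ["a", "b"]})
offenders = non_numeric_columns(frame)
print(offenders)
```

If the list is empty for the features but the error persists, check the target and validation objects in the same way.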
I see, but the datatypes in DataFrame.dtypes seem to be correct after the transformation, so I thought the issue may arise from XGBoost. I've also used older versions of XGBoost with data transformed by category_encoders without issue.
I'll keep tinkering and notify if I find out more.
Very sorry to waste your time. There is no issue; everything works as expected. I made a very silly typo in my data splitting code and only realized it very late.
Feel free to delete this thread if it's possible, I couldn't figure out how.
Can you please share exactly what mistake you made? I am also facing the same issue.
@Gandharv29 I made an error in splitting the features and the target:
y_train, y_val, y_test = y[:train_end], X[train_end:val_end], X[val_end:]
Because of this, I was unknowingly trying to pass the unprocessed features as the target vector, and correctly getting the datatype error. I doubt you have the same issue.
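For reference, a sketch of what the corrected split could look like, with a cheap sanity check that would have caught the typo. The data, the X_* names, and the values of train_end and val_end are hypothetical stand-ins. Only the slicing pattern comes from the line above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real features, target and split points.
X = pd.DataFrame({"f1": np.arange(10, dtype=float), "f2": np.arange(10, 20)})
y = pd.Series(np.arange(10, dtype=float), name="target")
train_end, val_end = 6, 8

# Slice the features from X and the target from y, using the same boundaries.
X_train, X_val, X_test = X[:train_end], X[train_end:val_end], X[val_end:]
y_train, y_val, y_test = y[:train_end], y[train_end:val_end], y[val_end:]

# Sanity check: every target must be a Series aligned with its feature frame.
for X_part, y_part in [(X_train, y_train), (X_val, y_val), (X_test, y_test)]:
    assert isinstance(y_part, pd.Series), "target must be one-dimensional"
    assert X_part.index.equals(y_part.index), "features and target are misaligned"
```

With the original typo, y_val would be a DataFrame of features, so the first assertion would fail immediately instead of surfacing later as a datatype error.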
Issue
I am trying to fit an XGBRegressor model with early stopping & an eval_set in a Jupyter notebook. The training & validation data are Pandas DataFrames, and all columns are of float or int datatype. I still get the following error upon running XGBRegressor.fit():
The four mentioned columns are originally of object datatype, but they are encoded & converted to float with TargetEncoder from package category_encoders before model training. I suspect some sort of leftover "metadata" is causing XGBoost to still interpret them as object type columns.
I have tried the following workarounds, which all result in the same error:
- DataFrame.values
- pd.DataFrame(Dataframe.values, DataFrame.columns, Dataframe.index)
I will share my environment & versions, code snippets & the full traceback below. I don't think I'm allowed to share the dataset, so the full notebook may not be additionally useful. Please let me know if I can help further.
Environment info
Jupyter versions:
Relevant code & traceback
I'm including only the code snippets that I think are relevant from my notebook. Basically, the steps are:
- DataFrame
- Pipeline of two TargetEncoders
- XGBRegressor is created & fit as part of an Optuna tuning objective. The model fitting yields the error & traceback.
Below is the code snippet & traceback for creating & fitting the model. It is part of an Optuna objective function, but I'm omitting the Optuna code as I don't think it's relevant.