booster.trees_to_dataframe crashes when there are boolean feature_types 'i'

I think I have found a bug in XGBoost. Suppose I train an XGBoost model, take its booster via booster = model.get_booster(), and that booster has an 'i' (i.e. a boolean variable) in its feature_types. Then booster.trees_to_dataframe() crashes with this error: "ValueError: Failed to parse model text dump."

How I came across the bug: I recently updated several Python libraries (notably Pandas from 1.5.1 to 2.2.2), and my script, which used to work, now crashes at the booster.trees_to_dataframe() step. I looked at the source code at https://github.com/dmlc/xgboost, in the method trees_to_dataframe in the file python-package/xgboost/core.py. The issue is in this part of the code:

if fid[0].find("<") != -1: ...
elif fid[0].find(":{") != -1: ...
else: raise ValueError("Failed to parse model text dump.")

The problem is that for a feature type 'i' the string contains neither "<" nor ":{", so execution falls through to the else branch, which raises the ValueError.
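To make the fall-through concrete, here is a minimal sketch that mirrors the three branches quoted above. The helper name classify_split and the three sample dump lines are my own assumptions for illustration; in particular, the exact shape of a text-dump line for an 'i' feature (a bare feature name in brackets) is what I believe XGBoost emits, not something copied from a real dump.

```python
def classify_split(line: str) -> str:
    """Hypothetical helper mirroring the branch logic quoted above."""
    # Extract the text between "[" and "]", e.g. "f0<0.5" or "f0".
    condition = line.split("[")[1].split("]")[0]
    if condition.find("<") != -1:
        return "numerical"
    elif condition.find(":{") != -1:
        return "categorical"
    else:
        raise ValueError("Failed to parse model text dump.")


# Assumed shapes of text-dump lines (illustrative, not taken from a real dump).
numerical_line = "0:[f0<0.5] yes=1,no=2,missing=1"
categorical_line = "0:[f0:{1,2}] yes=1,no=2,missing=1"
indicator_line = "0:[f0] yes=1,no=2"

for line in (numerical_line, categorical_line, indicator_line):
    try:
        print(line, "->", classify_split(line))
    except ValueError as err:
        print(line, "-> ValueError:", err)
```

Under these assumptions, only the indicator line reaches the else branch, which matches the behaviour I see.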

I found two ways to avoid the error on my side:

1. Explicitly cast the boolean columns of the Pandas DataFrame to np.uint8 instead of bool. (np.uint8 used to be the default dtype returned by pd.get_dummies; the default has since changed to bool, which is why I hit the bug only now.)
2. Before calling booster.trees_to_dataframe(), change the booster's feature types from boolean to integer via booster.feature_types = ['int' if feature_type == 'i' else feature_type for feature_type in booster.feature_types].
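The first workaround can be sketched as a small helper. The name bools_to_uint8 and the toy DataFrame are hypothetical, introduced only for this example; the helper casts just the bool columns and leaves every other column untouched.

```python
import numpy as np
import pandas as pd


def bools_to_uint8(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with bool columns cast to np.uint8."""
    bool_cols = df.select_dtypes(include="bool").columns
    return df.astype({col: np.uint8 for col in bool_cols})


# Toy data, for illustration only.
df = pd.DataFrame({"flag": [True, False, True], "value": [0.1, 0.2, 0.3]})
converted = bools_to_uint8(df)
print(converted.dtypes)
```

If the boolean columns come from pd.get_dummies, passing dtype=np.uint8 to that call should have the same effect without a separate cast.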

In the long term I think that booster.trees_to_dataframe() should not crash with a booster where there is a feature type 'i'.

Here is a sample script to see that trees_to_dataframe crashes when there's a feature type 'i' and that the two suggested techniques do avoid the crash:

# Python 3.12.4, macOS 14.5
# Import a toy dataset
from sklearn.datasets import load_breast_cancer
dataset = load_breast_cancer()
# Convert it to Pandas
import pandas as pd # 2.2.2
df_X = pd.DataFrame(data=dataset.data,columns=dataset.feature_names)
s_y = pd.Series(data=dataset.target,name='target')
# Convert the features to boolean
df_X = df_X>df_X.quantile(axis=0,q=0.5).values.reshape(1,-1)
# One way to keep trees_to_dataframe from crashing is to convert the Pandas booleans to integers
do_temporary_solution1=False # try setting it to True
if do_temporary_solution1:
    df_X = df_X.astype(int) # With integer columns, trees_to_dataframe does not crash
# Create an XGBoost model
from xgboost import XGBClassifier # 2.0.3
model = XGBClassifier(random_state=6*7)
# Train the model
model.fit(X=df_X,y=s_y)
# Take the booster
booster = model.get_booster()
# Look at the feature_types
print(booster.feature_types) # 30*['i']
# Another way to keep trees_to_dataframe from crashing is to convert the types 'i' to 'int'
do_temporary_solution2=False # try setting it to True
if do_temporary_solution2:
    booster.feature_types = ['int' if feature_type == 'i' else feature_type for feature_type in booster.feature_types]
# Convert to a DataFrame
df_booster = booster.trees_to_dataframe() # ValueError: Failed to parse model text dump.
print(df_booster)

booster.trees_to_dataframe crashes when there are boolean feature_types 'i' #10437