Closed · arsine1996 closed this issue 3 years ago
It should be able to work with categorical features. Do you have a minimal working example to reproduce your error?
Yes, sure, I appreciate your assistance a lot. Here is a sample of my data and the simple LightGBM model I run on it:

Age | BusinessTravel | DailyRate | Department | DistanceFromHome | Education | EducationField | EmployeeCount | EmployeeNumber | EnvironmentSatisfaction | Gender | HourlyRate | JobInvolvement | JobLevel | JobRole | JobSatisfaction | MaritalStatus | MonthlyIncome | MonthlyRate | NumCompaniesWorked | Over18 | OverTime | PercentSalaryHike | PerformanceRating | RelationshipSatisfaction | StandardHours | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
29 | Travel_Rarely | 592 | Research & Development | 7 | 3 | Life Sciences | 1 | 1883 | 4 | Male | 59 | 3 | 1 | Laboratory Technician | 1 | Single | 2062 | 19384 | 3 | Y | No | 14 | 3 | 2 | 80 | 0 | 11 | 2 | 3 | 3 | 2 | 1 | 2 |
36 | Travel_Rarely | 884 | Sales | 1 | 4 | Life Sciences | 1 | 1585 | 2 | Female | 73 | 3 | 2 | Sales Executive | 3 | Single | 6815 | 21447 | 6 | Y | No | 13 | 3 | 1 | 80 | 0 | 15 | 5 | 3 | 1 | 0 | 0 | 0 |
34 | Travel_Rarely | 1326 | Sales | 3 | 3 | Other | 1 | 1478 | 4 |
X[X.select_dtypes(include="object").columns.tolist()] = X.select_dtypes(include="object").astype('category')
X0, X1, Y0, Y1 = train_test_split(X, Y, test_size=0.25, random_state=42)
model = LGBMClassifier(random_state=42, max_depth=2, n_estimators=200, boosting_type='dart')
model.fit(X0, Y0)
dm = ce.domain_mappers.DomainMapperTabular(X0.values, feature_names=X0.columns.tolist(), contrast_names=['0', '1'], seed=42)
exp = ce.ContrastiveExplanation(dm, verbose=True)

and got the following error:

ValueError Traceback (most recent call last)
ContrastiveExplanation is unable to automatically infer what the categorical columns in your data are, except when the data is a Pandas DataFrame. You should either specify the names/indices of the categorical variables for a `DomainMapperTabular` with the `categorical_features` argument, or you can try replacing the `DomainMapperTabular` with a `DomainMapperPandas` (which automatically infers the feature names as well as which of them are categorical).
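For reference, a minimal sketch of the first option, continuing from the snippet above; the `cat_idx` name and the dtype-based column selection are illustrative assumptions, not taken from this thread:

```python
# Positional indices of the categorical columns, derived from the training DataFrame.
cat_idx = [X0.columns.get_loc(c)
           for c in X0.select_dtypes(include=["object", "category"]).columns]

dm = ce.domain_mappers.DomainMapperTabular(X0.values,
                                           feature_names=X0.columns.tolist(),
                                           contrast_names=['0', '1'],
                                           seed=42,
                                           categorical_features=cat_idx)
```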
Thanks for the suggestion. I tried to add the categorical columns, but it still didn't work.
X[X.select_dtypes(include="object").columns.tolist()] = X.select_dtypes(include="object").astype('category')
X0, X1, Y0, Y1 = train_test_split(X, Y, test_size=0.25, random_state=42)
cat_cols = X.select_dtypes(include=["object", "category"]).columns.tolist()
encoder = category_encoders.OrdinalEncoder(cols=cat_cols)
encoder.fit(X0, Y0)
X0_encoded = encoder.transform(X0)
X1_encoded = encoder.transform(X1)
model = LGBMClassifier(random_state=42, boosting_type='dart')
model.fit(X0_encoded, Y0, categorical_feature=cat_cols)
sample = X0_encoded.iloc[0, :]
dm = ce.domain_mappers.DomainMapperTabular(X0_encoded.values, feature_names=X0_encoded.columns.tolist(),
                                           contrast_names=['0', '1'], seed=42, categorical_features=cat_cols)
exp = ce.ContrastiveExplanation(dm, verbose=True, seed=42)

and got the following error:

IndexError Traceback (most recent call last)
The previous version assumed that the categorical features were given as indices (0, 1, 5, etc.) instead of feature names ('BusinessTravel', 'Department'). This should be fixed now.

cat_cols = X.select_dtypes(include=['category', 'object']).columns.tolist()
dm = ce.domain_mappers.DomainMapperTabular(X0.values,
                                           feature_names=X0.columns.tolist(),
                                           contrast_names=['0', '1'],
                                           seed=42,
                                           categorical_features=cat_cols)

should work, as well as

dm = ce.domain_mappers.DomainMapperPandas(X0, contrast_names=['0', '1'], seed=42)

(the latter automatically infers the feature names and the categorical features for you).
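For completeness, a minimal end-to-end sketch of the Pandas-based route. The `explain_instance_domain` call follows the pattern in the package README, and the import name and variable names are assumptions carried over from the snippets above:

```python
import contrastive_explanation as ce  # assumed import name, as in the package README

# Build the domain mapper directly from the training DataFrame so that feature
# names and categorical columns are inferred automatically.
dm = ce.domain_mappers.DomainMapperPandas(X0, contrast_names=['0', '1'], seed=42)
exp = ce.ContrastiveExplanation(dm, verbose=True, seed=42)

# Explain a single instance: why this prediction rather than the contrast class?
sample = X0.iloc[0, :]
print(exp.explain_instance_domain(model.predict_proba, sample))
```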
Does the model support categorical feature types for LightGBM? I got an error when running it with the categorical features specified.
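As background on the LightGBM side (this is not from the original thread): the scikit-learn wrapper can consume pandas 'category' columns natively, without ordinal encoding. A minimal sketch, assuming `X` and `Y` are the DataFrame and labels from the snippets above; whether the explainer's perturbed samples keep that dtype is the separate question this issue is about:

```python
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split

# Cast text columns to the pandas 'category' dtype so LightGBM treats them as categorical.
cat_cols = X.select_dtypes(include="object").columns.tolist()
X[cat_cols] = X[cat_cols].astype("category")

X0, X1, Y0, Y1 = train_test_split(X, Y, test_size=0.25, random_state=42)

# With 'category' dtypes, LightGBM detects the categorical columns automatically;
# in the LightGBM version used in this thread they could also be named explicitly
# via fit's categorical_feature argument, as in the snippets above.
model = LGBMClassifier(random_state=42, boosting_type="dart")
model.fit(X0, Y0)

print(model.predict_proba(X1.iloc[:5]))
```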