ModelOriented / DALEX

moDel Agnostic Language for Exploration and eXplanation
https://dalex.drwhy.ai
GNU General Public License v3.0

How to do PD profiles with Y/N variable? #529

Closed ChunJen closed 2 years ago

ChunJen commented 2 years ago

I tried to plot PD profiles for the top 5 most important variables (according to dalex variable importance), but got this error message:

`TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_35132/2656290125.py in
      1 # PD profiles
      2 avm_mp = avm_exp.model_profile()
----> 3 avm_mp.plot(variables = ['lvr19','lvr20'])

~/.local/lib/python3.7/site-packages/dalex/model_explanations/_aggregated_profiles/object.py in plot(self, objects, geom, variables, center, size, alpha, color, facet_ncol, facet_scales, title, y_title, horizontal_spacing, vertical_spacing, show)
    237         all_variables = _global_utils.intersect_unsorted(variables, all_variables)
    238         if len(all_variables) == 0:
--> 239             raise TypeError("variables do not overlap with " + ''.join(variables))
    240
    241         _result_df = _result_df.loc[_result_df['vname'].isin(all_variables), :]

TypeError: variables do not overlap with lvr19lvr20`
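
The message ends in `lvr19lvr20` because the requested names are joined without a separator before being appended to the error text. The sketch below imitates the two source lines visible in the traceback. The body of `intersect_unsorted` is a hypothetical stand-in (dalex's real helper is not shown in this thread), and `select_variables` is a name invented for illustration.

```python
def intersect_unsorted(requested, available):
    # Hypothetical stand-in for dalex's internal helper (assumption):
    # keep the requested names that exist, in the order they were requested.
    return [name for name in requested if name in available]


def select_variables(requested, available):
    # Mirrors the check shown in the traceback: an empty overlap raises TypeError.
    overlap = intersect_unsorted(requested, available)
    if len(overlap) == 0:
        raise TypeError("variables do not overlap with " + ''.join(requested))
    return overlap


available = ["lvr_19", "lvr_20", "poi_77"]

try:
    select_variables(["lvr19", "lvr20"], available)
except TypeError as err:
    message = str(err)

print(message)
```

So the error means that none of the requested names matched a profiled variable exactly; it says nothing about the variable's type (binary or continuous).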

hbaniecki commented 2 years ago

What is the result of avm_exp.data.columns? Does it contain the variables ['lvr19','lvr20']?

I can debug this if you provide a minimal reproducible code example with code/toy dataset.
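
A quick way to answer that question without reading a long column list by eye is to compare the requested names against the columns programmatically. This is a minimal sketch using only the standard library. `check_variables` is a hypothetical helper, and the short `columns` list stands in for `list(avm_exp.data.columns)`.

```python
from difflib import get_close_matches


def check_variables(requested, columns):
    """Split requested names into those present and suggestions for the rest."""
    available = [str(c) for c in columns]
    present = [name for name in requested if name in available]
    suggestions = {}
    for name in requested:
        if name not in available:
            # Look for the closest existing column name, if any is similar enough.
            close = get_close_matches(name, available, n=1, cutoff=0.6)
            suggestions[name] = close[0] if close else None
    return present, suggestions


# Stand-in for list(avm_exp.data.columns) (assumption: a small subset).
columns = ["lvr_7", "lvr_19", "lvr_20", "poi_77", "poi_96", "economy_170"]

present, suggestions = check_variables(["lvr19", "lvr20", "poi_77"], columns)
print(present)
print(suggestions)
```

Names that come back in `suggestions` are not in the data as typed, and the suggested value points at a likely intended column.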

ChunJen commented 2 years ago

Yes, ['lvr19','lvr20'] are included in avm_exp.data.columns.

and the result has 180 variables: ['lvr_1', 'lvr_2', 'lvr_3', 'lvr_4', 'lvr_5', 'lvr_6', 'lvr_7', 'lvr_8', 'lvr_9', 'lvr_10', 'lvr_11', 'lvr_12', 'lvr_13', 'lvr_14', 'lvr_15', 'lvr_16', 'lvr_19', 'lvr_20', 'lvr_21', 'lvr_22', 'lvr_23', 'lvr_24', 'lvr_25', 'lvr_27', 'lvr_28', 'lvr_34', 'lvr_35', 'lvr_36', 'lvr_37', 'lvr_38', 'lvr_39', 'lvr_40', 'lvr_41', 'lvr_42', 'lvr_43', 'lvr_44', 'lvr_46', 'lvr_47', 'lvr_48', 'lvr_49', 'poi_75', 'poi_72', 'poi_60', 'poi_74', 'poi_54', 'poi_58', 'poi_69', 'poi_82', 'poi_78', 'poi_67', 'poi_73', 'poi_64', 'poi_83', 'poi_61', 'poi_81', 'poi_84', 'poi_62', 'poi_79', 'poi_59', 'poi_63', 'poi_65', 'poi_71', 'poi_90', 'poi_80', 'poi_89', 'poi_85', 'poi_55', 'poi_86', 'poi_88', 'poi_56', 'poi_68', 'poi_77', 'poi_66', 'poi_87', 'poi_76', 'poi_53', 'poi_57', 'poi_70', 'poi_91', 'poi_92', 'poi_93', 'poi_94', 'poi_95', 'poi_96', 'poi_119', 'poi_116', 'poi_104', 'poi_118', 'poi_98', 'poi_102', 'poi_113', 'poi_126', 'poi_122', 'poi_111', 'poi_117', 'poi_108', 'poi_127', 'poi_105', 'poi_125', 'poi_128', 'poi_106', 'poi_123', 'poi_103', 'poi_107', 'poi_109', 'poi_115', 'poi_134', 'poi_124', 'poi_133', 'poi_129', 'poi_99', 'poi_130', 'poi_132', 'poi_100', 'poi_112', 'poi_121', 'poi_110', 'poi_131', 'poi_120', 'poi_97', 'poi_101', 'poi_114', 'area_135', 'area_136', 'area_137', 'area_138', 'economy_139', 'economy_149', 'economy_161', 'economy_140', 'economy_150', 'economy_162', 'economy_141', 'economy_151', 'economy_163', 'economy_142', 'economy_152', 'economy_164', 'economy_143', 'economy_153', 'economy_165', 'economy_144', 'economy_154', 'economy_166', 'economy_145', 'economy_155', 'economy_167', 'economy_146', 'economy_156', 'economy_168', 'economy_147', 'economy_157', 'economy_169', 'economy_148', 'economy_158', 'economy_170', 'economy_171', 'economy_172', 'empty_173', 'empty_174', 'empty_175', 'empty_176', 'house_177', 'house_178', 'house_179', 'house_185', 'house_180', 'house_186', 'house_181', 'house_187', 'house_182', 'house_188', 
'house_183', 'house_189', 'house_184', 'house_190', 'arealevel_195', 'arealevel_196', 'arealevel_197', 'arealevel_198']

part of my code:

import dalex as dx
import numpy as np  # needed for np.exp below

def predict_function(model, data):
    # exponentiate the model's raw predictions
    return np.exp(model.predict(data))

avm_exp = dx.Explainer(stacking, te_x, te_y, predict_function = predict_function, label = 'Stacking')

# top 10 variables including ['lvr19','lvr20']
avm_mp_variable_importance = avm_exp.model_parts()
avm_mp_variable_importance.plot(max_vars = 10)

# try PD profiles
avm_mp = avm_exp.model_profile()
avm_mp.plot(variables = ['lvr19','lvr20'])

some samples:

y | lvr_7 | lvr_19 | lvr_20 | poi_77 | poi_96 | economy_170 | empty_174 | house_182 | arealevel_198
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
12.7757 | 0.6360 | 0 | 0 | 3.2377 | 3.1282 | 22.1056 | 0 | 6.7715 | 0
13.1238 | 0.2896 | 1 | 0 | 2.7503 | 1.5410 | 22.1056 | 0 | 5.8322 | 0
30.9746 | 0.2102 | 1 | 0 | 1.1943 | 0.7305 | 22.0582 | 0 | 6.4372 | 0
55.7364 | 0.1196 | 1 | 0 | 2.2597 | 1.8203 | 22.0582 | 0 | 6.1284 | 0
9.4678 | 0.1229 | 1 | 0 | 1.1943 | 1.5410 | 22.1667 | 0 | 7.0390 | 0
14.5738 | 0.1196 | 0 | 0 | 1.1943 | 0.7305 | 22.1033 | 0 | 6.1284 | 0
11.4736 | 0.7160 | 0 | 0 | 3.3408 | 2.2597 | 22.1056 | 0 | 6.9183 | 0
13.2374 | 0.2079 | 1 | 0 | 1.1943 | 0.7305 | 22.1056 | 0 | 6.3949 | 0
12.5849 | 0.2635 | 1 | 0 | 1.5410 | 1.1943 | 22.1056 | 0 | 6.4372 | 0
19.4839 | 0.1229 | 0 | 0 | 1.5410 | 0.7305 | 22.0582 | 0 | 5.8322 | 0

hbaniecki commented 2 years ago

Shouldn't it be ["lvr_19", "lvr_20"] then?

ChunJen commented 2 years ago

Single or double quotation marks make no difference; I tried both with other variables and they work. The error only occurs when the input is ["lvr_19", "lvr_20"] or ['lvr_19', 'lvr_20'].

image

hbaniecki commented 2 years ago

@ChunJen you seem to be using ["lvr19", "lvr20"] instead of ["lvr_19", "lvr_20"]. The "_" is important.
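
One way to avoid this kind of typo is to take the variable names from the importance ranking itself instead of retyping them. The sketch below is illustrative only: `top_variables` is a hypothetical helper, the loss values are made up, and the assumption is that the pairs would come from the variable-importance result of `avm_exp.model_parts()`, which also carries the helper rows `_full_model_` and `_baseline_`.

```python
def top_variables(importance, k=5, skip=("_full_model_", "_baseline_")):
    """Return the k variable names with the largest loss, ignoring helper rows."""
    ranked = sorted(
        (row for row in importance if row[0] not in skip),
        key=lambda row: row[1],
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Made-up (variable, dropout loss) pairs, for illustration only.
importance = [
    ("_full_model_", 4.1),
    ("poi_77", 4.6),
    ("lvr_19", 6.3),
    ("lvr_20", 5.2),
    ("_baseline_", 9.8),
]

selected = top_variables(importance, k=2)
print(selected)
```

The resulting list can then be passed as `variables` to the profile plot, so the names are guaranteed to match the data columns character for character.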

ChunJen commented 2 years ago

Sorry, my bad... fixed now. Thank you @hbaniecki !