Closed ChunJen closed 2 years ago
What is the result of `avm_exp.data.columns`? Does it contain the variables `['lvr19','lvr20']`?
I can debug this if you provide a minimal reproducible code example with code/toy dataset.
Yes, `['lvr19','lvr20']` are included in `avm_exp.data.columns`. The result has 180 variables:
['lvr_1', 'lvr_2', 'lvr_3', 'lvr_4', 'lvr_5', 'lvr_6', 'lvr_7', 'lvr_8', 'lvr_9', 'lvr_10', 'lvr_11', 'lvr_12', 'lvr_13', 'lvr_14', 'lvr_15', 'lvr_16', 'lvr_19', 'lvr_20', 'lvr_21', 'lvr_22', 'lvr_23', 'lvr_24', 'lvr_25', 'lvr_27', 'lvr_28', 'lvr_34', 'lvr_35', 'lvr_36', 'lvr_37', 'lvr_38', 'lvr_39', 'lvr_40', 'lvr_41', 'lvr_42', 'lvr_43', 'lvr_44', 'lvr_46', 'lvr_47', 'lvr_48', 'lvr_49', 'poi_75', 'poi_72', 'poi_60', 'poi_74', 'poi_54', 'poi_58', 'poi_69', 'poi_82', 'poi_78', 'poi_67', 'poi_73', 'poi_64', 'poi_83', 'poi_61', 'poi_81', 'poi_84', 'poi_62', 'poi_79', 'poi_59', 'poi_63', 'poi_65', 'poi_71', 'poi_90', 'poi_80', 'poi_89', 'poi_85', 'poi_55', 'poi_86', 'poi_88', 'poi_56', 'poi_68', 'poi_77', 'poi_66', 'poi_87', 'poi_76', 'poi_53', 'poi_57', 'poi_70', 'poi_91', 'poi_92', 'poi_93', 'poi_94', 'poi_95', 'poi_96', 'poi_119', 'poi_116', 'poi_104', 'poi_118', 'poi_98', 'poi_102', 'poi_113', 'poi_126', 'poi_122', 'poi_111', 'poi_117', 'poi_108', 'poi_127', 'poi_105', 'poi_125', 'poi_128', 'poi_106', 'poi_123', 'poi_103', 'poi_107', 'poi_109', 'poi_115', 'poi_134', 'poi_124', 'poi_133', 'poi_129', 'poi_99', 'poi_130', 'poi_132', 'poi_100', 'poi_112', 'poi_121', 'poi_110', 'poi_131', 'poi_120', 'poi_97', 'poi_101', 'poi_114', 'area_135', 'area_136', 'area_137', 'area_138', 'economy_139', 'economy_149', 'economy_161', 'economy_140', 'economy_150', 'economy_162', 'economy_141', 'economy_151', 'economy_163', 'economy_142', 'economy_152', 'economy_164', 'economy_143', 'economy_153', 'economy_165', 'economy_144', 'economy_154', 'economy_166', 'economy_145', 'economy_155', 'economy_167', 'economy_146', 'economy_156', 'economy_168', 'economy_147', 'economy_157', 'economy_169', 'economy_148', 'economy_158', 'economy_170', 'economy_171', 'economy_172', 'empty_173', 'empty_174', 'empty_175', 'empty_176', 'house_177', 'house_178', 'house_179', 'house_185', 'house_180', 'house_186', 'house_181', 'house_187', 'house_182', 'house_188', 'house_183', 'house_189', 'house_184', 
'house_190', 'arealevel_195', 'arealevel_196', 'arealevel_197', 'arealevel_198']
part of my code:
import dalex as dx
import numpy as np

# the model predicts log(y), so map predictions back to the original scale
def predict_function(model, data):
    return np.exp(model.predict(data))
avm_exp = dx.Explainer(stacking, te_x, te_y, predict_function = predict_function, label = 'Stacking')
# top 10 variables including ['lvr19','lvr20']
avm_mp_variable_importance = avm_exp.model_parts()
avm_mp_variable_importance.plot(max_vars = 10)
# try PD profiles
avm_mp = avm_exp.model_profile()
avm_mp.plot(variables = ['lvr19','lvr20'])
some samples:
y | lvr_7 | lvr_19 | lvr_20 | poi_77 | poi_96 | economy_170 | empty_174 | house_182 | arealevel_198
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
12.7757 | 0.6360 | 0 | 0 | 3.2377 | 3.1282 | 22.1056 | 0 | 6.7715 | 0
13.1238 | 0.2896 | 1 | 0 | 2.7503 | 1.5410 | 22.1056 | 0 | 5.8322 | 0
30.9746 | 0.2102 | 1 | 0 | 1.1943 | 0.7305 | 22.0582 | 0 | 6.4372 | 0
55.7364 | 0.1196 | 1 | 0 | 2.2597 | 1.8203 | 22.0582 | 0 | 6.1284 | 0
9.4678 | 0.1229 | 1 | 0 | 1.1943 | 1.5410 | 22.1667 | 0 | 7.0390 | 0
14.5738 | 0.1196 | 0 | 0 | 1.1943 | 0.7305 | 22.1033 | 0 | 6.1284 | 0
11.4736 | 0.7160 | 0 | 0 | 3.3408 | 2.2597 | 22.1056 | 0 | 6.9183 | 0
13.2374 | 0.2079 | 1 | 0 | 1.1943 | 0.7305 | 22.1056 | 0 | 6.3949 | 0
12.5849 | 0.2635 | 1 | 0 | 1.5410 | 1.1943 | 22.1056 | 0 | 6.4372 | 0
19.4839 | 0.1229 | 0 | 0 | 1.5410 | 0.7305 | 22.0582 | 0 | 5.8322 | 0

Shouldn't it be `["lvr_19", "lvr_20"]` then?
Single and double quotes both work: I tried both with other variables and the plot was fine.
The error only occurs when the input is `["lvr_19", "lvr_20"]` or `['lvr_19', 'lvr_20']`.
@ChunJen you seem to use `["lvr19", "lvr20"]` instead of `["lvr_19", "lvr_20"]`. The `"_"` is important.
Sorry, my bad... fixed now. Thank you @hbaniecki !
I tried to plot PD profiles for the top 5 most important variables (according to dalex variable importance), but got this error message:
`TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_35132/2656290125.py in <module>
      1 # PD profiles
      2 avm_mp = avm_exp.model_profile()
----> 3 avm_mp.plot(variables = ['lvr19','lvr20'])

~/.local/lib/python3.7/site-packages/dalex/model_explanations/_aggregated_profiles/object.py in plot(self, objects, geom, variables, center, size, alpha, color, facet_ncol, facet_scales, title, y_title, horizontal_spacing, vertical_spacing, show)
    237         all_variables = _global_utils.intersect_unsorted(variables, all_variables)
    238         if len(all_variables) == 0:
--> 239             raise TypeError("variables do not overlap with " + ''.join(variables))
    240
    241         _result_df = _result_df.loc[_result_df['vname'].isin(all_variables), :]

TypeError: variables do not overlap with lvr19lvr20`
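The message ends in `lvr19lvr20` because the requested names are joined with an empty separator. The sketch below mirrors only the three lines visible in the traceback. `intersect_unsorted` here is an assumed stand-in for the dalex internal helper of the same name (its real implementation is not shown in this thread), and `select_variables` is a hypothetical wrapper for illustration.

```python
def intersect_unsorted(requested, available):
    # Assumed behaviour of the internal helper: keep the requested names
    # that are available, preserving the requested order.
    return [v for v in requested if v in available]


def select_variables(requested, available):
    overlap = intersect_unsorted(requested, available)
    if len(overlap) == 0:
        # Same message construction as in the traceback:
        # ''.join(...) glues the names together without a separator.
        raise TypeError("variables do not overlap with " + ''.join(requested))
    return overlap


# A few column names from the thread, standing in for the profiled variables.
available = ['lvr_19', 'lvr_20', 'poi_77']

try:
    select_variables(['lvr19', 'lvr20'], available)
except TypeError as err:
    message = str(err)

print(message)
```

Judging only from the lines shown in the traceback, the error is raised when none of the requested names match, so a list with one correct and one misspelled name would presumably plot the correct one without complaint.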