feedzai / timeshap

TimeSHAP explains Recurrent Neural Network predictions.

"Pruned Events" vs "Other Events" vs "Other Features" #14

Closed by aminnayebi 2 years ago

aminnayebi commented 2 years ago

Looking at the cell_data dataframe, I see two kinds of entries in the Event column: "Pruned Events" and "Other Events". Could you please explain how they differ?

image

My other question: even if I set top_x_feats in cell_dict to the total number of features in my data, the cell_data dataframe still contains some rows with a non-zero shap value for "Other Features", for example:

image

How is that possible? There shouldn't be any "Other Features" as I have all the features separately listed in the cell_data data frame.

JoaoPBSousa commented 2 years ago

Hello @aminnayebi,

"Pruned Events" are events that are pruned by the temporal coalition pruning algorithm. "Other Events" are events that are not pruned, but are deemed less relevant for the cell level calculation through the top_x_events or event_threshold parameters.

"Pruned Events" are represented in dark gray (E-59 to E-7). "Other Events" are represented in light gray (E-6, E-5, E-3, E-2, E-1).
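To make the distinction concrete, here is a minimal sketch of the three groups in plain Python. It is not TimeSHAP's implementation: the function `label_events` and its parameters `pruning_idx` and `top_events` are hypothetical names chosen for illustration, and treating E-4 as the single individually reported event is an assumption, since it is simply absent from both lists above.

```python
def label_events(event_ids, pruning_idx, top_events):
    """Assign each event to one of the three groups described above.

    event_ids:   event offsets relative to the explained instance, e.g. -59..-1
    pruning_idx: events strictly older than this offset count as pruned (assumed input)
    top_events:  offsets reported individually at cell level (assumed input)
    """
    labels = {}
    for e in event_ids:
        if e < pruning_idx:
            # Removed by temporal coalition pruning; explained as one group.
            labels[e] = "Pruned Events"
        elif e in top_events:
            # Kept as an individual event in the cell-level explanation.
            labels[e] = f"E{e}"
        else:
            # Not pruned, but not selected for the cell-level explanation.
            labels[e] = "Other Events"
    return labels


# Offsets taken from the example in this thread; E-4 as top event is assumed.
labels = label_events(range(-59, 0), pruning_idx=-6, top_events={-4})
other = sorted(e for e, name in labels.items() if name == "Other Events")
pruned = sorted(e for e, name in labels.items() if name == "Pruned Events")
```

With these inputs, `pruned` covers E-59 to E-7 and `other` holds E-6, E-5, E-3, E-2 and E-1, matching the dark gray and light gray groups.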

Regarding "Other Features" appearing when there should be none: this was indeed a bug, and I have pushed a fix to the repo. The importance of this group was always being calculated, even in the extreme case you describe, where no features fall into "Other Features". In that case no perturbations were applied to the group, so the importance TimeSHAP assigned to it was just noise.
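If you want to check whether an existing result is affected, a small sanity check along these lines may help. It is only a sketch under stated assumptions: the column names "Feature" and "Shapley Value" are assumed rather than taken from the TimeSHAP code, the helper `unexpected_other_features` is hypothetical, and the rows are plain dicts (from a pandas dataframe you could obtain them with `cell_data.to_dict("records")`).

```python
def unexpected_other_features(rows, all_features, top_x_feats, tol=0.0):
    """Return rows that give non-zero importance to "Other Features"
    although every feature is already listed individually.

    rows:         list of dicts with assumed keys "Feature" and "Shapley Value"
    all_features: names of all model features
    top_x_feats:  number of features requested individually at cell level
    tol:          absolute value at or below which importance counts as zero
    """
    if top_x_feats < len(all_features):
        # Some features are legitimately grouped; nothing to flag.
        return []
    return [
        r for r in rows
        if r["Feature"] == "Other Features" and abs(r["Shapley Value"]) > tol
    ]


# Toy rows for illustration only; the values are made up.
rows = [
    {"Event": "E-1", "Feature": "f1", "Shapley Value": 0.20},
    {"Event": "E-1", "Feature": "f2", "Shapley Value": -0.05},
    {"Event": "E-1", "Feature": "Other Features", "Shapley Value": 0.03},
]
flagged = unexpected_other_features(rows, ["f1", "f2"], top_x_feats=2)
```

Any row in `flagged` carries importance for a group that contains no features, which is the noise described above.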

Thank you for noting this issue, and let me know if you have any further questions.

aminnayebi commented 2 years ago

Thank you very much. Closing the issue!