Trusted-AI / AIX360

Interpretability and explainability of data and machine learning models
https://aix360.res.ibm.com/
Apache License 2.0

Questions about the results obtained by XAI method #167

Open 9527-ly opened 2 years ago

9527-ly commented 2 years ago

I found a strange phenomenon. With the same model architecture, the same training and test samples, and an otherwise identical pipeline, the values produced by an XAI method (such as Saliency) to evaluate the interpretability of the model should, in theory, be the same. However, when I retrain a new model, the interpretability values are completely different from those obtained with the previous model. The values are unstable and the results cannot be reproduced, unless I save the model after training and then reload its parameters, in which case the results are the same. Does anyone know why this happens?
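A minimal sketch of the situation described above, assuming the run-to-run difference comes from random weight initialization during retraining (the thread does not confirm this). It uses a tiny NumPy network rather than any AIX360 API; `train_mlp`, `saliency`, and the toy data are hypothetical names introduced only for illustration.

```python
import numpy as np

def train_mlp(X, y, seed, hidden=8, epochs=200, lr=0.1):
    """Train a one-hidden-layer tanh network with plain gradient descent."""
    rng = np.random.default_rng(seed)  # the seed controls the initial weights
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    w2 = rng.normal(scale=0.5, size=hidden)
    for _ in range(epochs):
        H = np.tanh(X @ W1)                       # hidden activations
        p = 1.0 / (1.0 + np.exp(-(H @ w2)))       # predicted probability
        err = (p - y) / len(y)                    # d(BCE)/d(logit)
        grad_w2 = H.T @ err
        grad_W1 = X.T @ (np.outer(err, w2) * (1.0 - H ** 2))
        W1 -= lr * grad_W1
        w2 -= lr * grad_w2
    return W1, w2

def saliency(x, W1, w2):
    """Gradient of the output logit with respect to the input x."""
    h = np.tanh(x @ W1)
    return W1 @ (w2 * (1.0 - h ** 2))

# Toy data, fixed once so that only the training seed varies.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
x_test = X[0]

sal_a = saliency(x_test, *train_mlp(X, y, seed=1))  # first training run
sal_b = saliency(x_test, *train_mlp(X, y, seed=1))  # same seed: same weights
sal_c = saliency(x_test, *train_mlp(X, y, seed=2))  # new seed: new weights

print("same seed identical:     ", np.array_equal(sal_a, sal_b))
print("different seed identical:", np.array_equal(sal_a, sal_c))
```

Under this assumption, the gradient-based saliency is a deterministic function of the trained weights, so reloading saved parameters or fixing every random seed before training reproduces it, while an unseeded retrain does not.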

adrida commented 10 months ago

This is a common phenomenon with perturbation-based XAI methods. To estimate the marginal contribution of each input feature (pixels, in the case of saliency maps), XAI algorithms often sample the input space, and that sampling can be non-deterministic (especially when it is random). This is also why explanations obtained with this kind of method need to be interpreted carefully. What these methods really do is fit a simple local surrogate (a linear classifier or a small decision tree) to explain one sample, and for the explanation to make sense, they assume that the decision boundary can be approximated locally by that simple model.
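The local-surrogate idea above can be sketched roughly as follows. This is a simplified illustration in plain NumPy, not the AIX360 or LIME implementation: `black_box`, `local_surrogate`, and all parameter values are hypothetical. It perturbs one sample with Gaussian noise, fits a linear model to the black-box outputs, and reads the coefficients as feature attributions.

```python
import numpy as np

def black_box(X):
    """Hypothetical nonlinear model standing in for a trained network."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def local_surrogate(x, n_samples=200, scale=0.3, seed=None):
    """Fit a linear surrogate around x and return its slope per feature."""
    rng = np.random.default_rng(seed)   # seed=None draws fresh entropy each call
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))  # perturbed inputs
    y = black_box(Z)
    A = np.hstack([Z - x, np.ones((n_samples, 1))])  # centered features + intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)     # least-squares linear fit
    return coef[:-1]                                 # drop the intercept

x = np.array([0.5, 1.0])

attr_seed1_a = local_surrogate(x, seed=1)
attr_seed1_b = local_surrogate(x, seed=1)  # same seed: same perturbations
attr_seed2 = local_surrogate(x, seed=2)    # different seed: different perturbations

print("seed=1:", attr_seed1_a)
print("seed=1:", attr_seed1_b)
print("seed=2:", attr_seed2)
```

With a fixed seed the attributions repeat exactly; with a different (or no) seed they shift slightly, because the surrogate is fitted to a different random neighborhood of the same sample.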