SeldonIO / alibi

Algorithms for explaining machine learning models
https://docs.seldon.io/projects/alibi/en/stable/

About the choice of a proper baseline for integrated gradients #408

Open ajbanegas opened 3 years ago

ajbanegas commented 3 years ago

Hi, I'm working on a project in which we use different deep learning models, and I want to use alibi to gain better insight into how the models work. I have a simple in-house tabular dataset that follows a rule like this:

if 0.1 <= A <= 0.3 and 0.5 <= B <= 0.7 then the target class is 1, otherwise 0

Although A and B are the highest-scored features when I average all the local explanations, the individual explanations do not reflect this. I've tested several parameter settings and suspect that my baseline is not a good choice.

As my dataset is tabular (a matrix of floats), the baseline is a vector in which each feature is drawn as a random float between 0 and the maximum value of its column.

Can anyone help me understand why, in this particular case, the local explanations are mostly wrong even though, on average, the most important features are identified correctly?
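
For reference, a minimal sketch of the setup (synthetic data standing in for the in-house dataset; shapes, column roles and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the tabular data: columns 0 and 1 play the role of A and B,
# the remaining columns are noise.
X = rng.uniform(0.0, 1.0, size=(1000, 5)).astype(np.float32)
y = ((X[:, 0] >= 0.1) & (X[:, 0] <= 0.3)
     & (X[:, 1] >= 0.5) & (X[:, 1] <= 0.7)).astype(int)

# Per-instance random baseline as described: each feature is drawn uniformly
# between 0 and the maximum observed value of its column.
col_max = X.max(axis=0)                                    # shape (n_features,)
baselines = rng.uniform(0.0, col_max, size=X.shape).astype(np.float32)
```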

jklaise commented 3 years ago

It sounds like your baseline is random (i.e. different for each instance)? What kind of results do you get with a constant baseline that is the expected value over the random vectors?

Also, are you using the true decision rule as the model to be explained?

If we were to engineer an uninformative baseline, feeding it to the model should result in an uncertain decision about which class it belongs to. Arguably this is difficult if the prediction is a hard label, since either 0 or 1 will always be predicted. It would be interesting to engineer a baseline that results in a 50:50 decision for a probabilistic classifier and use that one for the explanations.

I think what you may be seeing is to be expected from the combination of two facts: you have a hard-label model and the baselines are random for each instance. To obtain local explanations that make sense you would likely need a (constant) baseline that results in uncertain predictions from the model.
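
A minimal sketch of that suggestion, assuming a small placeholder tf.keras classifier with a 2-class softmax output trained on synthetic data like the sketch above (the model, epochs and feature count are illustrative; `IntegratedGradients` is alibi's explainer):

```python
import numpy as np
import tensorflow as tf
from alibi.explainers import IntegratedGradients

rng = np.random.default_rng(0)

# Synthetic stand-in for the real dataset, as in the sketch above.
X = rng.uniform(0.0, 1.0, size=(1000, 5)).astype(np.float32)
y = ((X[:, 0] >= 0.1) & (X[:, 0] <= 0.3)
     & (X[:, 1] >= 0.5) & (X[:, 1] <= 0.7)).astype(int)

# Small probabilistic classifier (a placeholder for the real model).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=20, batch_size=64, verbose=0)

# Constant baseline: the expected value of the per-instance random baselines,
# i.e. half of each column's maximum for Uniform(0, col_max) draws.
# The same vector is reused for every instance to be explained.
col_max = X.max(axis=0)
baseline = (col_max / 2.0).astype(np.float32)
baselines = np.tile(baseline, (X.shape[0], 1))

# Sanity check: an uninformative baseline should leave the model
# close to a 50:50 prediction.
print(model.predict(baseline[None, :], verbose=0))

ig = IntegratedGradients(model, n_steps=50, method="gausslegendre")
preds = np.argmax(model.predict(X, verbose=0), axis=1)
explanation = ig.explain(X, baselines=baselines, target=preds)
attributions = explanation.attributions[0]   # shape (n_samples, n_features)
```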

More on the topic of baselines for IG: https://distill.pub/2020/attribution-baselines/

ajbanegas commented 3 years ago

Thanks @jklaise. I'll try to answer all the points:

1. Yes, my baseline is based on random numbers drawn uniformly within the range of each column, and it is different for each sample to be explained.
2. I also tested constant baselines (e.g. all zeros, the mean value of each column), but the results were even worse, probably because I didn't choose the right constant values.
3. If I'm not mistaken, you suggest taking a baseline consisting of values that have a 50% probability of belonging to either class 0 or 1 and, in addition, using the same baseline for all the samples (a sketch follows below).
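
One way to act on point 3, continuing the sketch in the previous comment (it reuses that sketch's `model`, `X`, `ig` and `preds`; the percentile grid is just one illustrative way to generate candidates):

```python
import numpy as np

# Candidate constant baselines: per-column percentile vectors of the training data.
candidates = np.stack(
    [np.percentile(X, q, axis=0) for q in range(5, 100, 5)]
).astype(np.float32)

# Predicted probability of class 1 for each candidate baseline.
probs = model.predict(candidates, verbose=0)[:, 1]

# Keep the candidate the model is least certain about (closest to 50:50)
# and reuse it as a constant baseline for every instance.
best = candidates[np.argmin(np.abs(probs - 0.5))]
explanation = ig.explain(X, baselines=np.tile(best, (X.shape[0], 1)), target=preds)
```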

BTW, I had already read the article you mention and I must say it's great.

jklaise commented 3 years ago

@ajbanegas yes, I was suggesting number 3.