christophM / interpretable-ml-book

Book about interpretable machine learning
https://christophm.github.io/interpretable-ml-book/

5.9 Shapley values confusion #233

Closed Abdul-Moeed closed 3 years ago

Abdul-Moeed commented 3 years ago

I'm confused by parts of this section. I have attached a screenshot with the confusing parts highlighted. The issues are numbered so they are easier to refer to:

  1. What exactly is the difference between x and x_0? The same goes for z. As far as I understand, the features are already ordered 1..j..p.
  2. What does it mean to generate a random order of the features, and why is this meaningful?
  3. The text says all values in the order before feature j are replaced by feature values from the sample z, but the formula in the bullet point has it the other way around, i.e. all values AFTER feature j are replaced by feature values from the sample z.

I think a concrete example in Python in the text would be very helpful to clarify points 1 and 2. Thanks!

sparsh999gupta commented 3 years ago

@Abdul-Moeed Hey Moeed! Have you found an answer to this? I had the same question, especially regarding your third point.

christophM commented 3 years ago

Thanks for highlighting this.

  1. x_o is the same instance as x, but with its features shown in a different order. The same applies to z.
  2. It is just a technical device for replacing some feature values but not others. It's meaningful because it is a way to simulate a "coalition" of features. Imagine a team that starts with 0 players, and then you add players one by one. That is similar to deciding on a random joining order of features.
  3. That's a mistake in the book. I will fix it.
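
Since a Python example was requested, here is a minimal sketch of the sampling procedure discussed in points 1 to 3. It is an illustration, not the book's code. It assumes the formula's convention from point 3 (features that come AFTER j in the random order take their values from the sample z), and the names `shapley_sample`, `predict`, `n_iter`, and `seed` are made up for this example. `predict` is assumed to take a 2-D array and return one prediction per row.

```python
import numpy as np

def shapley_sample(predict, X, x, j, n_iter=1000, seed=0):
    """Monte Carlo estimate of the Shapley value of feature j for instance x.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    n, p = X.shape
    total = 0.0
    for _ in range(n_iter):
        # Draw a random instance z from the data.
        z = X[rng.integers(n)]
        # Random "joining order" of the features (point 2).
        order = rng.permutation(p)
        # x_o and z_o are just x[order] and z[order] (point 1): same values,
        # different order. We work with index sets instead of reordering.
        pos = int(np.where(order == j)[0][0])
        after = order[pos + 1:]  # features that join after j
        # With j: features up to and including j come from x, the rest from z.
        x_plus = x.copy()
        x_plus[after] = z[after]
        # Without j: identical, except feature j also comes from z.
        x_minus = x_plus.copy()
        x_minus[j] = z[j]
        # Marginal contribution of feature j for this draw.
        total += predict(x_plus[None, :])[0] - predict(x_minus[None, :])[0]
    return total / n_iter

# Toy usage with a linear model and a single all-zero background row.
w = np.array([1.0, 10.0, 100.0])
linear = lambda A: A @ w
X_bg = np.zeros((1, 3))
x0 = np.array([1.0, 2.0, 3.0])
print(shapley_sample(linear, X_bg, x0, j=1))
```

For a linear model the random order makes no difference, because each draw contributes w[j] * (x[j] - z[j]). The order starts to matter once features interact, which is why the average is taken over many random orders.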