Closed · diegobatt closed this issue 2 years ago
Hi there, thanks a lot for the issue and PR. I see the point you are making.
A solution might be to use $o(1)$ in the index instead of $(1)$. So for example $x_{+j} = (x_{o(1)}, x_{o(2)}, \ldots)$, and also keep the sentence you added in line 321. I am not a fan of introducing another letter $y$, though.
What do you think?
Hi @christophM ! Thank you for the feedback. Shouldn't we discuss this in the PR threads instead of the issue?
Adding $o$ to the index will fix:

- Order instance x: $x_o=(x_{o(1)},\ldots,x_{o(j)},\ldots,x_{o(p)})$
- Order instance z: $z_o=(z_{o(1)},\ldots,z_{o(j)},\ldots,z_{o(p)})$

But if I understood correctly, when building $x_{+j}$ and $x_{-j}$ you need to mix the original ordering with the new one: the new instances have to be in the original ordering so the model can evaluate them, but the features that are replaced with values from $z$ are the ones to the right of $o(j)$ in the ordering $o$ (i.e., all the features with $o(i) > o(j)$). That mixing is the part I struggled with the most. How would you state it? I'm 100% on board with not adding $y$ if possible.

Maybe we could leverage the fact that all this ordering ends up being equivalent to taking each feature from $x$ or $z$ with 1/2 probability in the new instances?
I guess we can discuss the issue here, as I am still trying to figure out how this issue is best described.
I am not sure I understood what should be improved, but could it be that the following needs to be explained in the book? The ordering is merely a trick to create two new data instances, where some parts come from x and some from z. In the actual function call, in the code, we do not use the ordering at all: we have to leave the features in the same order as before, because prediction functions usually expect the features in a fixed order. The ordering only affects, for each feature, whether its value comes from x or from z. Do you think that explanation would make it easier to understand the Shapley algorithm? And does it address your feedback?
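A minimal sketch of that construction (my own illustrative code with a hypothetical helper name, not the book's implementation): the permutation only produces a per-feature mask deciding whether each value comes from x or z, and the features never leave their original positions.

```python
import numpy as np

def build_instances(x, z, j, rng):
    """Build x_plus_j and x_minus_j for one Monte-Carlo sample.

    Features stay in their original positions; the random permutation
    only decides, per feature, whether the value comes from x or z.
    """
    p = len(x)
    order = rng.permutation(p)       # random permutation o of feature indices
    pos = np.argsort(order)          # pos[i] = position of feature i within o
    from_z = pos > pos[j]            # features to the right of j take z's value
    x_plus = np.where(from_z, z, x)  # feature j keeps its value from x...
    x_minus = x_plus.copy()
    x_minus[j] = z[j]                # ...and takes z's value in x_minus
    return x_plus, x_minus
```

Note that the model never sees the permuted order; `x_plus` and `x_minus` differ only in position `j`.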
Exactly, that explanation would address the feedback.
About the thing I said here:
> Maybe leveraging the fact that all this ordering ends up being equivalent to using x or z with 1/2 probability in the new instances?
I think all the ordering and mixing is, in the end, equivalent to building an instance that takes each feature $i \neq j$ from $z$ or from $x$ with 1/2 probability. Since the ordering is random, all sequences have the same probability, so in half of the sequences feature $i$ lands to the right of $j$ and in the other half to the left.

Going for this approach might be the easiest way to explain it.
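A quick empirical check of that claim (a sketch I put together, not from the book): under a uniform random permutation, each feature $i \neq j$ lands to the right of $j$ with probability exactly 1/2, so it would take $z$'s value half the time.

```python
import numpy as np

rng = np.random.default_rng(42)
p, j, n = 5, 2, 200_000
counts = np.zeros(p)
for _ in range(n):
    pos = np.argsort(rng.permutation(p))  # pos[i] = position of feature i
    counts += pos > pos[j]                # 1 where feature i is right of j
freq = counts / n
print(freq)  # entry j is exactly 0; every other entry is close to 0.5
```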
The algorithm for Monte-Carlo estimation of the Shapley value is described in Chapter 5, Section 9. I believe the description is not clear about how the random permutation `o` is used. It asks to sort the instances `x_o` and `z_o`, which I assume are the `x` and `z` instances, respectively, given the random order `o`. However, the equation relates more to an instance sorted according to the original feature order: the first element is `1`, the last one is `p`, and `j` is right in the middle, just as in the original order. On top of that, `x_{+j}` is fed to the model `f` in this same order. So is this order the original one, or the one that results from the random permutation?

My understanding is that the algorithm should be more like:
- Draw a random permutation `o` of the p features (e.g., `[x_4, x_2, x_1, x_3]`)
- Pick the feature `j` (e.g., `2`) and replace all the features to its right (in the random order) with values from `z` (e.g., `[x_4, x_2, z_1, z_3]`)
- Restore the original feature order and build the two instances with and without feature `j` taken from `x` (e.g., `[z_1, x_2, z_3, x_4]` and `[z_1, z_2, z_3, x_4]`)

I might be getting the whole Monte-Carlo idea wrong here, but I believe the description of the algorithm relates more to something like what I described above.
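For concreteness, the steps above can be sketched as follows (an illustrative implementation I wrote, not the book's code; the name `shapley_mc` is made up). For a linear model $f(x) = w^\top x$ the estimate should converge to $w_j(x_j - \bar{X}_j)$, which gives an easy sanity check.

```python
import numpy as np

def shapley_mc(f, x, X, j, m=1000, rng=None):
    """Monte-Carlo estimate of the Shapley value of feature j for instance x.

    Each iteration samples a background instance z and a random
    permutation o, takes z's values for every feature to the right of j
    in o (keeping the original feature order when calling f), and
    averages f(x_plus_j) - f(x_minus_j).
    """
    if rng is None:
        rng = np.random.default_rng()
    p = x.shape[0]
    total = 0.0
    for _ in range(m):
        z = X[rng.integers(len(X))]        # random instance from the data
        pos = np.argsort(rng.permutation(p))
        from_z = pos > pos[j]              # features right of j in the order
        x_plus = np.where(from_z, z, x)    # feature j keeps x's value
        x_minus = x_plus.copy()
        x_minus[j] = z[j]                  # feature j takes z's value
        total += f(x_plus) - f(x_minus)
    return total / m
```

For nonlinear models the same loop applies; only the variance of the estimate grows, so `m` has to be larger.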