I'm a little rusty at this stuff, but I believe there might be a bug in the implementation of the position-bias EM algorithm here, in `p_r1 = reshaped_train_labels + (1 - reshaped_train_labels) * p_e0_r1_c0`. In particular, I think `p_r1` for unclicked documents (the `1 - reshaped_train_labels` case) is incorrect.
Using notation from the paper, page 614, we want to compute $P(R=1|C=0, q, k, d)$ by marginalizing $P(R=1, E|C=0, q, k, d)$ over $E$:
$$
\begin{aligned}
P(R=1|C=0, q, k, d) =& \underbrace{P(R=1, E=1|C=0, q, k, d)}_{=0} \, P(E=1|C=0, q, k, d) \\
& + P(R=1, E=0|C=0, q, k, d) \, P(E=0|C=0, q, k, d) \\
=& P(R=1, E=0|C=0, q, k, d) \, (1 - P(E=1|C=0, q, k, d))
\end{aligned}
$$
The implementation just has $P(R=1|C=0, q, k, d) = P(R=1, E=0|C=0, q, k, d)$.
That is, it's missing the $(1- P(E=1|C=0, q, k, d))$ term.
Am I missing something or is this a bug? Unfortunately, the paper just says "From this, we can compute the marginals P(E = 1|c, q, d, k) and P(R = 1|c, q, d, k)" without additional details and I can't find any other implementations of this paper to see how they're working. Like I said, I'm rusty at this so I may be missing something obvious.
Note: I think the same issue comes up here with $P(E=1|C=0, q, k, d)$, which is not explicitly named in the code like `p_r1`, but is computed as `reshaped_train_labels + (1 - reshaped_train_labels) * p_e1_r0_c0`, if I'm reading things correctly.
I think it should instead be
$$
P(E=1|C=0, q, k, d) = P(R=0, E=1|C=0, q, k, d) \, (1 - P(R=1|C=0, q, k, d))
$$
Of course, these depend on each other, but with these two equations we can solve for $P(E=1|C=0, q, k, d)$ and $P(R=1|C=0, q, k, d)$ in terms of $P(R, E|C=0, q, k, d)$, which we can compute from `gamma` and `self.propensity`.
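For what it's worth, here's a sketch of what I mean. Writing $x = P(R=1|C=0, \cdot)$, $y = P(E=1|C=0, \cdot)$, $a = P(R=1, E=0|C=0, \cdot)$, and $b = P(R=0, E=1|C=0, \cdot)$, the two equations $x = a(1-y)$ and $y = b(1-x)$ have a closed-form solution. (`solve_marginals` is just a hypothetical helper I made up; `a` and `b` stand for the `p_e0_r1_c0` and `p_e1_r0_c0` quantities computed from `gamma` and `self.propensity` — I may be misreading what those arrays hold.)

```python
def solve_marginals(a, b):
    """Solve the coupled equations
        x = a * (1 - y)   # x = P(R=1 | C=0, q, k, d)
        y = b * (1 - x)   # y = P(E=1 | C=0, q, k, d)
    for x and y.  Substituting one into the other gives
        x = a * (1 - b) / (1 - a * b)
        y = b * (1 - a) / (1 - a * b)
    a, b: the terms p_e0_r1_c0 and p_e1_r0_c0 (scalars or NumPy
    arrays; my assumption about what the code computes).
    """
    denom = 1.0 - a * b
    p_r1_c0 = a * (1.0 - b) / denom
    p_e1_c0 = b * (1.0 - a) / denom
    return p_r1_c0, p_e1_c0
```

Since `a, b < 1` in practice, the denominator `1 - a * b` stays away from zero, so this should be numerically safe.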