Enny1991 / PLSTM

137 stars 32 forks source link

Possible bug in the PLSTM #1

Closed philipperemy closed 7 years ago

philipperemy commented 7 years ago

The Kronos gate is applied at the end.

# APPLY KRONOS GATE
c = k * c + (1. - k) * c_prev
m = k * m + (1. - k) * m_prev
# END KRONOS GATE

But I think the output o from Equation (4) should be updated with the value computed in the Equation (8). For now it's updated with c_tilde I guess.

https://arxiv.org/pdf/1610.09513v1.pdf

Enny1991 commented 7 years ago

Hi,

So I checked with the author (Danny) and he said that the implementation is correct this way. You can see that also in his original implementation (here) the kronos gate is applied in the end. In the paper also h_tilde is calculated wit the c_tilde and not with the c. With this said it shouldn't change too much if you apply the gate before calculating the o gate or even the h. You could try and let me know.

+Enea

philipperemy commented 7 years ago

@Enny1991 Thanks for your reply. There's no problem for h and c. Both values are correct. We agree on that.

My concern was that the output o should be updated with new_c (in my opinion). Something like this:

new_c = kappa * new_c_tilde + (1 - kappa) * c_prev
if self._use_peepholes:
    o += w_o_peephole * new_c
new_h_tilde = sigmoid(o) * self._activation(new_c_tilde)
new_h = kappa * new_h_tilde + (1 - kappa) * h_prev
new_state = (new_c, new_h)
return new_h, new_state

In your implementation, the output gate value o is not updated with the time gates. And if you take a closer look to Fig 1, the value C_t from the peepholes should come from the time gate.

https://raw.githubusercontent.com/philipperemy/tensorflow-phased-lstm/master/fig/fig1.png

Let me know what do you think!

philipperemy commented 7 years ago

I also agree it won't change much! I just want to have your opinion on that! Thanks!

Enny1991 commented 7 years ago

Ok I understand. So in LSTM both h and o are calculated based on c. But in PSLTM h is calculated based on c_tilde not c so to be consistent I would calculate o based on c_tilde and not based on c. Does this make sense?

philipperemy commented 7 years ago

Yes what you said makes absolute sense and is consistent.

But for me, it's not consistent with the paper. However and maybe, Danny made a small mistake in the paper and wanted to calculate o based on c_tilde and not based on c.

It's not a big deal. I just wanted to be rigorous regarding the paper. If we write down all the equations, it is suggested that o is calculated from c and not c_tilde.

dannyneil commented 7 years ago

Hi all,

Yes, Philippe, you are correct in that Equation 4 should reference c_tilde and not c. I can add a point to the paper to mention that, and will update Figure 1 so the line is correctly drawn to c_tilde instead. The intuition here is that the gates should be blind to the effect of the khronos gate; input, forget and output gate should all operate as if the cell were a normal LSTM cell, while the khronos gate allows it to either operate or not operate (and then linearly interpolates between these two states). If the output gate is influenced by the khronos gate (if the peepholes reference c instead of c_tilde), then the PLSTM would no longer be a gated LSTM cell, but somehow be self-dependent on the time gate's actual operation.

Ithink everyone's right in that it wouldn't influence much -- but it should be updated in the paper. Thanks very much for pointing out the issue, Philippe! -Danny

Enny1991 commented 7 years ago

Thanks guys, I close the thread since there is no real issue, in the end the implementation is correct.

+Enea

philipperemy commented 7 years ago

@dannyneil you're welcome! And thanks for this great clarification! Much appreciated!

Philippe