Closed by carlthome 7 years ago
I was wondering the same thing as @carlthome.
Conceptually they are kept at zero; in practice we never create those parameters. If you expand the BN expressions in equation 6, you'll see three bias terms: beta_h + beta_x + b. These are redundant, so we keep just one bias/beta vector ("b") instead of three. You could train all three and it probably wouldn't hurt.
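To make the redundancy concrete, here's a minimal NumPy sketch. The `h_term` and `x_term` arrays are just stand-ins for the normalized (pre-bias) recurrent and input contributions, not the actual BN computation; the point is only that three per-unit bias vectors added to the same sum are indistinguishable from one merged bias:

```python
import numpy as np

rng = np.random.default_rng(0)
h_term = rng.normal(size=(4, 8))  # stand-in for BN(W_h h) before adding beta_h
x_term = rng.normal(size=(4, 8))  # stand-in for BN(W_x x) before adding beta_x

beta_h = rng.normal(size=8)
beta_x = rng.normal(size=8)
b = rng.normal(size=8)

# Three separate bias vectors...
three_biases = h_term + beta_h + x_term + beta_x + b

# ...produce exactly the same pre-activation as a single merged bias,
# so beta_h and beta_x add no expressive power over b alone.
merged_bias = h_term + x_term + (beta_h + beta_x + b)

assert np.allclose(three_biases, merged_bias)
```

Since any setting of (beta_h, beta_x, b) is matched by b' = beta_h + beta_x + b, dropping the two betas loses nothing; training all three just parameterizes the same bias redundantly.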
Thanks @cooijmanstim! I really appreciate the help. :1st_place_medal:
Also, I just noticed a mistake in my reproduction (I also set beta_c to zero, derp). https://github.com/carlthome/tensorflow-convlstm-cell/commit/c0e141f68fcb9e2409e311b26d75b4c466e3457d
The paper states that both the hidden and input betas are set to the zero vector to avoid unnecessary redundancy (the LSTM bias is enough), but it's not clear whether they are trained or kept constant at zero. They are called parameters throughout, and in the experiments they are initialized, which implies they are trained.