cooijmanstim / recurrent-batch-normalization


Is beta being trained or is it fixed at zero? #4

Closed: carlthome closed this issue 7 years ago

carlthome commented 7 years ago

The paper states that both the hidden and input betas are set to the zero vector to avoid unnecessary redundancy (e.g., the LSTM bias is enough), but it's not clear whether they are trained or just kept constant at zero. They are called parameters throughout, and in the experiments they are initialized, which implies they are trained.
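For reference, the recurrence in question (equation 6 of the paper, paraphrased from memory) batch-normalizes the recurrent and input contributions separately before adding the single bias $b$:

$$\begin{pmatrix} \tilde{f}_t \\ \tilde{i}_t \\ \tilde{o}_t \\ \tilde{g}_t \end{pmatrix} = \mathrm{BN}(W_h h_{t-1};\ \gamma_h, \beta_h) + \mathrm{BN}(W_x x_t;\ \gamma_x, \beta_x) + b$$

where $\mathrm{BN}(h;\ \gamma, \beta) = \beta + \gamma \odot \frac{h - \widehat{\mathbb{E}}[h]}{\sqrt{\widehat{\mathrm{Var}}[h] + \epsilon}}$.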

professoroakz commented 7 years ago

I was wondering the same thing as @carlthome.

cooijmanstim commented 7 years ago

Conceptually they are kept at zero; in practice we never even create those parameters. If you expand the BN expressions in equation 6, you'll see that you get three additive terms, beta_h + beta_x + b. These parameters are redundant, so we keep a single bias/beta vector ("b") instead of three. You could train all three and it probably wouldn't hurt.
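Here's a quick numerical check of that redundancy (a toy NumPy sketch, not the code in this repo):

```python
import numpy as np

def bn(x, gamma, beta, eps=1e-5):
    # Toy batch normalization over the batch axis (axis 0).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
batch, dim = 8, 5
wh_h = rng.normal(size=(batch, dim))  # stands in for W_h h_{t-1}
wx_x = rng.normal(size=(batch, dim))  # stands in for W_x x_t
gamma_h = gamma_x = 0.1 * np.ones(dim)
beta_h, beta_x, b = rng.normal(size=dim), rng.normal(size=dim), rng.normal(size=dim)

# Three separate additive parameters...
three = bn(wh_h, gamma_h, beta_h) + bn(wx_x, gamma_x, beta_x) + b
# ...give exactly the same pre-activations as zero betas plus one merged bias.
one = bn(wh_h, gamma_h, 0.0) + bn(wx_x, gamma_x, 0.0) + (beta_h + beta_x + b)
assert np.allclose(three, one)
```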

carlthome commented 7 years ago

Thanks @cooijmanstim! I really appreciate the help. :1st_place_medal:

Also, I just noticed a mistake in my reproduction: I also set beta_c to zero, even though that one is not redundant, since it sits inside the output tanh where no other bias can absorb it (derp). https://github.com/carlthome/tensorflow-convlstm-cell/commit/c0e141f68fcb9e2409e311b26d75b4c466e3457d
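For anyone else reproducing this, the parameter layout I ended up with looks roughly like the following (illustrative names only, not the actual code in either repo):

```python
import numpy as np

dim = 5
params = {
    "gamma_h": 0.1 * np.ones(dim),  # trained; 0.1 init as the paper recommends
    "gamma_x": 0.1 * np.ones(dim),  # trained
    "gamma_c": 0.1 * np.ones(dim),  # trained
    "beta_c": np.zeros(dim),        # trained! zero is only the initial value
    "b": np.zeros(dim),             # the single merged bias/beta; trained
    # no beta_h or beta_x at all: they would be redundant with b
}
```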