cooijmanstim / recurrent-batch-normalization


How are batch statistics computed? #6

Open OverLordGoldDragon opened 4 years ago

OverLordGoldDragon commented 4 years ago

I'm implementing recurrent BN in Keras, but after reading the original paper and the papers citing it, one detail remains unclear to me: how are the batch statistics computed? In the original paper, the authors state (pg. 3) (emphasis mine):

At training time, the statistics E[h] and Var[h] are estimated by the sample mean and sample variance of the current minibatch

Yet another paper (pg. 3) using and citing it describes:

We subscript BN by time (BN_t) to indicate that each time step tracks its own mean and variance. In practice, we track these statistics as they change over the course of training using an exponential moving average (EMA)

My question is thus two-fold:

  1. Are minibatch statistics computed per immediate minibatch, or as an EMA?
  2. How are gamma and beta, the inference parameters shared across all timesteps, computed? Is the result of (1) simply averaged across all timesteps (e.g. averaging EMA_t over all t)?
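
To make the two quoted descriptions concrete, here is a minimal NumPy sketch of one way they could fit together: each timestep is normalized with the sample mean and variance of the current minibatch, and a separate EMA of those statistics is kept per timestep for later use at inference. This is only my reading, not the authors' implementation. The function names, the `momentum` value, the array shapes, and the initial values of `gamma` and `beta` are all assumptions made for illustration.

```python
import numpy as np

def batch_stats(h_t):
    """Sample mean and (biased) variance over the batch axis for one timestep."""
    return h_t.mean(axis=0), h_t.var(axis=0)

def update_ema(ema, value, momentum=0.99):
    """Exponential moving average update; `momentum` is an assumed hyperparameter."""
    return momentum * ema + (1.0 - momentum) * value

def normalize(h_t, mean, var, gamma, beta, eps=1e-5):
    """Standard batch-norm transform with scale gamma and shift beta."""
    return gamma * (h_t - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
T, B, D = 4, 32, 8                      # timesteps, batch size, hidden units (arbitrary)
gamma, beta = np.full(D, 0.1), np.zeros(D)  # illustrative values, shared across timesteps

ema_mean = np.zeros((T, D))             # one running mean per timestep
ema_var = np.ones((T, D))               # one running variance per timestep

h = rng.normal(loc=2.0, scale=3.0, size=(T, B, D))  # stand-in for pre-activations
train_out = np.empty_like(h)

for t in range(T):
    mean_t, var_t = batch_stats(h[t])
    # Training: normalize with the statistics of the current minibatch.
    train_out[t] = normalize(h[t], mean_t, var_t, gamma, beta)
    # Separately track a per-timestep EMA, intended for inference.
    ema_mean[t] = update_ema(ema_mean[t], mean_t)
    ema_var[t] = update_ema(ema_var[t], var_t)
```

Under this reading, inference at timestep `t` would call `normalize(h_t, ema_mean[t], ema_var[t], gamma, beta)`. Whether the per-timestep EMAs should instead be averaged over `t` is exactly what question 2 asks, so the sketch leaves that open.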

Existing implementations in Keras and TF are listed below, but all of them are outdated, and I am unsure of their correctness.