Hello,

In the `_update_bn_statsgpu` function:

```python
workspace.FeedBlob(
    'gpu{}/'.format(i) + bn_layer + '_bn_rm',
    np.array(self._meanX_dict[bn_layer], dtype=np.float32),
)
```

the mean activation (`meanX`) is computed over 200 × batch_size × num_gpu training samples and then written into the `bn_layer + '_bn_rm'` blob, overwriting the running mean stored there. So why not just use the running mean accumulated during training? Why is the mean computed during the COMPUTE_PRECISE_BN pass more precise?
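For context, the distinction the question hinges on can be sketched as follows. During training, BatchNorm's running mean is an exponential moving average, so recent batches dominate and it is accumulated while the weights are still changing; the precise-BN pass instead averages a fixed set of batches with equal weight under frozen weights. This is a generic illustration with synthetic per-batch means, not the actual code from this repo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-batch activation means for one BN layer,
# drawn around a true mean of 2.0.
batch_means = rng.normal(loc=2.0, scale=0.5, size=200)

# Running mean as BatchNorm maintains it during training: an
# exponential moving average, so only the last ~1/(1-momentum)
# batches effectively contribute.
momentum = 0.9
ema = 0.0
for m in batch_means:
    ema = momentum * ema + (1.0 - momentum) * m

# "Precise" estimate: a plain average over the same 200 batches,
# weighting every batch equally (what feeding meanX into the
# '_bn_rm' blob amounts to).
precise = float(np.mean(batch_means))

print('EMA running mean:', ema)
print('Equal-weight mean:', precise)
```

The equal-weight average uses all 200 batches, so its variance shrinks with the full sample count, while the EMA's effective sample size stays roughly constant regardless of how long training runs.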