alecGraves opened 5 years ago
As @beatriz-ferreira mentions in #3, the paper normalizes beta, which gives more consistent performance across different latent vector sizes. Supporting this behavior, either as the default or as an option, would be a good addition to the layer.
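Below is a minimal sketch of what this normalization could look like. It assumes the paper is the β-VAE paper (Higgins et al., ICLR 2017), which defines β_norm = βM/N, where M is the latent size and N is the input size. The helper names (`normalize_beta`, `denormalize_beta`, `weighted_kl`) are hypothetical and are not part of this layer's API. The sketch also assumes the reconstruction term is summed over all N input elements and the KL term is summed over all M latent dimensions. If the layer averages either term instead, the scaling would need to change.

```python
import numpy as np


def normalize_beta(beta, latent_size, input_size):
    """Convert a raw KL weight into beta_norm = beta * M / N.

    M = latent_size (number of dimensions of z).
    N = input_size (e.g. height * width * channels).
    Hypothetical helper, assuming the beta-VAE definition of beta_norm.
    """
    return beta * latent_size / input_size


def denormalize_beta(beta_norm, latent_size, input_size):
    """Recover the raw KL weight to apply in the loss from a normalized beta."""
    return beta_norm * input_size / latent_size


def weighted_kl(mu, log_var, beta):
    """beta-weighted KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.

    mu, log_var: arrays of shape (batch, latent_size).
    The KL is summed over latent dimensions and averaged over the batch.
    """
    kl_per_sample = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)
    return beta * np.mean(kl_per_sample)


# Hypothetical usage: hold beta_norm fixed while sweeping the latent size.
# The raw beta used in the loss shrinks as the latent vector grows.
input_size = 64 * 64  # e.g. 64x64 single-channel images
beta_norm = 0.001
for latent_size in (8, 16, 32):
    beta = denormalize_beta(beta_norm, latent_size, input_size)
    print(latent_size, beta)
```

The layer could expose `beta_norm` as an alternative constructor argument. It would then convert that value to a raw beta internally, using its known latent size and the input size.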