NorbertZheng / read-papers

My paper reading notes.

Sik-Ho Tang | Review -- Gaussian Error Linear Units (GELUs). #107

Closed NorbertZheng closed 1 year ago

NorbertZheng commented 1 year ago

Sik-Ho Tang. Review — Gaussian Error Linear Units (GELUs).

NorbertZheng commented 1 year ago

Overview

Gaussian Error Linear Units (GELUs), by the University of California and the Toyota Technological Institute at Chicago. 2016 arXiv, over 600 citations.

Activation Unit, Image Classification, POS Tagging, Phone Recognition.

NorbertZheng commented 1 year ago

Gaussian Error Linear Unit (GELU)

Figure: GELU (μ=0, σ=1) vs ReLU vs ELU.

Specifically, the neuron input $x$ can be multiplied by $m \sim \text{Bernoulli}(\Phi(x))$, where $\Phi(x) = P(X \leq x)$, $X \sim N(0, 1)$, is the cumulative distribution function of the standard normal distribution.
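The expected value of this stochastic zero-or-identity map is exactly $x\Phi(x)$, which motivates the deterministic GELU defined next. A minimal NumPy/SciPy simulation (mine, not from the paper) illustrating this:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

x = 0.7                          # an arbitrary neuron input
p = norm.cdf(x)                  # Phi(x), the probability of keeping x

# Multiply x by m ~ Bernoulli(Phi(x)) many times and average.
m = rng.random(1_000_000) < p
print((m * x).mean())            # empirical expectation of the stochastic map
print(x * p)                     # x * Phi(x), i.e. the deterministic GELU(x)
```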

Since the cumulative distribution function of a Gaussian is often computed with the error function, the Gaussian Error Linear Unit (GELU) is defined as:

$$\text{GELU}(x) = x\Phi(x) = x \cdot \frac{1}{2}\left[1 + \text{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$$

The above equation can be approximated as:

$$\text{GELU}(x) \approx 0.5x\left(1 + \tanh\left[\sqrt{2/\pi}\left(x + 0.044715x^{3}\right)\right]\right)$$

or as:

$$\text{GELU}(x) \approx x\,\sigma(1.702x)$$

if greater feedforward speed is worth the cost of exactness.
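A small NumPy/SciPy sketch (my own, not the paper's code) of the exact GELU alongside the two approximations above; the function names are mine:

```python
import numpy as np
from scipy.special import erf

def gelu_exact(x):
    """GELU(x) = x * Phi(x), with Phi written via the error function."""
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    """Tanh approximation: 0.5 x (1 + tanh(sqrt(2/pi) (x + 0.044715 x^3)))."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def gelu_sigmoid(x):
    """Sigmoid approximation: x * sigmoid(1.702 x), faster but less exact."""
    return x / (1.0 + np.exp(-1.702 * x))

x = np.linspace(-4.0, 4.0, 1001)
# Maximum deviation of each approximation from the exact GELU on this grid;
# the tanh form stays closer, the sigmoid form trades accuracy for speed.
print(np.max(np.abs(gelu_tanh(x) - gelu_exact(x))))
print(np.max(np.abs(gelu_sigmoid(x) - gelu_exact(x))))
```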

Different Gaussian CDFs based on $N(\mu, \sigma^{2})$ could be used, with $\mu$ and $\sigma$ as learnable parameters, but in this paper the standard normal $N(0, 1)$ is used. A sketch of this generalization follows below.
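For illustration only (the paper's experiments fix $\mu=0$, $\sigma=1$), the same construction with a general Gaussian CDF might look like this; `gelu_general` is a hypothetical name of mine:

```python
import numpy as np
from scipy.stats import norm

def gelu_general(x, mu=0.0, sigma=1.0):
    """x scaled by the CDF of N(mu, sigma^2); mu=0, sigma=1 recovers the standard GELU."""
    return x * norm.cdf(x, loc=mu, scale=sigma)

x = np.linspace(-3.0, 3.0, 7)
print(gelu_general(x))                     # standard GELU
print(gelu_general(x, mu=0.0, sigma=2.0))  # a smoother, wider variant
```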

NorbertZheng commented 1 year ago

Experimental Results

MNIST Classification

Figure: MNIST classification results. Left: loss curves without Dropout; right: loss curves with a Dropout rate of 0.5.

The GELU tends to have the lowest median training log loss, both with and without Dropout.

NorbertZheng commented 1 year ago

MNIST Autoencoder

Figure: MNIST autoencoding results.

The GELU accommodates different learning rates and significantly outperforms the other nonlinearities.

NorbertZheng commented 1 year ago

TIMIT Frame Classification

Figure: TIMIT frame classification results.

After five runs per setting, the median test error (chosen at the lowest validation error) is 29.3% for the GELU, 29.5% for the ReLU, and 29.6% for the ELU.

NorbertZheng commented 1 year ago

CIFAR-10/100 Classification

Figure: CIFAR-10 results.

Figure: CIFAR-100 results.

On CIFAR-10, ultimately, the GELU obtains a median error rate of 7.89%, the ReLU obtains 8.16%, and the ELU obtains 8.41%.

On CIFAR-100, the GELU achieves a median error of 20.74%, the ReLU obtains 21.77%, and the ELU obtains 22.98%.

NorbertZheng commented 1 year ago

Reference

Hendrycks, D., & Gimpel, K. (2016). Gaussian Error Linear Units (GELUs). arXiv:1606.08415.