HMUNACHI / nanodl

A Jax-based library for designing and training transformer models from scratch.
MIT License
274 stars, 11 forks

Norms have epsilon value set to dropout prob #25

Closed (danny-1k closed 7 months ago)

danny-1k commented 7 months ago

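The normalization layers in most of the attention-based implementations have their epsilon value set to the dropout probability. Epsilon is meant to be a small numerical-stability constant (for example 1e-6), so this looks like an argument mix-up.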
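To illustrate why this matters, here is a minimal sketch in plain `jax.numpy`. It is not the nanodl code: the `layer_norm` helper, the `dropout_rate` value of 0.1, and the sample input are all assumptions chosen for illustration. It compares a hand-written layer norm with epsilon set to a dropout-sized value against one using a conventional small epsilon.

```python
import jax.numpy as jnp


def layer_norm(x, epsilon):
    """Hypothetical helper: normalize over the last axis (no learned scale/bias).

    epsilon is only there to avoid dividing by zero when the variance is tiny.
    """
    mean = jnp.mean(x, axis=-1, keepdims=True)
    var = jnp.var(x, axis=-1, keepdims=True)
    return (x - mean) / jnp.sqrt(var + epsilon)


dropout_rate = 0.1  # assumed dropout probability, for illustration only

# Small-magnitude activations: mean 0, variance 0.025.
x = jnp.array([[0.1, -0.1, 0.2, -0.2]])

# Suspected bug pattern: the dropout probability ends up as epsilon.
buggy = layer_norm(x, epsilon=dropout_rate)

# Conventional choice: a small constant.
fixed = layer_norm(x, epsilon=1e-6)

buggy_std = float(jnp.std(buggy))
fixed_std = float(jnp.std(fixed))

print("std with epsilon=dropout_rate:", buggy_std)
print("std with epsilon=1e-6:", fixed_std)
```

With these assumed inputs, the small-epsilon output has a standard deviation close to 1, while the dropout-sized epsilon dominates the variance term and leaves the output well short of unit scale. The effect shrinks as the input variance grows, which may be why a mix-up like this can go unnoticed.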