epfLLM / Megatron-LLM

distributed trainer for LLMs

Add LIMA dropout #21

Closed andreaskoepf closed 1 year ago

andreaskoepf commented 1 year ago

When `--lima_dropout` is specified, use a layer-dependent dropout probability, starting at p_d = 0.0 at the bottom layer and increasing linearly to the value specified by `--hidden_dropout` at the last layer.

See: "LIMA: Less Is More for Alignment", Zhou et al., 2023, https://arxiv.org/abs/2305.11206
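
A minimal sketch of the schedule described above (not the repo's actual implementation; the function name is hypothetical and 0-based layer indexing is assumed):

```python
def lima_dropout_prob(layer_idx: int, num_layers: int, hidden_dropout: float) -> float:
    """Linearly interpolate dropout from 0.0 at the first layer
    to `hidden_dropout` at the last layer."""
    if num_layers <= 1:
        return hidden_dropout
    return hidden_dropout * layer_idx / (num_layers - 1)


# Example: 4 layers with --hidden_dropout 0.3 -> [0.0, 0.1, 0.2, 0.3]
print([lima_dropout_prob(i, num_layers=4, hidden_dropout=0.3) for i in range(4)])
```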