epfLLM / Megatron-LLM

distributed trainer for LLMs

Add LIMA dropout #21

Closed andreaskoepf closed 1 year ago

andreaskoepf commented 1 year ago

When `--lima_dropout` is specified, use a layer-dependent dropout probability, starting at p_d = 0.0 at the bottom layer and increasing linearly to the value specified by `--hidden_dropout` at the last layer.

See: "LIMA: Less Is More for Alignment", Zhou et al., 2023, https://arxiv.org/abs/2305.11206
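
A minimal sketch of the schedule described above (not the repo's actual implementation; the function name is hypothetical and 0-based layer indexing is assumed):

```python
def lima_dropout_prob(layer_idx: int, num_layers: int, hidden_dropout: float) -> float:
    """Linearly interpolate dropout from 0.0 at the first layer
    to `hidden_dropout` at the last layer."""
    if num_layers <= 1:
        return hidden_dropout
    return hidden_dropout * layer_idx / (num_layers - 1)


# Example: 4 layers with --hidden_dropout 0.3 -> [0.0, 0.1, 0.2, 0.3]
print([lima_dropout_prob(i, num_layers=4, hidden_dropout=0.3) for i in range(4)])
```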