Closed andreaskoepf closed 1 year ago
When --lima_dropout is specified, use a layer-dependent dropout probability: start at p_d=0.0 at the bottom layer and raise the rate linearly to the value specified by --hidden_dropout at the last layer.
See: "LIMA: Less Is More for Alignment", Zhou et al 2023, https://arxiv.org/abs/2305.11206
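
A minimal sketch of the schedule described above, not the actual implementation: the function name `lima_dropout_rates` and its parameters are hypothetical, layers are assumed to be indexed from 0 (bottom) to `num_layers - 1` (last), and the single-layer case is assumed to use the full `hidden_dropout` value.

```python
from typing import List


def lima_dropout_rates(num_layers: int, hidden_dropout: float) -> List[float]:
    """Return one dropout probability per layer, bottom layer first.

    Hypothetical helper: the rate rises linearly from 0.0 at the bottom
    layer to `hidden_dropout` at the last layer.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        # Assumption: with a single layer, bottom and last coincide,
        # so the full hidden_dropout value is used.
        return [hidden_dropout]
    # Linear interpolation over the layer index i = 0 .. num_layers - 1.
    return [hidden_dropout * i / (num_layers - 1) for i in range(num_layers)]


if __name__ == "__main__":
    # Print the per-layer rates for a 5-layer model with hidden_dropout=0.1.
    for layer, p in enumerate(lima_dropout_rates(5, 0.1)):
        print(f"layer {layer}: p_d = {p:.3f}")
```

Each layer would then build its dropout module from its own entry in this list instead of sharing the single --hidden_dropout value.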