Trainable/Freezable Layer Norm, Embedding Coefficients and Seamless Logits Computation on-the-fly
This PR enhances the model merging process by allowing trainable layer norm coefficients and embedding coefficients to be passed in. Users can control whether each of these components is trainable or fixed, which improves flexibility and helps mitigate the performance degradation that can occur when fine-tuning specific layers.
Key Features:
Trainable Layer Norm Coefficients:
A new flag, is_norm_coef_trainable, enables users to specify whether layer norm coefficients are trainable.
If set to True, the coefficients are updated during training. If set to False, they are held at constant values, which avoids performance degradation from unnecessary fine-tuning.
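As a rough illustration of the intended behavior, the sketch below merges per-model layer norm weight vectors as a coefficient-weighted sum and skips the coefficient update entirely when the flag is False. It is not this PR's implementation: it assumes layer norm weights are merged as a linear combination of the source models' weights, and `NormMerger`, `merge`, `step`, and the plain SGD update are hypothetical. Only the flag name `is_norm_coef_trainable` comes from the description.

```python
# Hypothetical sketch: merge per-model layer norm weights with learnable
# coefficients. NormMerger and its methods are illustrative names only.
from typing import List


class NormMerger:
    """Merges layer norm weight vectors as a coefficient-weighted sum."""

    def __init__(self, coefs: List[float], is_norm_coef_trainable: bool) -> None:
        self.coefs = list(coefs)
        self.trainable = is_norm_coef_trainable

    def merge(self, norm_weights: List[List[float]]) -> List[float]:
        # merged[i] = sum_k coefs[k] * norm_weights[k][i]
        dim = len(norm_weights[0])
        return [
            sum(c * w[i] for c, w in zip(self.coefs, norm_weights))
            for i in range(dim)
        ]

    def step(self, grads: List[float], lr: float) -> None:
        # Plain SGD update (an assumption); a no-op when coefficients are frozen.
        if not self.trainable:
            return
        self.coefs = [c - lr * g for c, g in zip(self.coefs, grads)]


# Two source models, each contributing a 2-dim layer norm weight vector.
weights = [[1.0, 2.0], [3.0, 4.0]]
frozen = NormMerger([0.5, 0.5], is_norm_coef_trainable=False)
trainable = NormMerger([0.5, 0.5], is_norm_coef_trainable=True)
for merger in (frozen, trainable):
    merger.step(grads=[0.1, -0.1], lr=1.0)
print(frozen.merge(weights))
print(trainable.merge(weights))
```

With the flag set to False, `step` leaves the coefficients untouched, so the merged layer norm weights stay at their initial combination no matter what gradients arrive.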
Trainable Embedding Coefficients:
The flag is_embedding_coef_trainable offers the same control over the embedding coefficients, letting users train them or keep them fixed depending on the specific needs of their model.
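A minimal sketch of how the two flags could jointly decide which coefficient groups are handed to the optimizer. `MergeConfig`, `CoefficientStore`, the group names, and the extra `layers` group (standing in for any other merge coefficients) are hypothetical; only the flag names `is_norm_coef_trainable` and `is_embedding_coef_trainable` come from this PR.

```python
# Hypothetical sketch: the two flags decide which coefficient groups are
# exposed to the optimizer. Class names and group names are illustrative.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MergeConfig:
    is_norm_coef_trainable: bool = True
    is_embedding_coef_trainable: bool = True


@dataclass
class CoefficientStore:
    groups: Dict[str, List[float]] = field(default_factory=dict)

    def trainable_groups(self, cfg: MergeConfig) -> List[str]:
        """Return the names of coefficient groups the optimizer should update."""
        frozen = set()
        if not cfg.is_norm_coef_trainable:
            frozen.add("norm")
        if not cfg.is_embedding_coef_trainable:
            frozen.add("embedding")
        # Groups not governed by either flag (here "layers") stay trainable.
        return [name for name in self.groups if name not in frozen]


store = CoefficientStore(groups={
    "layers": [0.5, 0.5],
    "norm": [0.5, 0.5],
    "embedding": [0.5, 0.5],
})
cfg = MergeConfig(is_norm_coef_trainable=True, is_embedding_coef_trainable=False)
print(store.trainable_groups(cfg))
```

Filtering in one place means a frozen group is simply never passed to the optimizer, rather than being updated and then reset to its constant value.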
Efficient Logit Computation:
This PR also streamlines the training pipeline by removing the need for a separate logit computation step before training; logits are instead computed on the fly during training, which makes the process more seamless and efficient, especially for large-scale models.
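The on-the-fly logit computation named in the title can be pictured as a generator that produces logits for each batch inside the training loop, instead of computing and caching logits for the whole dataset beforehand. This is a toy sketch under that assumption: `linear_model` is a stand-in for the real model, and `logits_on_the_fly` is a hypothetical helper, not an API from this PR.

```python
# Hypothetical sketch: logits are produced inside the training loop, one batch
# at a time, instead of being precomputed and cached for the whole dataset.
from typing import Callable, Iterable, Iterator, List, Tuple

Vector = List[float]
Model = Callable[[Vector], Vector]


def linear_model(weights: List[Vector]) -> Model:
    """Tiny stand-in model: logits = W @ x (one row of W per class)."""
    def forward(x: Vector) -> Vector:
        return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]
    return forward


def logits_on_the_fly(
    model: Model, batches: Iterable[List[Vector]]
) -> Iterator[Tuple[List[Vector], List[Vector]]]:
    """Yield (batch, logits) pairs lazily; nothing is cached between batches."""
    for batch in batches:
        yield batch, [model(x) for x in batch]


model = linear_model([[1.0, 0.0], [0.0, 2.0]])
data = [[[1.0, 1.0]], [[2.0, 3.0], [0.0, 1.0]]]
for batch, logits in logits_on_the_fly(model, data):
    # A real training step would compute a loss from `logits` here.
    print(len(batch), logits)
```

Because the generator materializes only one batch of logits at a time, memory use in this sketch does not grow with the dataset size.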