Add Elastic Net Regularization and Enhanced Control Over Merging Coefficients with DAM Layers
Description:
This PR introduces several enhancements to the DAMBaseLayer and DAMLinearLayer classes, focused on providing greater control and flexibility over the merging process in models that integrate multiple sources. Key improvements include the addition of Elastic Net regularization, options for applying the tanh non-linearity, and refinements to the computation of merging weights and biases.
Key Features:
Elastic Net Regularization:
L1 and L2 Regularization: The compute_mergers_L1_L2_reg method now supports both L1 and L2 regularization, allowing users to apply Elastic Net regularization to the merging coefficients. The L1 term promotes sparsity and the L2 term promotes stability, encouraging the coefficients to reflect empirical operations like Trim, Elect, and Sign.
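As a rough illustration of the Elastic Net penalty described above, here is a framework-free sketch in plain Python. The function name elastic_net_penalty and the parameters lambda_l1 and lambda_l2 are hypothetical and are not taken from this PR; the actual compute_mergers_L1_L2_reg method operates on the layers' merging-coefficient tensors and may differ in signature and scaling.

```python
def elastic_net_penalty(coefficients, lambda_l1=0.01, lambda_l2=0.01):
    """Return lambda_l1 * sum(|c|) + lambda_l2 * sum(c^2).

    Hypothetical helper for illustration only; names and default
    weights are assumptions, not the PR's API.
    """
    l1_term = sum(abs(c) for c in coefficients)   # pushes coefficients toward exactly zero
    l2_term = sum(c * c for c in coefficients)    # discourages large magnitudes
    return lambda_l1 * l1_term + lambda_l2 * l2_term


# Example merging coefficients for four source models (made-up values).
mergers = [0.5, -0.25, 0.0, 1.0]
penalty = elastic_net_penalty(mergers, lambda_l1=0.1, lambda_l2=0.01)
print(penalty)
```

In training, a term like this would typically be added to the task loss, so that the optimizer trades off fit against sparse, small-magnitude merging coefficients.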
tanh Non-Linearity Control:
Optional tanh Application: Users can now choose whether to apply the tanh non-linearity to the merging coefficients during the forward pass. When enabled, tanh keeps the coefficients within the [-1, +1] range, providing controlled and predictable behavior.
Flexible Initialization: The use_tanh flag determines whether tanh is applied, giving users the flexibility to enable or disable this constraint based on the specific needs of their model.
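The effect of the use_tanh flag can be sketched as follows. This is a simplified plain-Python model that assumes one scalar coefficient per source model; the helper names effective_mergers and merge_weights are hypothetical, and the real DAMBaseLayer and DAMLinearLayer work on tensors and may apply the coefficients at a finer granularity.

```python
import math


def effective_mergers(raw_mergers, use_tanh=True):
    """Return the coefficients actually used in the forward pass.

    With use_tanh=True each raw coefficient is squashed by tanh, so the
    result stays within [-1, +1]; with use_tanh=False it is used as-is.
    """
    if use_tanh:
        return [math.tanh(m) for m in raw_mergers]
    return list(raw_mergers)


def merge_weights(weights_per_model, raw_mergers, use_tanh=True):
    """Combine per-model weight vectors as a coefficient-weighted sum.

    Assumption for illustration: weights_per_model is a list of
    equal-length lists, one per source model.
    """
    coeffs = effective_mergers(raw_mergers, use_tanh)
    n = len(weights_per_model[0])
    return [
        sum(c * w[i] for c, w in zip(coeffs, weights_per_model))
        for i in range(n)
    ]


raw = [3.0, -3.0, 0.0]
print(effective_mergers(raw, use_tanh=True))   # bounded coefficients
print(effective_mergers(raw, use_tanh=False))  # unconstrained coefficients
```

Disabling the constraint lets coefficients grow beyond the unit range, which may be useful when a source model needs to be amplified, at the cost of less predictable behavior.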
Benefits:
Greater Control: Users can now fine-tune the behavior of merging coefficients, ensuring that they align with both theoretical expectations and empirical findings.
Enhanced Stability: The combination of Elastic Net regularization with optional tanh constraints provides a robust framework for merging models, promoting both stability and interpretability.
Flexible Implementation: The new options let users adapt the merging process to different scenarios and requirements.