arcee-ai / DAM

[AR-152] Can we automate a TIES like process? #11

Closed shamanez closed 1 month ago

shamanez commented 1 month ago

Add Elastic Net Regularization and Enhanced Control Over Merging Coefficients with DAM Layers

Description:

This PR introduces several enhancements to the DAMBaseLayer and DAMLinearLayer classes that give finer control over how multiple source models are merged. Key improvements include the addition of Elastic Net regularization, an option to apply the tanh non-linearity to the merging coefficients, and refinements to the computation of merging weights and biases.

Key Features:

  1. Elastic Net Regularization:

    • L1 and L2 Regularization: The compute_mergers_L1_L2_reg method now supports both L1 and L2 regularization, allowing users to apply Elastic Net regularization to the merging coefficients. The L1 term promotes sparsity and the L2 term promotes stability, encouraging the learned coefficients to reproduce the hand-designed TIES steps (Trim and Elect Sign).
  2. tanh Non-Linearity Control:

    • Optional tanh Application: Users can now choose whether to apply the tanh non-linearity to the merging coefficients during the forward pass. When enabled, tanh keeps the coefficients within the [-1, +1] range, providing controlled and predictable behavior.
    • Flexible Initialization: The use_tanh flag determines whether tanh is applied, giving users the flexibility to enable or disable this constraint based on the specific needs of their model.
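
The two features above can be illustrated with a minimal, framework-free sketch. It is not the DAM implementation: the function names, the lambda_l1 and lambda_l2 parameters, the choice of one scalar coefficient per source model, and the choice to penalize the coefficients after tanh are all assumptions made for illustration. Only the use_tanh flag name comes from the description above.

```python
import math

def effective_coeffs(raw_coeffs, use_tanh=True):
    """Optionally squash raw merging coefficients with tanh.

    With use_tanh=True every coefficient is bounded in [-1, +1].
    """
    if use_tanh:
        return [math.tanh(c) for c in raw_coeffs]
    return list(raw_coeffs)

def elastic_net_penalty(coeffs, lambda_l1=0.01, lambda_l2=0.01):
    """Elastic Net penalty: lambda_l1 * sum|c| + lambda_l2 * sum(c^2).

    lambda_l1 and lambda_l2 are hypothetical names for the two strengths.
    """
    l1 = sum(abs(c) for c in coeffs)
    l2 = sum(c * c for c in coeffs)
    return lambda_l1 * l1 + lambda_l2 * l2

def merge_weights(weights, raw_coeffs, use_tanh=True):
    """Merge per-model weight vectors as a coefficient-weighted sum.

    Assumes one scalar coefficient per source model (a simplification).
    """
    coeffs = effective_coeffs(raw_coeffs, use_tanh)
    n = len(weights[0])
    return [sum(c * w[i] for c, w in zip(coeffs, weights)) for i in range(n)]

# Example: two source models, each with a 2-element weight vector.
source_weights = [[1.0, 2.0], [3.0, 4.0]]
raw = [0.5, -0.5]
merged = merge_weights(source_weights, raw, use_tanh=False)
penalty = elastic_net_penalty(effective_coeffs(raw, use_tanh=False))
```

In a training loop, a penalty of this shape would typically be added to the main loss, so that the L1 term pushes unneeded coefficients toward zero while the L2 term discourages large magnitudes.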