This PR implements Differential Attention, a proposed method for mitigating hallucinations and filtering out noise in the self-attention mechanism. The feature is enabled by default, but can be reverted to standard self-attention by setting `differential_heads=1` in the `PraxisConfiguration` object.
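For reference, a minimal sketch of opting back into standard self-attention; the import path and the exact constructor signature are assumptions, not confirmed by this PR:

```python
# Sketch only: assumes PraxisConfiguration is importable from the
# top-level praxis package and accepts differential_heads as a
# constructor keyword; other required fields are omitted here.
from praxis import PraxisConfiguration

# Differential Attention is enabled by default; differential_heads=1
# reverts the model to standard self-attention.
config = PraxisConfiguration(differential_heads=1)
```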