praxis

as above, so below
https://src.eco
MIT License

Implement Differential Attention #7

Closed Vectorrent closed 1 month ago

Vectorrent commented 1 month ago

This PR implements Differential Attention, a proposed method for mitigating hallucinations and filtering out noise in the self-attention mechanism. The feature is enabled by default, but can be reverted to standard self-attention by setting differential_heads=1 in the PraxisConfiguration object.
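
For reference, here is a minimal sketch of the core differential attention computation as described in the Differential Transformer paper (Ye et al., 2024), which this PR appears to follow. The function name and tensor layout are illustrative, not Praxis's actual API; only `differential_heads` and `PraxisConfiguration` come from this PR.

```python
import torch
import torch.nn.functional as F

def differential_attention(q1, k1, q2, k2, v, lam):
    # Illustrative sketch, not the Praxis implementation.
    # Two softmax attention maps are computed from paired query/key
    # projections; subtracting the second map, scaled by a learned
    # lambda, cancels common-mode noise in the attention scores.
    d = q1.size(-1)
    a1 = F.softmax(q1 @ k1.transpose(-2, -1) / d ** 0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-2, -1) / d ** 0.5, dim=-1)
    return (a1 - lam * a2) @ v
```

With a single differential head there is no paired map to subtract, which is presumably how `differential_heads=1` falls back to standard self-attention per the description above.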