AnswerDotAI / bert24

Apache License 2.0

Add support for Gemma-2 style Tanh Softcapping #101

Open warner-benjamin opened 3 months ago

warner-benjamin commented 3 months ago

Changes

This PR adds support for Gemma-2 style tanh softcapping via two flags, attn_logit_softcap and final_logit_softcap. It builds on PRs #99 and #100, which should be merged first.
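For readers unfamiliar with the technique, tanh softcapping rescales logits as `softcap * tanh(logits / softcap)`, which bounds them smoothly to the open interval (-softcap, softcap) while leaving small values nearly unchanged. The snippet below is a minimal pure-Python sketch of that formula, not the code in this PR: the helper name `tanh_softcap`, the treatment of `None` as "disabled", and the cap value 50.0 are assumptions made for illustration.

```python
import math


def tanh_softcap(logits, softcap):
    """Smoothly bound each logit to (-softcap, softcap).

    Hypothetical helper for illustration; a softcap of None or <= 0
    is assumed to mean "softcapping disabled".
    """
    if softcap is None or softcap <= 0:
        return list(logits)
    # Scale down, squash with tanh, then scale back up.
    return [softcap * math.tanh(x / softcap) for x in logits]


# Illustrative values only: large logits saturate near the cap,
# small logits pass through almost unchanged.
attn_logits = [-200.0, -1.0, 0.0, 1.0, 200.0]
capped = tanh_softcap(attn_logits, softcap=50.0)
print(capped)
```

In a model, the same transform would presumably be applied to attention scores before the softmax (attn_logit_softcap) and to the output logits (final_logit_softcap).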

Tests