This PR adds support for Gemma-2 style tanh softcapping via two flags, attn_logit_softcap and final_logit_softcap. It builds on PRs #99 and #100, which should be merged first.
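For reviewers unfamiliar with the technique, here is a minimal sketch of tanh softcapping in plain Python. It is not the code in this PR: the helper name softcap, the cap value 30.0, and the convention that None disables capping are all assumptions made for illustration. In Gemma-2 style models the same function is typically applied to attention scores before the softmax (attn_logit_softcap) and to the output logits (final_logit_softcap).

```python
import math


def softcap(logits, cap):
    """Smoothly squash each logit into (-cap, cap) using cap * tanh(x / cap).

    `softcap` is a hypothetical helper for illustration, not this PR's API.
    """
    if cap is None:
        # Assumption: an unset flag means "no softcapping".
        return list(logits)
    return [cap * math.tanh(x / cap) for x in logits]


# Illustrative cap value only; not taken from the PR.
capped = softcap([-100.0, 0.0, 1.0, 100.0], cap=30.0)
```

Values well inside the cap pass through almost unchanged (tanh is close to the identity near zero), while large-magnitude logits saturate just below the cap, and the ordering of the logits is preserved.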
Tests
[x] Is the new feature tested? (Not always necessary for all changes -- just adding to the checklist to keep track)