foundation-model-stack / fms-extras

Apache License 2.0

Incorporate suggested changes from TGI PR #27

Closed: daviswer closed this 6 months ago

daviswer commented 7 months ago

See comments here, specifically A and B

Makes forward and generate_suffixes more efficient by fusing operations and removing repeated allocations.

Outputs are confirmed to match the previous implementation to within 1e-6. The residual error comes from very slightly different handling of the LayerNorm (LN) epsilon.
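
For readers wondering how an epsilon change alone can produce a discrepancy of that size, here is a minimal pure-Python sketch. It is not the code from this PR: the description does not say how the epsilon handling differs, so the two variants below (epsilon added inside versus outside the square root), the function names, the input values, and eps = 1e-6 are all assumptions chosen for illustration.

```python
import math


def layer_norm_eps_inside(x, eps):
    """Normalize x, adding eps to the variance before the square root."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    denom = math.sqrt(var + eps)
    return [(v - mean) / denom for v in x]


def layer_norm_eps_outside(x, eps):
    """Hypothetical variant: add eps to the standard deviation instead."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    denom = math.sqrt(var) + eps
    return [(v - mean) / denom for v in x]


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two sequences."""
    return max(abs(p - q) for p, q in zip(a, b))


# Illustrative input and epsilon; neither is taken from the PR.
x = [0.5, -1.0, 2.0, 0.25]
eps = 1e-6

ref = layer_norm_eps_inside(x, eps)
new = layer_norm_eps_outside(x, eps)

diff = max_abs_diff(ref, new)
outputs_match = diff <= 1e-6  # same tolerance as quoted in the description
print(f"max abs diff: {diff:.3e}, within tolerance: {outputs_match}")
```

With these inputs the two variants disagree by a small nonzero amount that stays under the 1e-6 tolerance. In a PyTorch setting the same kind of check is commonly written with torch.allclose or torch.testing.assert_close and an explicit atol.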

Once this lands in our PR, we can mirror the changes to the TGI branch.