Closed eknag closed 3 days ago
Add benchmark-only bf16 cos implementation that leverages vbfmmlaq_f32 instead of 3 separate dot products.
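For context, here is a minimal portable sketch of why one matrix-multiply-accumulate can stand in for three dot products. It assumes the usual BFMMLA semantics (a 2x2 fp32 accumulator updated with the product of a 2x4 tile and the transpose of another 2x4 tile) and assumes that the same tile, holding four elements of `a` in row 0 and four of `b` in row 1, is passed as both operands. The helper names `mmla_2x4_emulated` and `cos_terms_via_mmla` are hypothetical, plain `float` replaces bf16 so the code runs on any machine, and the packing actually used in this PR may differ.

```c
#include <stddef.h>

/* Scalar stand-in (hypothetical helper) for the 2x2 += (2x4) * (2x4)^T
 * update that a BFMMLA-style instruction performs. Uses float, not bf16. */
static void mmla_2x4_emulated(float acc[4], const float x[8], const float y[8]) {
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 4; ++k)
                acc[2 * i + j] += x[4 * i + k] * y[4 * j + k];
}

/* Accumulates a.b, a.a and b.b in a single pass.
 * Assumption: n is a multiple of 4 (no tail handling in this sketch). */
static void cos_terms_via_mmla(const float *a, const float *b, size_t n,
                               float *ab, float *a2, float *b2) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < n; i += 4) {
        float tile[8];
        for (int k = 0; k < 4; ++k) {
            tile[k] = a[i + k];     /* row 0: four elements of a */
            tile[4 + k] = b[i + k]; /* row 1: four elements of b */
        }
        /* tile * tile^T = [[a.a, a.b], [b.a, b.b]] for this chunk */
        mmla_2x4_emulated(acc, tile, tile);
    }
    *a2 = acc[0];
    *ab = acc[1]; /* acc[2] holds the same value (b.a) */
    *b2 = acc[3];
}
```

The cosine similarity then follows as `ab / sqrt(a2 * b2)`. One of the four accumulator lanes is redundant, since `a.b` and `b.a` are equal.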
Thank you, @eknag!