Closed eknag closed 4 months ago
Add benchmark-only bf16 cos implementations that leverage vbfmmlaq_f32 instead of 3 separate dot products.
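To illustrate why a single matrix multiply-accumulate can stand in for three dot products, here is a portable scalar sketch. It is not the code from this PR. It models the 2x4 by 4x2 multiply-accumulate that vbfmmlaq_f32 performs, using plain float in place of bf16, so bf16 rounding is not reproduced. The names mmla_2x4_model, dots_mmla_model and dots3 are hypothetical, and packing x and y as the two rows of both operands is an assumed layout. With that layout, the 2x2 accumulator ends up holding x.x, x.y, y.x and y.y.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of one matrix multiply-accumulate step:
   acc (2x2, row-major) += A (2x4) * B^T, where a and b each hold
   two rows of four values. Plain float stands in for bf16 here,
   so bf16 rounding is NOT modelled. */
static void mmla_2x4_model(float acc[4], const float a[8], const float b[8]) {
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 4; ++k)
                acc[i * 2 + j] += a[i * 4 + k] * b[j * 4 + k];
}

/* The three dot products that cosine similarity needs. */
typedef struct { float xx, xy, yy; } dots3;

/* Hypothetical helper: gathers x.x, x.y and y.y with one modelled
   multiply-accumulate per 4-element chunk. n must be a multiple of 4. */
static dots3 dots_mmla_model(const float *x, const float *y, size_t n) {
    assert(n % 4 == 0);
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < n; i += 4) {
        float packed[8];
        for (int k = 0; k < 4; ++k) {
            packed[k] = x[i + k];     /* row 0: chunk of x */
            packed[4 + k] = y[i + k]; /* row 1: chunk of y */
        }
        /* The same packed operand is used on both sides, so
           acc becomes { x.x, x.y, y.x, y.y }. */
        mmla_2x4_model(acc, packed, packed);
    }
    dots3 d = { acc[0], acc[1], acc[3] };
    return d;
}
```

The cosine similarity then follows as xy / sqrt(xx * yy). The trade-off in this layout is that y.x is computed redundantly, in exchange for issuing one instruction per chunk rather than three.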
Thank you, @eknag!