explosion / cython-blis

💥 Fast matrix-multiplication as a self-contained Python library – no system dependencies!

gemm: use uninitialized array when beta is zero #72

Closed danieldk closed 2 years ago

danieldk commented 2 years ago

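When beta is zero, gemm overwrites every element of the output, so zero-initializing the output array first is wasted work. This change allocates an uninitialized array in that case, which avoids the memset. Profile before the change (memset highlighted):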

[Image: before-memset-change]

Profile after the change:

[Image: after-memset-change]

Throughput with de_core_news_lg on a Ryzen 5950X seems to increase from ~21300 to ~24100 words per second (WPS), roughly 13%.
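For readers who want to see the idea in isolation, here is a minimal NumPy sketch of the allocation choice described in this PR. It is not the cython-blis implementation: `gemm_sketch` and its parameters are hypothetical names chosen for illustration, and `np.matmul` stands in for the BLIS call. It assumes the usual gemm semantics, `out = alpha * (A @ B) + beta * out`.

```python
import numpy as np


def gemm_sketch(A, B, out=None, alpha=1.0, beta=0.0):
    """Hypothetical helper: out = alpha * (A @ B) + beta * out."""
    if out is None:
        shape = (A.shape[0], B.shape[1])
        if beta == 0.0:
            # Every element will be overwritten, so skip zero-initialization.
            out = np.empty(shape, dtype=A.dtype)
        else:
            # beta * out reads the buffer, so it must hold defined values.
            out = np.zeros(shape, dtype=A.dtype)
    if beta == 0.0:
        # Write the product straight into the buffer; its old contents
        # are never read.
        np.matmul(A, B, out=out)
        out *= alpha
    else:
        out *= beta
        out += alpha * (A @ B)
    return out
```

With `beta=0.0` the previous contents of the buffer never influence the result, which is why an uninitialized allocation is safe there. With a nonzero `beta` the buffer is read, so the sketch falls back to `np.zeros`.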