explosion / cython-blis

💥 Fast matrix-multiplication as a self-contained Python library – no system dependencies!

Slower SSE kernels used on Zen 3 #53

Closed danieldk closed 3 years ago

danieldk commented 3 years ago

Zen 3 is not detected by BLIS. Consequently, BLIS falls back to generic kernels that only use SSE intrinsics (not AVX2). See, e.g., the following profile of sgemm on a Ryzen 9 5900X:

[Screenshot: Screen Shot 2021-10-07 at 10 24 07]

I don't think we can do anything about this now, since upstream BLIS does not support Zen 3 yet (only the AMD fork does).

Just posting this so that we are aware of the issue.

danieldk commented 3 years ago

For the time being, we could just use the Zen 2 kernels on Zen 3. Doing so makes quite a large performance difference compared with the generic SSE kernels. I'll open a PR for this.
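
To illustrate the fallback idea, here is a minimal sketch in plain Python. It is not the BLIS detection code or the eventual PR. The function names `parse_cpuinfo` and `select_kernel_config` are hypothetical, the returned strings only mirror BLIS configuration names, and the family/model ranges (family 0x19 for Zen 3, family 0x17 with model >= 0x30 for Zen 2) are assumptions made for the example.

```python
def parse_cpuinfo(text: str) -> dict:
    """Extract vendor, family and model from the first processor block
    of /proc/cpuinfo-style text (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return {
        "vendor": fields.get("vendor_id", ""),
        "family": int(fields.get("cpu family", "0")),
        "model": int(fields.get("model", "0")),
    }


def select_kernel_config(vendor: str, family: int, model: int) -> str:
    """Pick a kernel configuration name (hypothetical helper).

    Assumed ranges: AMD family 0x19 is Zen 3, AMD family 0x17 with
    model >= 0x30 is Zen 2, and lower family 0x17 models are Zen/Zen+.
    """
    if vendor != "AuthenticAMD":
        return "generic"  # other vendors are out of scope for this sketch
    if family == 0x19:
        return "zen2"  # Zen 3: no dedicated kernels, reuse the Zen 2 ones
    if family == 0x17:
        return "zen2" if model >= 0x30 else "zen"
    return "generic"  # unknown AMD family: SSE-only generic kernels


# Sample text shaped like /proc/cpuinfo on a Ryzen 9 5900X (illustrative values).
SAMPLE = (
    "processor\t: 0\n"
    "vendor_id\t: AuthenticAMD\n"
    "cpu family\t: 25\n"
    "model\t\t: 33\n"
    "model name\t: AMD Ryzen 9 5900X 12-Core Processor\n"
    "\n"
    "processor\t: 1\n"
    "vendor_id\t: AuthenticAMD\n"
)

config = select_kernel_config(**parse_cpuinfo(SAMPLE))
print(config)
```

Without the `family == 0x19` branch, the same input would reach the final `return "generic"`, which corresponds to the slow path described above.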