For some of the `ssyrk` variants, e.g. `UT`, the returned results are incorrect. One issue I spotted is that `driver/level3/meson.build` does not handle the `except` entry properly (fixed in this PR).
As a result, all `except` entries in the `ext_mappings` dict were ignored, so the `syrk` functions used the original flags defined in `ext_mappings` instead of those in `ext_mappings_l3`.
This caused the lower, non-transposed variant to pick up the generic `_LN` flags (again, because `except` was ignored), so `ssyrk_LN` was missing the `-DLOWER` flag and had `-DLEFT` instead.
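To make the failure mode concrete, here is a minimal Python sketch of how an ignored `except` list leads to the wrong flags. The table names, keys, and the `flags_for` helper are hypothetical and do not mirror the actual layout of `ext_mappings` in `meson.build`; only the `-DLEFT` / `-DLOWER` outcome is taken from the description above.

```python
# Hypothetical tables: the names and layout are illustrative only,
# not copied from driver/level3/meson.build.
GENERIC = [{"ext": "_LN", "flags": ["-DLEFT"], "except": ["syrk"]}]
L3_OVERRIDES = {"syrk": {"_LN": ["-DLOWER"]}}


def flags_for(routine, ext, honor_except=True):
    """Return the compile flags for one routine/variant pair."""
    for entry in GENERIC:
        if entry["ext"] != ext:
            continue
        if honor_except and routine in entry["except"]:
            break  # excluded here: fall through to the routine-specific table
        return entry["flags"]
    return L3_OVERRIDES[routine][ext]


# With the exclusion ignored (the bug), syrk inherits the generic flag.
print(flags_for("syrk", "_LN", honor_except=False))  # ['-DLEFT']
# With the exclusion honored (the fix), syrk gets its own flag.
print(flags_for("syrk", "_LN"))  # ['-DLOWER']
```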
But why did the error occur in `ssyrk_UT`? I suspect it is because the tests were run with NumPy, which is row-major, whereas BLAS is Fortran column-major. So upper-transposed became lower-non-transposed, which means the missing `-DLOWER` flag for `ssyrk_LN` could be the cause of the issue.
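The layout argument can be checked without BLAS at all. The sketch below (plain Python, with a hypothetical helper `col_major`) fills the upper triangle of a 3x3 matrix in a row-major buffer and then reads the same buffer with column-major indexing, the way a Fortran-order routine would. The buffer is seen as the transpose, so the upper triangle shows up as the lower one. By the same reasoning the input operand is seen transposed, which would turn a transposed update into a non-transposed one.

```python
n = 3

# Row-major buffer for a 3x3 matrix with its upper triangle
# (including the diagonal) set to 1.0.
buf = [0.0] * (n * n)
for i in range(n):
    for j in range(i, n):
        buf[i * n + j] = 1.0  # row-major offset of element (i, j)


def col_major(buf, n, i, j):
    """Read element (i, j) using column-major (Fortran-order) indexing."""
    return buf[j * n + i]


# The same memory, interpreted as a column-major matrix.
fortran_view = [[col_major(buf, n, i, j) for j in range(n)] for i in range(n)]
for row in fortran_view:
    print(row)
# [1.0, 0.0, 0.0]
# [1.0, 1.0, 0.0]
# [1.0, 1.0, 1.0]
```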
Please check whether my fix solves it for your setup (I tested it with a `.c` program rather than the NumPy tests).
P.S. I also think that the `except` entries in `ext_mappings_l3` are ignored, so I removed them.
Hi @HaoZeke,
This is an attempt to fix an issue described in https://github.com/HaoZeke/openblas_buildsys_snips/pull/7.