Closed xrq-phys closed 4 years ago
@xrq-phys Sorry for the delay in response.
It sounds like you are using BLIS for a very advanced purpose. I applaud you for attempting to utilize the low-level building blocks in BLIS to do something entirely new. :)
Note that you asked about packed memory first, and then about proper use of `auxinfo_t`, which is a different issue. As for packing the memory, it may actually be useful to write your own code to do the packing, since the code in BLIS is quite generalized and has many parameters. You can try to recycle some of the lower-level packing functions such as the packing kernels, which may be queried via the `bli_cntx_get_packm_ker_dt( dt, ker_id, cntx )` function, which is shown in `bli_packm_cxk.c`.
You also didn't mention what subconfiguration you are using, which will affect which kernel is being executed. Some of the kernels don't actually use any of the fields in `auxinfo_t`, but some of them need the pointer to point to a valid struct.
Thanks a lot for the reply!
I'm mainly trying to utilize the `haswell`, `skx` and `armv8` kernels (specifically `bli_dgemm_haswell_asm_6x8`, `bli_dgemm_skx_asm_16x14` and `bli_dgemm_armv8a_asm_6x8`).
In the `armv8` case it worked smoothly, and I confirmed in the assembly that `auxinfo_t` is not even referenced, but in the `haswell` and `skx` cases it fails very often (because my hand-crafted `auxinfo_t` is incorrect, I think).
I'm afraid I'm unable to write x86_64 assembly at the moment (sorry). I'll try looking at the reference returned by `bli_cntx_get_packm_ker_dt`, but I'd very much appreciate it if some general rules for setting `auxinfo->ps`, `auxinfo->schema` and `auxinfo->next` could be made available.
Both `A` and `B` are packed in the default way specified by `KernelsHowTo.md`, i.e. A in column-major and B in row-major for the operation `C += AB`.
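A minimal sketch of hand-written packing for the layout described above, assuming each micropanel of A stores `mr` contiguous values per column and each micropanel of B stores `nr` contiguous values per row. The names `pack_a_panel` and `pack_b_panel` are hypothetical helpers for illustration, not BLIS functions, and the zero-padding of partial panels is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Pack an m_sub x k block of A (m_sub <= mr) into one micropanel.
   The source is addressed with general strides (rs_a, cs_a); the micropanel
   holds mr contiguous values per column, zero-padding rows beyond m_sub. */
static void pack_a_panel(size_t m_sub, size_t k, size_t mr,
                         const double *a, size_t rs_a, size_t cs_a,
                         double *panel)
{
    assert(m_sub <= mr);
    for (size_t p = 0; p < k; ++p)
        for (size_t i = 0; i < mr; ++i)
            panel[p * mr + i] = (i < m_sub) ? a[i * rs_a + p * cs_a] : 0.0;
}

/* Pack a k x n_sub block of B (n_sub <= nr) into one micropanel.
   The micropanel holds nr contiguous values per row, zero-padding
   columns beyond n_sub. */
static void pack_b_panel(size_t k, size_t n_sub, size_t nr,
                         const double *b, size_t rs_b, size_t cs_b,
                         double *panel)
{
    assert(n_sub <= nr);
    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < nr; ++j)
            panel[p * nr + j] = (j < n_sub) ? b[p * rs_b + j * cs_b] : 0.0;
}
```

A full packed matrix would be built by calling these once per micropanel and writing each result one panel stride further into the destination buffer.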
@xrq-phys Once again, sorry for the delay. We've been struggling with some high-priority items lately.
> In `armv8` cases it worked smoothly and I confirmed in the assembly that `auxinfo_t` is not even referenced, but in `haswell` and `skx` cases it fails very often (because my hand-crafted `auxinfo_t` is incorrect, I think).
I'm still not convinced that your `auxinfo_t` is the problem. The `dgemm` kernels in `kernels/haswell/3/bli_gemm_haswell_asm_d6x8.c` and `kernels/skx/3/bli_dgemm_skx_asm_16x14.c` do not dereference the `auxinfo_t` pointer that is passed in. Even though the idea is for the calling code to always pass auxiliary information into the microkernel, those particular microkernels are choosing to not make use of that information. So I don't think that passing `NULL` into the microkernel function (at least for those two kernels) can be the cause of any errors. I confirmed this by hard-coding `NULL` to be passed into the microkernel in `frame/3/gemm/bli_gemm_ker_var2.c`, configuring for `haswell`, and running `make; make check`. All tests passed.
> I'm afraid I'm unable to write in x86_64 assembler at the moment (sorry), I'll try looking at the reference returned by `bli_cntx_get_packm_ker_dt` but I'd very much appreciate it if some general rules for setting `auxinfo->ps`, `auxinfo->schema` and `auxinfo->next` could be made available.
We have APIs for setting/querying the information in an `auxinfo_t`. You may find these APIs in `frame/base/bli_auxinfo.h`. (Note that they are static functions instead of true functions, so they exist only in the header file.) Here is a quick rundown:
- `_schema_a()`, `_schema_b()`. These functions relate to the packing schemas used for the left-hand matrix operand (A) and right-hand matrix operand (B). Typically, these are not needed when using "native" assembly kernels. It sounds like you will not need to set them.
- `_next_a()`, `_next_b()`. These functions relate to the pointer/address of the next micropanels of A and B. But the nuance of what "next" means here is important! We don't necessarily mean the next micropanel in memory, but rather the next micropanel that will be used! Oftentimes, it is the next micropanel in memory, except when you get to the end of the packed matrix, at which time you must wrap back around to the first micropanel. So that's the semantic meaning of "next" in this case. The purpose of these values in the `auxinfo_t` is to let the microkernel prefetch (if it wishes) elements of the next micropanel of A and/or B that will be computed upon.
- `_is_a()`, `_is_b()`. These functions relate to the so-called imaginary stride of A and B. They represent the distance, in units of fundamental elements (if your datatype is `dcomplex`, its fundamental elements are of type `double`), from the real part of element i to the imaginary part of element i. So, for regular "interleaved" storage of complex values, the imaginary stride is 1. Note that these strides are: (a) experimental, (b) used only in select situations, and (c) only related to certain complex domain microkernels. You may safely ignore them since it appears you are interested in `dgemm`.
- `_ps_a()`, `_ps_b()`. These functions relate to the so-called panel stride of A and B. The panel stride is the distance in memory (in units of elements) from the beginning of micropanel i to the beginning of micropanel i+1. Usually, `ps_a` is equal to `MR * KC` and `ps_b` is equal to `KC * NR`. Technically, they are `PACKMR * KC` and `KC * PACKNR`, since the micropanels may have "leading dimensions" that are slightly larger than MR or NR, but in almost all cases, `MR = PACKMR` and `NR = PACKNR`, so usually the two methods of computing the panel stride are equivalent. Note: Most microkernels will never need to use these values.
@fgvanzee Thanks a lot for your reply!
I'm quite ashamed to say it, but the reason for the segmentation fault turned out to be my mishandling of C function pointers in C++. The tests passed on all 3 platforms after wrapping the calls to the `gemm` kernels in `extern "C"` blocks.
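For readers hitting the same problem, here is a sketch of the usual guard that gives C declarations C language linkage when they are seen from C++. It is not the code from this issue, which is not shown: the typedef `my_dgemm_ukr_ft` and the 2x2 kernel `my_dgemm_ref_2x2` are hypothetical stand-ins with a simplified microkernel-like signature (no `auxinfo_t` or context arguments).

```c
#include <assert.h>
#include <stddef.h>

#ifdef __cplusplus
extern "C" {   /* declarations below get C language linkage in C++ */
#endif

/* Hypothetical microkernel-style function-pointer type (not the BLIS typedef). */
typedef void (*my_dgemm_ukr_ft)(size_t k, const double *alpha,
                                const double *a, const double *b,
                                const double *beta, double *c,
                                size_t rs_c, size_t cs_c);

/* Hypothetical 2x2 reference kernel: C := beta*C + alpha*A*B on packed panels. */
void my_dgemm_ref_2x2(size_t k, const double *alpha,
                      const double *a, const double *b,
                      const double *beta, double *c,
                      size_t rs_c, size_t cs_c);

#ifdef __cplusplus
}
#endif

void my_dgemm_ref_2x2(size_t k, const double *alpha,
                      const double *a, const double *b,
                      const double *beta, double *c,
                      size_t rs_c, size_t cs_c)
{
    assert(alpha != NULL && beta != NULL);
    double ab[4] = {0.0, 0.0, 0.0, 0.0};   /* ab[i + 2*j] holds (A*B)(i,j) */
    for (size_t p = 0; p < k; ++p)          /* A packed by columns, B by rows */
        for (size_t j = 0; j < 2; ++j)
            for (size_t i = 0; i < 2; ++i)
                ab[i + 2 * j] += a[p * 2 + i] * b[p * 2 + j];
    for (size_t j = 0; j < 2; ++j)
        for (size_t i = 0; i < 2; ++i)
            c[i * rs_c + j * cs_c] =
                *beta * c[i * rs_c + j * cs_c] + *alpha * ab[i + 2 * j];
}
```

The kernel is then called through a pointer of the guarded type, e.g. `my_dgemm_ukr_ft ukr = my_dgemm_ref_2x2;`, so the pointer type and the function agree on linkage whether the caller is compiled as C or C++.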
> Here is a quick rundown: ...
Thank you very much for the information! I'm in fact working with both `double` and `complex<double>` (i.e. `double complex`), so I'll take a look at the `is` parameters as well.
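As a small illustration of the imaginary stride defined in the rundown (the distance, in units of `double`, from the real part of an element to its imaginary part), the sketch below reads the imaginary part through a stride. The helper `imag_part` is hypothetical, and the "split" layout mentioned in the usage note is an assumption chosen only to show a stride other than 1; it is not a claim about how BLIS packs complex data.

```c
#include <assert.h>
#include <stddef.h>

/* Given a pointer to the real part of a complex element and the imaginary
   stride `is` (in units of double), return that element's imaginary part. */
static double imag_part(const double *re, size_t is)
{
    assert(re != NULL);
    return re[is];
}
```

For interleaved storage `{1, 2, 3, 4}` (the values 1+2i and 3+4i), the real part of element 1 sits at index 2 and `imag_part(buf + 2, 1)` yields 4. In a hypothetical split layout `{1, 3, 2, 4}` with all real parts first, the same element has its real part at index 1 and its imaginary part two doubles later, so the stride would be 2.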
@xrq-phys Glad you were able to figure out your problem!
I've recently been trying to utilize BLIS microkernels for implementing some antisymmetric matrix operations. I read `KernelsHowTo.md` but still have no idea how to tag the packed memory and call `((dgemm_ukr_ft)bli_cntx_get_l3_nat_ukr_dt(...))(k, &alpha, ...)` correctly. If `NULL` is passed as the `auxinfo_t` pointer, BLIS seems to give a correct answer, but bad memory access might occur depending on the type of the kernel.
My test code