flame / blis

BLAS-like Library Instantiation Software Framework

Is there a document on setting auxinfo_t for calling microkernels? #378

Closed by xrq-phys 4 years ago

xrq-phys commented 4 years ago

I've recently been trying to use BLIS microkernels to implement some antisymmetric matrix operations. I read KernelsHowTo.md but still have no idea how to tag the packed memory and call ((dgemm_ukr_ft)bli_cntx_get_l3_nat_ukr_dt(...))(k, &alpha, ...) correctly.

If NULL is passed as the auxinfo_t pointer, BLIS seems to give a correct answer, but a bad memory access might occur depending on the type of kernel.

My test code

fgvanzee commented 4 years ago

@xrq-phys Sorry for the delay in response.

It sounds like you are using BLIS for a very advanced purpose. I applaud you for attempting to utilize the low-level building blocks in BLIS to do something entirely new. :)

Note that you asked about packed memory first, and then about proper use of auxinfo_t, which is a different issue. As for packing the memory, it may actually be useful to write your own packing code, since the code in BLIS is quite generalized and has many parameters. You can also try to recycle some of the lower-level packing functions, such as the packing kernels, which may be queried via bli_cntx_get_packm_ker_dt( dt, ker_id, cntx ); bli_packm_cxk.c shows how that function is used.
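
A hand-written packing routine of the kind suggested above could look like the following minimal sketch. It is not BLIS code: pack_a_micropanel and its parameters are hypothetical names, and it assumes the layout described in KernelsHowTo.md (an MR x k micropanel of A stored column by column) with a packed leading dimension equal to MR and no scaling applied during packing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using dim = std::ptrdiff_t;

// Hypothetical helper (not part of BLIS): pack an m x k block of A, given
// with row stride rs_a and column stride cs_a, into one micropanel of
// height mr. Element (i, p) of the block lands at packed[p * mr + i], so
// every column of the micropanel is mr contiguous values.
// Rows m..mr-1 are zero-filled so that an edge-case block can still be
// passed to a kernel that always processes full mr-row panels.
std::vector<double> pack_a_micropanel(const double* a, dim rs_a, dim cs_a,
                                      dim m, dim k, dim mr)
{
    assert(0 <= m && m <= mr && k >= 0);
    std::vector<double> packed(static_cast<std::size_t>(mr * k), 0.0);
    for (dim p = 0; p < k; ++p)
        for (dim i = 0; i < m; ++i)
            packed[static_cast<std::size_t>(p * mr + i)] = a[i * rs_a + p * cs_a];
    return packed;
}
```

Packing a micropanel of B works the same way with the roles of rows and columns swapped. Optimized kernels may also expect the packed buffers to be aligned, which std::vector does not guarantee, so a real implementation would likely use an aligned allocator instead.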

You also didn't mention what subconfiguration you are using, which will affect which kernel is being executed. Some of the kernels don't actually use any of the fields in auxinfo_t, but some of them need the pointer to point to a valid struct.

xrq-phys commented 4 years ago

Thanks a lot for the reply!

I'm mainly trying to utilize the haswell, skx and armv8 kernels (specifically bli_dgemm_haswell_asm_6x8, bli_dgemm_skx_asm_16x14 and bli_dgemm_armv8a_asm_6x8).

In the armv8 case it worked smoothly, and I confirmed in the assembly that auxinfo_t is not even referenced, but in the haswell and skx cases it fails very often (because my hand-crafted auxinfo_t is incorrect, I think).

I'm afraid I'm unable to write x86_64 assembly at the moment (sorry). I'll try looking at the reference returned by bli_cntx_get_packm_ker_dt, but I'd very much appreciate it if some general rules for setting auxinfo->ps, auxinfo->schema and auxinfo->next could be made available.

Both A and B are packed in the default way specified by KernelsHowTo.md, i.e. A in column-major and B in row-major for the operation C += AB.
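
For reference, the packed layout just described can be checked against a plain scalar loop. The sketch below is not a BLIS kernel: ref_dgemm_ukr is a hypothetical name, and it assumes micropanels with leading dimensions exactly mr and nr. It computes C := beta * C + alpha * A * B, which reduces to C += AB for alpha = beta = 1.

```cpp
#include <cassert>
#include <cstddef>

using dim = std::ptrdiff_t;

// Hypothetical scalar reference for a gemm microkernel on packed data.
// a: mr x k micropanel, stored column by column (a[p * mr + i] = A(i, p))
// b: k x nr micropanel, stored row by row       (b[p * nr + j] = B(p, j))
// c: mr x nr block with row stride rs_c and column stride cs_c
void ref_dgemm_ukr(dim mr, dim nr, dim k,
                   double alpha, const double* a, const double* b,
                   double beta, double* c, dim rs_c, dim cs_c)
{
    assert(mr >= 0 && nr >= 0 && k >= 0);
    for (dim i = 0; i < mr; ++i) {
        for (dim j = 0; j < nr; ++j) {
            // Dot product of row i of A with column j of B.
            double ab = 0.0;
            for (dim p = 0; p < k; ++p)
                ab += a[p * mr + i] * b[p * nr + j];
            double& cij = c[i * rs_c + j * cs_c];
            // When beta is zero, overwrite C without reading it, so that
            // uninitialized values (including NaN) are not propagated.
            cij = (beta == 0.0) ? alpha * ab : beta * cij + alpha * ab;
        }
    }
}
```

Comparing the output of an optimized kernel against a loop like this on small inputs is one way to tell a packing mistake apart from a calling-convention problem.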

fgvanzee commented 4 years ago

@xrq-phys Once again, sorry for the delay. We've been struggling with some high-priority items lately.

> In the armv8 case it worked smoothly, and I confirmed in the assembly that auxinfo_t is not even referenced, but in the haswell and skx cases it fails very often (because my hand-crafted auxinfo_t is incorrect, I think).

I'm still not convinced that your auxinfo_t is the problem. The dgemm kernels in kernels/haswell/3/bli_gemm_haswell_asm_d6x8.c and kernels/skx/3/bli_dgemm_skx_asm_16x14.c do not dereference the auxinfo_t pointer that is passed in. Even though the idea is for the calling code to always pass auxiliary information into the microkernel, those particular microkernels are choosing to not make use of that information. So I don't think that passing NULL into the microkernel function (at least for those two kernels) can be the cause of any errors. I confirmed this by hard-coding NULL to be passed into the microkernel in frame/3/gemm/bli_gemm_ker_var2.c, configuring for haswell, and running make; make check. All tests passed.

> I'm afraid I'm unable to write x86_64 assembly at the moment (sorry). I'll try looking at the reference returned by bli_cntx_get_packm_ker_dt, but I'd very much appreciate it if some general rules for setting auxinfo->ps, auxinfo->schema and auxinfo->next could be made available.

We have APIs for setting/querying the information in an auxinfo_t. You may find these APIs in frame/base/bli_auxinfo.h. (Note that they are static functions instead of true functions, so they exist only in the header file.) Here is a quick rundown:

xrq-phys commented 4 years ago

@fgvanzee Thanks a lot for your reply!

I'm quite ashamed to say it, but the segmentation fault turned out to be caused by my mishandling of C function pointers in C++. The tests passed on all three platforms after I wrapped the calls to the gemm kernels in extern "C" blocks.
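
The failing code is not shown in this thread, so the following is only a sketch of one plausible reading of that fix: the function pointer type and the kernel are both declared inside extern "C", and an untyped pointer obtained from a C-style query function is cast to that type before the call. All names here (generic_fp, demo_ukr_ft, demo_axpy_ukr, demo_query_ukr, run_kernel) are hypothetical stand-ins, not BLIS identifiers.

```cpp
#include <cassert>

extern "C" {

// Generic function pointer type, standing in for whatever untyped pointer
// a C query function returns.
typedef void (*generic_fp)(void);

// Kernel function pointer type, declared with C language linkage.
typedef void (*demo_ukr_ft)(int k, const double* alpha,
                            const double* x, double* y);

// Stand-in for a kernel implemented in C or assembly: y += alpha * x.
void demo_axpy_ukr(int k, const double* alpha, const double* x, double* y)
{
    for (int i = 0; i < k; ++i)
        y[i] += (*alpha) * x[i];
}

// Stand-in for a C query function that hands the kernel back untyped.
generic_fp demo_query_ukr(void)
{
    return reinterpret_cast<generic_fp>(&demo_axpy_ukr);
}

} // extern "C"

// C++ caller: cast back to the exact kernel type before calling. Calling
// through a pointer of any other function type is undefined behavior.
void run_kernel(int k, double alpha, const double* x, double* y)
{
    demo_ukr_ft ukr = reinterpret_cast<demo_ukr_ft>(demo_query_ukr());
    assert(ukr != nullptr);
    ukr(k, &alpha, x, y);
}
```

The important part is that the pointer type used for the call matches the kernel's real signature, including argument count and pointer-versus-value passing. Most compilers treat C and C++ language linkage as the same calling convention, so a mismatch in the signature is the more likely source of a crash than the linkage itself.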

> Here is a quick rundown: ...

Thank you very much for the information! I'm in fact working with both double and complex<double> (i.e. double complex), so I'll take a look at the is parameters as well.

fgvanzee commented 4 years ago

@xrq-phys Glad you were able to figure out your problem!