This PR updates the joint_matrix-based GEMM implementation to match the API from intel/llvm (commit 2a828f49283145433dc9bbbff74cefcb2d2b10dc).
Removed the deprecated get_wi_data() calls and replaced them with joint_matrix_apply() and joint_matrix_copy() calls.
Changed the leading dimensions for shared memory accesses to avoid excessive bank conflicts.
Refactored the code to write output data to global memory in a coalesced way (this was not possible earlier because joint_matrix_store() was performed directly on the global pointer C).
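
For reviewers, the two memory-layout changes can be illustrated with a small host-side C++ model. This is a sketch under stated assumptions, not the kernel code in this PR: it assumes 32 shared-memory banks of one 4-byte word each, and it assumes the output tile is staged in a padded buffer before being copied to C. The names `max_bank_conflict`, `store_tile_coalesced`, `kBanks`, and all parameters are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Assumption: 32 banks, one 4-byte word per bank (typical, but device-specific).
constexpr std::size_t kBanks = 32;

// Worst-case number of threads that hit the same bank when thread t of a
// 32-thread group reads word (row t, column col) of a row-major tile whose
// leading dimension is ld words. A result of 1 means no conflict.
std::size_t max_bank_conflict(std::size_t ld, std::size_t col = 0) {
  std::array<std::size_t, kBanks> hits{};
  for (std::size_t t = 0; t < kBanks; ++t) {
    ++hits[(t * ld + col) % kBanks];
  }
  return *std::max_element(hits.begin(), hits.end());
}

// Copies a tile_m x tile_n tile from a staging buffer (leading dimension
// stage_ld, possibly padded) into C at (row0, col0). The inner loop over
// `lane` stands in for the work-items of one group: on every step,
// consecutive lanes write consecutive columns of the same row of C,
// which is the access pattern that coalesces.
void store_tile_coalesced(const std::vector<float>& stage, std::size_t stage_ld,
                          std::vector<float>& C, std::size_t ldc,
                          std::size_t row0, std::size_t col0,
                          std::size_t tile_m, std::size_t tile_n,
                          std::size_t group_size) {
  const std::size_t total = tile_m * tile_n;
  for (std::size_t base = 0; base < total; base += group_size) {
    for (std::size_t lane = 0; lane < group_size; ++lane) {
      const std::size_t idx = base + lane;
      if (idx >= total) break;
      const std::size_t r = idx / tile_n;  // row inside the tile
      const std::size_t c = idx % tile_n;  // column inside the tile
      C[(row0 + r) * ldc + (col0 + c)] = stage[r * stage_ld + c];
    }
  }
}
```

In this model, a leading dimension equal to the bank count sends every thread of a column access to the same bank, while padding the leading dimension by one word spreads the same access over all banks. The padding actually used in the kernel depends on the element type and tile shape and may differ.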