Open hominhquan opened 2 years ago
Some of this will naturally be addressed when @devinamatthews obviates the need for the bli_gemm_int()
function, which is on his docket. But yes, we do a lot of aliasing under the assumption that it's cheap.
We could probably get by with aliasing each matrix obj_t only once, near the very top of the call stack.
Not if you want to be able to use task-based parallelism... However, only aliasing two of the three matrices in each gemm variant is sufficient. This is maybe 30-40% of the current number of aliases?
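To make the "two of the three" point concrete, here is a minimal sketch in plain C. It does not use the BLIS API: toy_obj_t, toy_alias(), toy_row_panel(), and toy_gemm_part_m() are hypothetical names invented for illustration, and the descriptor is far smaller than a real obj_t. The assumption is that a variant which partitions along one dimension only needs private copies of the two operands it sub-partitions, while the third can be passed through by pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified matrix descriptor. NOT the BLIS obj_t. */
typedef struct {
    double *buf;          /* shared underlying buffer (never copied) */
    size_t  m, n;         /* dimensions of this view                 */
    size_t  off_m, off_n; /* offsets of the view into the buffer     */
    size_t  rs, cs;       /* row and column strides                  */
} toy_obj_t;

/* Counts descriptor aliases so the overhead can be observed. */
static size_t toy_alias_count = 0;

/* An alias is a shallow copy of the descriptor; the buffer is shared. */
static toy_obj_t toy_alias(const toy_obj_t *src) {
    toy_alias_count++;
    return *src;
}

/* Derive a view of rows [i, i + b) from an existing descriptor. */
static toy_obj_t toy_row_panel(const toy_obj_t *src, size_t i, size_t b) {
    toy_obj_t v = *src;
    v.off_m += i;
    v.m = b;
    return v;
}

static double *toy_elem(const toy_obj_t *o, size_t i, size_t j) {
    return o->buf + (o->off_m + i) * o->rs + (o->off_n + j) * o->cs;
}

/* Leaf computation: C += A * B on the given views. */
static void toy_leaf(const toy_obj_t *a, const toy_obj_t *b,
                     const toy_obj_t *c) {
    assert(a->m == c->m && b->n == c->n && a->n == b->m);
    for (size_t i = 0; i < c->m; i++)
        for (size_t j = 0; j < c->n; j++)
            for (size_t p = 0; p < a->n; p++)
                *toy_elem(c, i, j) += *toy_elem(a, i, p) * *toy_elem(b, p, j);
}

/* Variant that partitions along m: only A and C are sub-partitioned,
 * so only those two are aliased. B is read through the caller's
 * descriptor and is never modified here. */
void toy_gemm_part_m(const toy_obj_t *a, const toy_obj_t *b,
                     const toy_obj_t *c, size_t bs) {
    toy_obj_t a_local = toy_alias(a);
    toy_obj_t c_local = toy_alias(c);

    for (size_t i = 0; i < a_local.m; i += bs) {
        size_t b_cur = (a_local.m - i < bs) ? a_local.m - i : bs;
        toy_obj_t a1 = toy_row_panel(&a_local, i, b_cur);
        toy_obj_t c1 = toy_row_panel(&c_local, i, b_cur);
        toy_leaf(&a1, b, &c1);
    }
}
```

In this sketch each call to the variant performs two aliases instead of three, and the caller's descriptors are left untouched, which is the property a task-based scheme would rely on.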
> Some of this will naturally be addressed when @devinamatthews obviates the need for the bli_gemm_int() function, which is on his docket
+1
@devinamatthews As far as I can see, there is also some aliasing in bli_?_front, bli_l3_thread_entry, and bli_?_blk_var?.
BLIS internal layers mostly re-clone and re-alias obj_t a, b, c at each level (bli_?_front, bli_l3_thread_entry, bli_gemm_int, as well as bli_?_blk_var?). This increases the management overhead (obj_t aliasing) and consumes a lot of stack, which can be problematic on memory-constrained platforms.

Can we look at whether some of the cloning logic can be relaxed, by distinguishing multi-threading isolation (must clone) from each thread's own execution (clone only if the algorithm requires it)?
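As a rough way to reason about the stack cost, here is a small sketch that compares cloning at every layer with cloning once at the thread boundary. Everything in it is an assumption for illustration: sketch_obj_t is a made-up descriptor, not the real obj_t, and the layer count is whatever the caller passes in, not a measured BLIS call depth.

```c
#include <stddef.h>

/* Hypothetical descriptor; the field set and size are illustrative only. */
typedef struct {
    void    *buf;
    size_t   dim[2];
    size_t   off[2];
    size_t   stride[2];
    unsigned info;
} sketch_obj_t;

/* Descriptor bytes live on the stack when each of `layers` nested calls
 * makes its own local copy of `operands` descriptors. */
static size_t stack_bytes_clone_every_layer(size_t layers, size_t operands) {
    return layers * operands * sizeof(sketch_obj_t);
}

/* Descriptor bytes when copies are made once (for example at thread
 * entry) and deeper layers only receive pointers to them. */
static size_t stack_bytes_clone_at_entry(size_t operands) {
    return operands * sizeof(sketch_obj_t);
}
```

Under these assumptions the descriptor footprint grows linearly with the number of layers that clone, so removing clones below the thread boundary shrinks it by roughly that factor. Partitioned sub-objects created inside the variants are not counted here.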