EECS-NTNU / bismo

BISMO: A Scalable Bit-Serial Matrix Multiplication Overlay for Reconfigurable Computing
BSD 3-Clause "New" or "Revised" License

l1, l2 tile sizes are not guaranteed to evenly divide the matrix; is there a fix? #2

Closed yunchenlo closed 4 years ago

yunchenlo commented 5 years ago

In your source code at bismo/src/main/cpp/app/BitSerialMatMulExecutor.hpp, I found that when the matrix is large (e.g. 1024x4096x1), the matrix multiplication results are wrong, or the code does not even pass the assertion. Have you found a way to make BISMO support matrices of any size, including larger ones?

Many thanks,
Yun-Chen Lo

The comment I found in the code reads:

// TODO l1 tile size is not guaranteed to evenly divide the matrix
// due to partial tiles, and same with l2 tile size. need to handle this
// either by smart padding/alignment during allocation, or changing tile
// upper bounds.

maltanar commented 5 years ago

Hi Yun-Chen,

This BISMO release was intended only to demonstrate the hardware; large parts of the software stack are still missing. The included driver is just a small example of what could be done.

In principle there is no limit on the matrix size: data can be moved from DRAM to the on-chip buffers, the rows/columns/... can be made divisible by the hardware tile size with zero padding, and the appropriate loop tiling can be added for instruction generation. However, we don't have this at the moment.

yunchenlo commented 5 years ago

Dear @maltanar,

I see, thank you! However, for some experiments I need a working version of the BISMO drivers that can run matrix multiplications of any size. Where would you suggest adding the zero-padding and loop-tiling code?

Thanks, Yun-Chen Lo

maltanar commented 5 years ago

The zero padding is actually already handled inside the allocGEMMContext function: BISMO uses the gemmbitserial library to store bit-serial matrices, and gemmbitserial internally supports zero-padding rows and columns so that they are aligned to the desired sizes. For loop tiling, I'd use build_schedule_trivial in BitSerialMatMulExecutor as a starting point, since it already contains some loop tiling.

maltanar commented 4 years ago

I'm closing this issue, since it should no longer occur in the new version of BISMO (the June 2019 release).