gabizon103 opened 9 months ago
One note about #413: a cool thing enabled by being parametric over the multiplier's (and adder's) timing behavior is that we can implement precision optimizations. That is, we can support both floating-point and fixed-point implementations of all the kernels.
Showing off this example could demonstrate that, when doing DSE (design space exploration), there are many different kinds of "correctness" to care about, and Filament doesn't track all of them (in this case, the bit precision of computations).
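To illustrate the precision point outside of Filament, here is a minimal Python sketch (not Filament code) of the same dot product computed once as a floating-point reference and once in fixed point. The helper names `to_fixed`, `fixed_dot`, `float_dot` and the `frac_bits` parameter are hypothetical, chosen only for this example. Both versions are "correct" as far as timing is concerned; they differ only in how many fractional bits survive.

```python
def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize x to a signed fixed-point integer with `frac_bits` fractional bits (round to nearest)."""
    return round(x * (1 << frac_bits))


def fixed_dot(xs, ys, frac_bits: int) -> float:
    """Dot product computed in fixed point (hypothetical model of a fixed-point kernel)."""
    acc = 0
    for x, y in zip(xs, ys):
        # The product of two values with `frac_bits` fractional bits has 2 * frac_bits of them.
        prod = to_fixed(x, frac_bits) * to_fixed(y, frac_bits)
        # Arithmetic right shift drops the extra fractional bits (floors, like two's-complement truncation).
        acc += prod >> frac_bits
    # Convert back to a Python float only so we can compare against the reference.
    return acc / (1 << frac_bits)


def float_dot(xs, ys) -> float:
    """Reference dot product using Python floats (IEEE-754 doubles)."""
    return sum(x * y for x, y in zip(xs, ys))


xs, ys = [0.1, 0.2, 0.3], [0.4, 0.5, 0.6]
print(float_dot(xs, ys))      # reference result
print(fixed_dot(xs, ys, 4))   # coarse fixed point: noticeably off
print(fixed_dot(xs, ys, 16))  # finer fixed point: close to the reference
```

With only 4 fractional bits the fixed-point result drifts visibly from the reference, while 16 fractional bits track it closely. This is exactly the kind of error a timing-focused type system would not flag.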
Taking inspiration from this paper, implement some level 1 and 2 BLAS kernels and parameterize them similarly. Specifically:
- GEMV: implementation that is parameterized over matrix tiling and hardware reuse
- DOT: implementation that is parameterized over hardware reuse
- SCAL: implementation that is parameterized over hardware reuse (maybe redundant?)
- AXPY: implementation that is parameterized over hardware reuse

If time permits, also show that modules can be chained together using output parameters. The linked paper uses FIFOs to make some computations blocking in its compositions; since this isn't possible in Filament, we can use output parameters instead.
These implementations seem straightforward, which makes them an easy way to show that Filament can be used for design space exploration.
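As a rough sketch of the design space these parameters open up, here is a hypothetical Python behavioral model (not Filament code) of the four kernels. The parameter names `lanes` (number of parallel multipliers, i.e. the inverse of hardware reuse) and `tile_rows` (rows of the matrix processed at once in GEMV) are assumptions introduced for this example. The returned step counts come from an idealized model in which each batch of `lanes` operations takes one step; it ignores pipeline latency, adder-tree depth, and memory bandwidth.

```python
import math
from typing import List, Tuple

Vec = List[float]


def dot(xs: Vec, ys: Vec, lanes: int) -> Tuple[float, int]:
    """DOT with `lanes` multipliers reused over ceil(n / lanes) steps (idealized model)."""
    assert len(xs) == len(ys) and lanes >= 1
    acc = 0.0
    for start in range(0, len(xs), lanes):
        # One step: up to `lanes` multiplies in parallel, summed into the accumulator.
        acc += sum(x * y for x, y in zip(xs[start:start + lanes], ys[start:start + lanes]))
    return acc, math.ceil(len(xs) / lanes)


def axpy(a: float, xs: Vec, ys: Vec, lanes: int) -> Tuple[Vec, int]:
    """AXPY (a * x + y) with `lanes` multiply-add units reused across the vector."""
    assert len(xs) == len(ys) and lanes >= 1
    out = [a * x + y for x, y in zip(xs, ys)]
    return out, math.ceil(len(xs) / lanes)


def scal(a: float, xs: Vec, lanes: int) -> Tuple[Vec, int]:
    """SCAL (a * x), modeled here as AXPY with y = 0: one reason it may be redundant."""
    return axpy(a, xs, [0.0] * len(xs), lanes)


def gemv(A: List[Vec], x: Vec, tile_rows: int, lanes: int) -> Tuple[Vec, int]:
    """GEMV (A @ x): `tile_rows` DOT units each handle one row of a tile in parallel."""
    assert tile_rows >= 1
    y: Vec = []
    steps = 0
    for r0 in range(0, len(A), tile_rows):
        results = [dot(row, x, lanes) for row in A[r0:r0 + tile_rows]]
        y.extend(r for r, _ in results)
        # A tile finishes when its slowest row does.
        steps += max(s for _, s in results)
    return y, steps


# Sweep one corner of the design space: more hardware means fewer steps, same answer.
A, x = [[1, 2], [3, 4], [5, 6]], [1, 1]
for tile_rows, lanes in [(1, 1), (2, 1), (3, 2)]:
    print(tile_rows, lanes, gemv(A, x, tile_rows, lanes))
```

The sweep at the end is the DSE story in miniature: every configuration computes the same result, and only the step count (and, in real hardware, the area) changes.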