BLAS evaluation

Taking inspiration from this paper, implement some Level 1 and Level 2 BLAS kernels and parameterize them in a similar way. Specifically:

[ ] a GEMV implementation that is parameterized over matrix tiling and hardware reuse
[x] a DOT implementation that is parameterized over hardware reuse
[x] a SCAL implementation that is parameterized over hardware reuse (maybe redundant?)
[x] an AXPY implementation that is parameterized over hardware reuse
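Before writing the Filament versions, it can help to pin down what "parameterized over hardware reuse" means for each kernel. The sketch below is a Python behavioral model, not Filament code and not the paper's model. `lanes` (the number of physical multiply units) and `row_tile` (how many GEMV rows run in parallel) are hypothetical parameter names. It assumes each modeled cycle processes one chunk of `lanes` elements and ignores pipeline latency, so the cycle count equals how many times each unit is reused.

```python
from typing import List, Tuple

Vector = List[float]


def _check_lanes(lanes: int) -> None:
    # Hypothetical parameter: number of physical multiply units.
    if lanes < 1:
        raise ValueError("lanes must be >= 1")


def dot(x: Vector, y: Vector, lanes: int) -> Tuple[float, int]:
    """DOT with `lanes` multipliers; returns (result, modeled cycles)."""
    _check_lanes(lanes)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    acc, cycles = 0.0, 0
    for s in range(0, len(x), lanes):
        # One modeled cycle: each lane multiplies one pair, products are summed.
        acc += sum(a * b for a, b in zip(x[s:s + lanes], y[s:s + lanes]))
        cycles += 1
    return acc, cycles


def scal(alpha: float, x: Vector, lanes: int) -> Tuple[Vector, int]:
    """SCAL (alpha * x) with `lanes` multipliers."""
    _check_lanes(lanes)
    out: Vector = []
    cycles = 0
    for s in range(0, len(x), lanes):
        out.extend(alpha * v for v in x[s:s + lanes])
        cycles += 1
    return out, cycles


def axpy(alpha: float, x: Vector, y: Vector, lanes: int) -> Tuple[Vector, int]:
    """AXPY (alpha * x + y) with `lanes` multiply-add units."""
    _check_lanes(lanes)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    out: Vector = []
    cycles = 0
    for s in range(0, len(x), lanes):
        out.extend(alpha * a + b for a, b in zip(x[s:s + lanes], y[s:s + lanes]))
        cycles += 1
    return out, cycles


def gemv(A: List[Vector], x: Vector, row_tile: int, lanes: int) -> Tuple[Vector, int]:
    """GEMV (A @ x): `row_tile` rows run in parallel, each as a DOT with `lanes` units."""
    if row_tile < 1:
        raise ValueError("row_tile must be >= 1")
    m = len(A)
    y: Vector = [0.0] * m
    cycles = 0
    for r0 in range(0, m, row_tile):
        # Rows inside a tile run concurrently, so the tile costs one row's DOT time.
        tile_cycles = 0
        for r in range(r0, min(r0 + row_tile, m)):
            y[r], c = dot(A[r], x, lanes)
            tile_cycles = max(tile_cycles, c)
        cycles += tile_cycles
    return y, cycles
```

For example, `dot([1, 2, 3, 4], [5, 6, 7, 8], lanes=2)` returns `(70.0, 2)`: two multipliers, each used twice. Lowering `lanes` reuses the same units more often at the cost of more cycles.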

If time permits, also show that modules can be chained together using output parameters. The linked paper uses FIFOs to make some computations in its compositions blocking; since this isn't possible in Filament, we can use output parameters instead.
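A rough timing model of the difference: with a FIFO, the consumer blocks until data arrives, whereas with an output parameter the producer reports when its result is valid and the composition schedules the consumer from that value statically. The Python sketch below is only an illustration under that assumption, not Filament syntax. `kernel_latency` and `chain` are hypothetical helpers, and the latency formula (one chunk of `lanes` elements per cycle, no pipeline fill) is an assumed stand-in for whatever a real module would expose.

```python
from math import ceil
from typing import List, Tuple


def kernel_latency(n: int, lanes: int) -> int:
    """Hypothetical latency of a Level 1 kernel over n elements with `lanes` units."""
    if lanes < 1:
        raise ValueError("lanes must be >= 1")
    return ceil(n / lanes)


def chain(n: int, stages: List[Tuple[str, int]], start: int = 0) -> List[Tuple[str, int, int]]:
    """Schedule stages back to back; returns (name, start_cycle, done_cycle) per stage.

    Each stage reports its latency (standing in for an output parameter), and the
    next stage starts exactly when the previous stage's output becomes valid, so
    no FIFO or handshake is modeled.
    """
    timeline = []
    t = start
    for name, lanes in stages:
        done = t + kernel_latency(n, lanes)
        timeline.append((name, t, done))
        t = done
    return timeline


if __name__ == "__main__":
    # AXPY with 4 lanes feeding a DOT with 2 lanes on 8-element vectors.
    for name, begin, end in chain(8, [("axpy", 4), ("dot", 2)]):
        print(f"{name}: cycles {begin} to {end}")
```

Here the AXPY occupies cycles 0 to 2 and the DOT starts at cycle 2, finishing at cycle 6. Because the start time is computed from the producer's reported latency, changing one stage's `lanes` shifts every downstream stage without any runtime back-pressure.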

These implementations seem straightforward, which gives us an easy way to show that Filament can be used for design space exploration.
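As a rough illustration of what such a design space exploration could report, the sketch below sweeps hypothetical `row_tile` and `lanes` values for a GEMV and keeps the Pareto-optimal configurations (no other point uses fewer or equal multipliers and fewer or equal cycles). The cost model, multipliers = `row_tile * lanes` and cycles = `ceil(m / row_tile) * ceil(n / lanes)`, is an assumption rather than anything measured from Filament designs; it ignores latency, area beyond multipliers, and lanes wasted when a tile exceeds the matrix size.

```python
from itertools import product
from math import ceil
from typing import List, Tuple

Point = Tuple[Tuple[int, int], int, int]  # ((row_tile, lanes), multipliers, cycles)


def gemv_cost(m: int, n: int, row_tile: int, lanes: int) -> Tuple[int, int]:
    """Assumed cost model for an m x n GEMV; returns (multipliers, cycles)."""
    multipliers = row_tile * lanes
    cycles = ceil(m / row_tile) * ceil(n / lanes)
    return multipliers, cycles


def pareto(points: List[Point]) -> List[Point]:
    """Keep points not dominated in (multipliers, cycles); lower is better for both."""
    front = []
    for p in points:
        dominated = any(
            q[1] <= p[1] and q[2] <= p[2] and (q[1], q[2]) != (p[1], p[2])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


def explore(m: int, n: int, choices: List[int]) -> List[Point]:
    """Sweep every (row_tile, lanes) pair from `choices`; return the front sorted by multipliers."""
    points = []
    for row_tile, lanes in product(choices, choices):
        mult, cyc = gemv_cost(m, n, row_tile, lanes)
        points.append(((row_tile, lanes), mult, cyc))
    return sorted(pareto(points), key=lambda p: p[1])


if __name__ == "__main__":
    for params, mult, cyc in explore(3, 4, [1, 2, 4]):
        print(params, mult, cyc)
```

For a 3x4 matrix with choices `[1, 2, 4]`, this keeps six of the nine configurations; for instance, `(2, 1)` is dropped because `(1, 2)` uses the same two multipliers in fewer cycles.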

cucapra/filament, issue #411