Closed sa2257 closed 5 years ago
Use #23
gemm ncubed
#include "apcint.h"
It is needed by the host file too. So the header file seems a better place for it.
`temp`. Reductions on local variables don't throw an error right now. It should make the distinction once reduction operators are implemented.

Hi! I don't have a strong feeling about whether MachSuite goes in #23 or its own issue (here). But either way, let's list all the benchmarks. This can help you prioritize the ones to work on next and keep a global view of how far along you are.
For the list of "gaps" indicating what we need before we can fully implement the benchmark, let's try to be as specific as possible and, when it's available, link to the existing issue that tracks that problem. For example, does "Local float variables?" mean that it's impossible to declare local variables with type `float`? If so, that probably deserves its own issue (or at least a full explanation here).
Yeah, I'd prefer specific issues that block benchmarks from being written.
Hi, so Rachit prefers to have specific issues listed in #23. I'll repurpose this issue to keep track of MachSuite benchmark implementation.
This post lists language features required to write each MachSuite app in Fuse.
gemm/ncubed
("Naive, O(n^3) algorithm for dense matrix multiplication."):
gemm/blocked
("A blocked version of matrix multiplication, with better locality."):
gemm/ncubed
fft/strided
("Recursive formulation of the Fast Fourier Transform."):
gemm/ncubed
for(span=FFT_SIZE>>1; span; span>>=1, log++) { ... }
stencil/stencil2d
("A two-dimensional stencil computation, using a 9-point square stencil."):
gemm/ncubed
stencil/stencil3d
("A three-dimensional stencil computation, using a 7-point von Neumann stencil."):
stencil/stencil2d
spmv/ellpack
("Sparse matrix-vector multiplication, using fixed-size neighbor lists."):
gemm/ncubed
in `spmv.fuse` line 22. Do we support reasoning about this? Also, I haven't yet reasoned out whether we can find parallelism in this.
spmv/crs
("Sparse matrix-vector multiplication, using variable-length neighbor lists."):
spmv/ellpack
kmp
("The Knuth-Morris-Pratt string matching algorithm."):
stencil/stencil2d
fft/transpose
("A two-level FFT optimized for a small, fixed-size butterfly."):
fft/strided
sort/merge
("The mergesort algorithm, on an integer array."):
stencil/stencil2d
sort/radix
("Sorts an integer array by comparing 4-bits blocks at a time."):
sort/merge
bfs/bulk
("Data-oriented version of breadth-first search."):
sort/merge
`struct`; need a good way to represent nodes of a graph
bfs/queue
("The “expanding-horizon” version of breadth-first search."):
bfs/bulk
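
The `fft/strided` loop header quoted above shows why that benchmark needs more than a plain counted loop: the induction variable is halved by a shift on every iteration, and a second counter advances in the same header. Here is a minimal C sketch of just that control pattern. `FFT_SIZE` is set to 1024 as an assumption, and `count_stages` is a hypothetical helper name, not something from MachSuite.

```c
#define FFT_SIZE 1024 /* assumed size for illustration; a power of two */

/* Hypothetical helper: runs only the loop header from fft/strided and
 * returns how many times the loop body would execute. */
static int count_stages(void) {
    int span, log = 0;
    /* span takes the values FFT_SIZE/2, FFT_SIZE/4, ..., 1. The loop ends
     * when the right shift makes span zero; log counts the iterations. */
    for (span = FFT_SIZE >> 1; span; span >>= 1, log++) {
        /* the butterfly work for this stage would go here */
    }
    return log;
}
```

With `FFT_SIZE` at 1024, `count_stages()` returns 10, one iteration per power of two. Expressing this in Fuse would need either shift-based loop updates or a rewrite into a counted loop over `log` with `span` recomputed from it.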
Optimizations Needed

- gemm/ncubed - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline
- gemm/blocked - resource allocation (multiplier), memory allocation, array partitioning, loop unroll with tiling, pipeline
- stencil/stencil2d - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline
- stencil/stencil3d - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline
- spmv/ellpack - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline
- spmv/crs - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline
- kmp - memory allocation, array partitioning, loop unroll, pipeline
- bfs/bulk - memory allocation, array partitioning, loop unroll, pipeline
- bfs/queue - memory allocation, array partitioning, loop unroll, pipeline
- fft/strided - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline
- fft/transpose - allocation (multiplier and adder), memory allocation, array partitioning, loop unroll, pipeline
- sort/merge - memory allocation, array partitioning, loop unroll, pipeline
- sort/radix - memory allocation, array partitioning, loop unroll, pipeline
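
To make the optimizations listed above concrete for one benchmark, here is a hedged C sketch of `gemm/ncubed` with comments marking where each one would attach. The kernel is written from the one-line description ("Naive, O(n^3)"), not copied from MachSuite. The matrix size `N` and the name `gemm_ncubed` are assumptions. The directive spellings in the comments follow Vivado HLS conventions as I remember them and should be checked against the tool documentation before use.

```c
#define N 8 /* assumed matrix dimension for illustration */

/* Naive O(n^3) dense matrix multiply: prod = m1 * m2, row-major N x N. */
void gemm_ncubed(const double m1[N * N], const double m2[N * N],
                 double prod[N * N]) {
    /* memory allocation / array partitioning: this is where a tool would
     * be told how to split m1 and m2 across banks, for example
     *   #pragma HLS ARRAY_PARTITION variable=m1 cyclic factor=4 dim=1 */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            /* pipeline: for example #pragma HLS PIPELINE on this loop */
            double sum = 0.0; /* local accumulator, the reduction variable */
            for (int k = 0; k < N; k++) {
                /* loop unroll: for example #pragma HLS UNROLL factor=4
                 * resource allocation (multiplier): the multiply below is
                 * the operation that would be bound to a multiplier unit */
                sum += m1[i * N + k] * m2[k * N + j];
            }
            prod[i * N + j] = sum;
        }
    }
}
```

The directives are left as comments so the sketch compiles with an ordinary C compiler. The inner loop is also the place where the reduction on a local variable shows up.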
Cool! Maybe a big table would be a good way to represent this information? The rows could be benchmarks, and the columns could be features. Then the cell could contain a checkmark if the benchmark needs that feature. (There could be two groups of columns: expressiveness/features and optimizations.) This way, we could easily see what will affect the most benchmarks.
Some things where I could use expanded definitions:
`malloc`? If so, do they really need to do this, or is it just a convenience?

Okay. Sounds good.
`unroll` 3 should allow 32 element array access if it's banked by 3 (mismatch with 32) or larger. (We can doctor the array size for simplicity.)

This table outlines the features needed for the benchmark applications:
Benchmarks | Nested Loops | Reductions | Tiling | Blocking Operations | Index Addition | Filter | while loops | if conditions | Bitwise operations | Indirection | unroll | Banking | Pipelining | Queues | LUTs | Resource Bind | Recursion |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gemm/ncubed | :white_check_mark: | :white_check_mark: | - | - | - | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | ||||||||
spatial/fir | - | - | |||||||||||||||
gemm/blocked | :white_check_mark: | :white_check_mark: | :white_check_mark: | - | - | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | ||||||||
stencil/stencil2d | :white_check_mark: | :white_check_mark: | - | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | |||||||||
stencil/stencil3d | :white_check_mark: | :white_check_mark: | - | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | ||||||||
fft/strided | :white_check_mark: | :white_check_mark: | - | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | ||||||||||
fft/transpose | :white_check_mark: | - | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | ||||||||||
spmv/ellpack | :white_check_mark: | - | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | ||||||||||
spmv/crs | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | ||||||||||
kmp/kmp | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | |||||||||
sort/merge | :white_check_mark: | ||||||||||||||||
sort/radix | :white_check_mark: | ||||||||||||||||
bfs/bulk | :white_check_mark: | ||||||||||||||||
bfs/queue | :white_check_mark: | ||||||||||||||||
spatial/kmeans | |||||||||||||||||
spatial/gda | |||||||||||||||||
spatial/bs | |||||||||||||||||
spatial/pagerank | |||||||||||||||||
spatial/sw | |||||||||||||||||
spatial/tq6 | |||||||||||||||||
md/knn | |||||||||||||||||
md/grid | |||||||||||||||||
backprop/backprop | |||||||||||||||||
nw/nw | |||||||||||||||||
aes/aes | |||||||||||||||||
viterbi/viterbi |

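
On the earlier remark about `unroll` 3 over a 32 element array: one way to read it is that three consecutive accesses are safe as long as they land in different banks, which needs 3 or more banks. The C sketch below checks that under an assumed cyclic banking scheme (element `i` in bank `i % banks`). `bank_of`, `conflict_free`, and `padded_len` are hypothetical names for illustration, and nothing here reflects how the Fuse type checker actually decides this.

```c
#define LEN 32   /* array length from the remark */
#define UNROLL 3 /* unroll factor from the remark */

/* Assumed cyclic banking: element i lives in bank i % banks. */
static int bank_of(int i, int banks) { return i % banks; }

/* Returns 1 if every group of UNROLL consecutive accesses
 * base, base+1, ..., base+UNROLL-1 touches pairwise distinct banks. */
static int conflict_free(int banks) {
    for (int base = 0; base < LEN; base += UNROLL) {
        for (int a = 0; a < UNROLL; a++) {
            for (int b = a + 1; b < UNROLL; b++) {
                int ia = base + a, ib = base + b;
                if (ia >= LEN || ib >= LEN) continue; /* ragged last group */
                if (bank_of(ia, banks) == bank_of(ib, banks)) return 0;
            }
        }
    }
    return 1;
}

/* Smallest length >= LEN that is a multiple of UNROLL, i.e. the
 * "doctored" array size that removes the ragged last group. */
static int padded_len(void) { return ((LEN + UNROLL - 1) / UNROLL) * UNROLL; }
```

Under these assumptions `conflict_free(3)` and `conflict_free(4)` return 1 while `conflict_free(2)` returns 0, and `padded_len()` is 33. The ragged last group (elements 30 and 31 with no third partner) is the mismatch with 32 that padding would hide.
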
I assume a combination of https://github.com/cucapra/fuse-benchmarks/issues/74 and direct issues are being used to track this now. @sa2257 reopen if you still need this.
This is to keep track of the implementation status of the MachSuite apps. Subsequent posts will track each app.
Some translations: