cucapra / dahlia

Time-sensitive affine types for predictable hardware generation
https://capra.cs.cornell.edu/dahlia
MIT License
134 stars 8 forks source link

MachSuite Tracker #45

Closed sa2257 closed 5 years ago

sa2257 commented 5 years ago

This is to keep track of the status of MachSuites apps implementation. Subsequent posts will track each app.

Some translations:

App C T R Notes
gemm ncubed Yes Simple kernel
stencil2d
gemm block
stencil3d
fft Yes
kmp Yes
radix sort Yes
merge sort
spmv Yes
bfs Yes
md Yes
backprop
nw Yes
aes
viterbi
rachitnigam commented 5 years ago

Use #23

sa2257 commented 5 years ago

gemm ncubed

sampsyo commented 5 years ago

Hi! I don't have a strong feeling about whether MachSuite goes in #23 or its own issue (here). But either way, let's list all the benchmarks. This can help you prioritize the ones to work on next and keep a global view of how far along you are.

For the list of "gaps" indicating what we need before we can fully implement the benchmark, let's try to be as specific as possible and, when it's available, link to the existing issue that tracks that problem. For example, does "Local float variables?" mean that it's impossible to declare local variables with type float? If so, that probably deserves its own issue (or at least a full explanation here).

rachitnigam commented 5 years ago

Yeah, I'd prefer specific issues that block benchmarks from being written.

sa2257 commented 5 years ago

Hi, so Rachit prefers to have specific issues listed in #23. I'll repurpose this issue to keep track of MachSuite benchmark implementation.

sa2257 commented 5 years ago

Language Features Required for MachSuite Apps

This post lists language features required to write each MachSuite app in Fuse.

  1. gemm/ncubed ("Naive, O(n^3) algorithm for dense matrix multiplication."):
    • nested loops
    • reductions
  2. gemm/blocked ("A blocked version of matrix multiplication, with better locality."):
    • the features needed for gemm/ncubed
    • tiling (requires index addition to access arrays; therefore, views are needed)
  3. fft/strided ("Recursive formulation of the Fast Fourier Transform."):
    • the features needed for gemm/ncubed
    • exponential (T: would you mind clarifying what this is @sa2257 ?) (S: The app uses a complex for loop to avoid exponentials. Uncertain whether that's the best strategy for us. Maybe LUTs can be used to hardcode exponentials)
    • serialization
    • bitwise operations
    • the C code has more complex loops than Fuse supports; for example: for(span=FFT_SIZE>>1; span; span>>=1, log++) { ... }
  4. stencil/stencil2d ("A two-dimensional stencil computation, using a 9-point square stencil."):
    • the features needed for gemm/ncubed
    • index addition (views needed?)
    • mismatching matrix multiply (filtering)
  5. stencil/stencil3d ("A three-dimensional stencil computation, using a 7-point von Neumann stencil."):
    • the features needed for stencil/stencil2d
  6. spmv/ellpack ("Sparse matrix-vector multiplication, using fixed-size neighbor lists."):
    • the features needed for gemm/ncubed
    • indirection (T: @sa2257, I'm not sure what this is referring to): (*S: `si := val[j] vec[cols[j]];inspmv.fuse`line 22, Do we support reasoning about this? Also, I haven't yet reasoned out whether we can find parallelism in this.**):
  7. spmv/crs ("Sparse matrix-vector multiplication, using variable-length neighbor lists.")
    • the features needed for spmv/ellpack
    • index addition
  8. kmp ("The Knuth-Morris-Pratt string matching algorithm."):
    • stencil/stencil2d
    • while loops
    • if conditions
  9. fft/transpose ("A two-level FFT optimized for a small, fixed-size butterfly."):
    • the features needed for fft/strided
    • LUTs
  10. sort/merge ("The mergesort algorithm, on an integer array."):
    • the features needed for stencil/stencil2d
    • if methods
    • a method of serialization
    • recursion
  11. sort/radix ("Sorts an integer array by comparing 4-bits blocks at a time."):
    • the features needed for sort/merge
    • bitwise operation support
  12. bfs/bulk ("Data-oriented version of breadth-first search."):
    • sort/merge
    • some kind of "and" type like a struct; need a good way to represent nodes of a graph
  13. bfs/queue ("The “expanding-horizon” version of breadth-first search."):
    • the features needed for bfs/bulk
    • queues

Apps left:

sa2257 commented 5 years ago

Optimizations Needed gemm/ncubed - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline gemm/blocked - resource allocation (multiplier), memory allocation, array partitioning, loop unroll with tiling, pipeline stencil/stencil2d - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline stencil/stencil3d - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline spmv/ellpack - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline spmv/crs - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline kmp - memory allocation, array partitioning, loop unroll, pipeline bfs/bulk - memory allocation, array partitioning, loop unroll, pipeline bfs/queue - memory allocation, array partitioning, loop unroll, pipeline fft/strided - resource allocation (multiplier), memory allocation, array partitioning, loop unroll, pipeline fft/transpose - allocation (multiplier and adder), memory allocation, array partitioning, loop unroll, pipeline sort/merge - memory allocation, array partitioning, loop unroll, pipeline sort/radix - memory allocation, array partitioning, loop unroll, pipeline

sampsyo commented 5 years ago

Cool! Maybe a big table would be a good way to represent this information? The rows could be benchmarks, and the columns could be features. Then the cell could contain a checkmark if the benchmark needs that feature. (There could be two groups of columns: expressiveness/features and optimizations.) This way, we could easily see what will affect the most benchmarks.

Some things where I could use expanded definitions:

sa2257 commented 5 years ago

Okay. Sounds good.

sa2257 commented 5 years ago

This table outlines the features needed for the benchmark applications:

Benchmarks Nested Loops Reductions Tiling Blocking Operations Index Addition Filter while loops if conditions Bitwise operations Indirection unroll Banking Pipelining Queues LUTs Resource Bind Recursion
gemm/ncubed :white_check_mark: :white_check_mark: - - - :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark:
spatial/fir - -
gemm/blocked :white_check_mark: :white_check_mark: :white_check_mark: - - :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark:
stencil/stencil2d :white_check_mark: :white_check_mark: - :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark:
stencil/stencil3d :white_check_mark: :white_check_mark: - :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark:
fft/strided :white_check_mark: :white_check_mark: - :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark:
fft/transpose :white_check_mark: - :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark:
spmv/ellpack :white_check_mark: - :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark:
spmv/crs :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark:
kmp/kmp :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark:
sort/merge :white_check_mark:
sort/radix :white_check_mark:
bfs/bulk :white_check_mark:
bfs/queue :white_check_mark:
spatial/kmeans
spatial/gda
spatial/bs
spatial/pagerank
spatial/sw
spatial/tq6
md/knn
md/grid
backprop/backprop
nw/nw
aes/aes
viterbi/viterbi
rachitnigam commented 5 years ago

I assume a combination of https://github.com/cucapra/fuse-benchmarks/issues/74 and direct issues are being used to track this now. @sa2257 reopen if you still need this.