Open rachitnigam opened 5 years ago
RFC from @sampsyo and @tissue3.
Sorry I still don't know what DSE is.
Design space exploration
Again, sounds just about perfect!
@sampsyo comments on the current heatmaps (permalink):
Wow; pretty weird outlier in the execution time results, huh? But it’s again odd that the execution time is so stable among the other points, even as the unrolling and partitioning changes…
The resource usage indeed goes up as expected, but the runtime does not go down. One possible hypothesis is that the benchmark is memory bound -- the data transfer cost outweighs the total runtime of the gemm kernel. Figure out a way to validate this.
Also, note that unlike the misaligned-partition-and-unroll experiment, where the unrolling and partitioning factors increase together and the runtime changes more predictably, this experiment uses single-ported memories.
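One rough way to check the memory-bound hypothesis is to compare an estimate of the host-to-device transfer time against the kernel's compute time. The sketch below is only a back-of-the-envelope model, not something from this thread: the function name `transfer_vs_compute` and all four parameters are hypothetical, and the inputs would have to come from actual measurements or synthesis reports.

```python
def transfer_vs_compute(bytes_moved, bandwidth_bytes_per_s, kernel_cycles, clock_hz):
    """Compare estimated data-transfer time with kernel compute time.

    All parameters are assumptions supplied by the caller:
      bytes_moved           -- total bytes copied to and from the device
      bandwidth_bytes_per_s -- sustained transfer bandwidth
      kernel_cycles         -- cycle count of the gemm kernel alone
      clock_hz              -- kernel clock frequency
    Returns (transfer_seconds, compute_seconds, memory_bound).
    """
    transfer_s = bytes_moved / bandwidth_bytes_per_s
    compute_s = kernel_cycles / clock_hz
    # The run is (roughly) memory bound if moving the data takes longer
    # than computing on it.
    return transfer_s, compute_s, transfer_s > compute_s
```

If the transfer time dominates for every configuration in the sweep, that would be consistent with the flat runtimes in the heatmaps. Timing the kernel with the transfers excluded would be a more direct check.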
Experiment
Figure out if index expression analysis can catastrophically hurt DSE.
`x` and `out[1]`. It performs the computation `out[0] += x ^ addr`, where `addr` is the matrix address in the innermost loop, and also performs the normal computation for GeMM.
`x` and `out[1]` and indexes into the arrays by doing `M[x ^ addr]`, where `addr` is the normal indexing expression, while also doing `out[0] += x ^ addr`.
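To make the two kernel variants above concrete, here is a minimal software sketch in Python. It is an illustration under stated assumptions, not the actual experiment code: the function names, the flat row-major layout, treating `out` as a one-element list, and applying the XOR to both operand arrays are all assumptions. The bounds behavior also assumes `n * n` is a power of two and `0 <= x < n * n`, so that `x ^ addr` stays in range.

```python
def gemm_side_computation(A, B, n, x, out):
    """Variant 1: normal GeMM indexing; x ^ addr only feeds out[0]."""
    C = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                addr = i * n + k          # matrix address in the innermost loop
                out[0] += x ^ addr        # extra computation, not used for indexing
                C[i * n + j] += A[addr] * B[k * n + j]
    return C


def gemm_xor_indexed(A, B, n, x, out):
    """Variant 2: the runtime value x is folded into the index expressions."""
    C = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                a_addr = i * n + k        # normal indexing expression for A
                b_addr = k * n + j        # normal indexing expression for B
                out[0] += x ^ a_addr
                # The index now depends on x, which is unknown at compile time.
                C[i * n + j] += A[x ^ a_addr] * B[x ^ b_addr]
    return C
```

With `x = 0`, both variants compute the plain matrix product, since `0 ^ addr == addr`. The difference is that in the second variant a compiler cannot determine the access pattern statically, which is presumably what stresses index expression analysis.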