Current Workflow (Boring Details that you can skip unless you want to recreate results)
They provide a Docker image, so I run the following (I am running Docker locally, but I don't think it should make a difference if I do it on Havarti):
docker pull whbldhwj/autosa:latest
docker run -it whbldhwj/autosa
I then change autosa_tests/mm/kernel.h so that data_t is int (instead of float) and set I, J, K to the dimensions of the systolic array that we want to multiply. (You can also change autosa_config.json to make some of the settings automatic, but I don't think we can do this, since we want closer control over the setup.)
The following is the command they give to generate systolic array HLS.
The directory we want is ${AUTOSA_ROOT}/autosa.tmp/output. I use docker cp [...] to copy the files where I want them. Before running anything, I change hls_script.tcl to include the line export_design -format ip_catalog -version 1.1.0 -flow impl, which enables place and route. I then run vitis_hls -f hls_script.tcl on Havarti to generate results.
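The hls_script.tcl edit could be scripted rather than done by hand. This is a minimal sketch, not part of AutoSA: add_export_design is a hypothetical helper, and it assumes the generated script ends with a close_project, exit, or quit command that the new line has to precede (if none is found, the line is simply appended).

```python
EXPORT_LINE = "export_design -format ip_catalog -version 1.1.0 -flow impl"


def add_export_design(script_text: str) -> str:
    """Return the Tcl script text with EXPORT_LINE present exactly once."""
    lines = script_text.splitlines()
    if any(line.strip() == EXPORT_LINE for line in lines):
        return script_text  # already patched, leave the script untouched

    # Assumption: the export must run before the script closes the project
    # or quits, so insert it in front of the first such command.
    for idx, line in enumerate(lines):
        words = line.split()
        if words and words[0] in ("close_project", "exit", "quit"):
            lines.insert(idx, EXPORT_LINE)
            break
    else:
        # No terminating command found: append at the end of the script.
        lines.append(EXPORT_LINE)
    return "\n".join(lines) + "\n"
```

To patch the file in place, read hls_script.tcl with pathlib.Path.read_text(), pass the text through add_export_design, and write the result back with write_text(). Running it twice is harmless because the helper leaves an already-patched script unchanged.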
Important Choice of Input Settings
Two things to note about their setup.
(1) Their notion of a PE is slightly different from what I originally thought. Each PE essentially has a "scratch" memory that accumulates its result, whereas our PE's "scratch" memory is just a single register.
(2) They also have a SIMD instruction, i.e., multiple entries are passed and MAC'ed on each iteration.
Here is an illustration of what's going on.
What I Think Should Happen
For (1), I've played around to try to make this "scratch" memory equal to a 1x1 memory (to mimic what we do), but it's giving me an error for some reason.
For (2), if you disable SIMD (i.e., by just setting the SIMD dimension to 1, so you're not actually executing on multiple data), I've gotten that to work.
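To make the two settings concrete, here is a toy Python model of a single PE. It is built only from the description in these notes and is not AutoSA's generated code; the function name pe_accumulate, the loop order, and the data layout are all assumptions chosen for illustration. The "scratch" memory holds one accumulator per output entry the PE owns, and simd is the number of entry pairs that are multiplied and accumulated per iteration.

```python
def pe_accumulate(a_rows, b_cols, simd=1):
    """Toy PE: scratch[i][j] accumulates dot(a_rows[i], b_cols[j]).

    Each iteration multiplies `simd` entry pairs and adds the products to
    one scratch cell. Returns (scratch, iterations).
    """
    depth = len(a_rows[0])
    assert depth % simd == 0, "reduction length must be a multiple of simd"

    # The PE's "scratch" memory: len(a_rows) x len(b_cols) accumulators.
    scratch = [[0] * len(b_cols) for _ in a_rows]
    iterations = 0
    for i, a in enumerate(a_rows):
        for j, b in enumerate(b_cols):
            for k in range(0, depth, simd):
                # One iteration: `simd` multiply-accumulates into one cell.
                scratch[i][j] += sum(
                    x * y for x, y in zip(a[k:k + simd], b[k:k + simd])
                )
                iterations += 1
    return scratch, iterations
```

With one row and one column the scratch memory is 1x1, which is the single-register PE we want to mimic: pe_accumulate([[1, 2, 3, 4]], [[5, 6, 7, 8]], simd=1) accumulates 70 over 4 iterations. With simd=2 the result is the same 70, but it takes 2 iterations. In this model SIMD only changes how many entries are consumed per iteration, which is why a SIMD dimension of 1 effectively disables it.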
I'm running some tests right now with a few settings I've experimented with.
(Edited 11/4)
Links