cucapra / calyx-resource-eval

Resource Usage Evaluation for Calyx (& its Frontends)

AutoSA Setup #11

Open calebmkim opened 12 months ago

calebmkim commented 12 months ago

(Edited 11/4)

Links

Current Workflow (Boring Details that you can skip unless you want to recreate results)

They have a Docker image, so I run the commands below. (I am running Docker locally, but I don't think it makes a difference if I do it on Havarti.)

docker pull whbldhwj/autosa:latest
docker run -it whbldhwj/autosa

I then change autosa_tests/mm/kernel.h so that data_t is int (instead of float) and set I, J, K to the dimensions of the systolic array that we want to multiply. (You can also edit autosa_config.json to make some of the settings automatic, but I don't think we can do this: we want closer control over the setup.)
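As a rough sketch of that edit, the relevant part of kernel.h might end up looking like the excerpt below. This assumes the header defines data_t with a typedef and I, J, K with #define, which I have not confirmed against the real file; the value 16 is only a placeholder.

```c
/* Hypothetical excerpt of autosa_tests/mm/kernel.h after the edit.
 * Assumption: the header uses a typedef for data_t and #define for I, J, K.
 * The real file may be laid out differently, so check it before copying. */
typedef int data_t;   /* changed from float */

/* Placeholder sizes, not values taken from this issue. */
#define I 16
#define J 16
#define K 16
```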

The following is the command they give to generate the systolic array HLS code:

./autosa ./autosa_tests/mm/kernel.c \
--config=./autosa_config/autosa_config.json \
--target=autosa_hls_c \
--output-dir=./autosa.tmp/output \
--sa-sizes="{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}" \
--simd-info=./autosa_tests/mm/simd_info.json \
--host-serialize \
--hls

cp ${AUTOSA_ROOT}/autosa_scripts/hls_scripts/hls_script.tcl autosa.tmp/output/

The directory we want is ${AUTOSA_ROOT}/autosa.tmp/output. I can do docker cp [...] to copy the files where I want, and then run vitis_hls -f hls_script.tcl on Havarti to generate results. Before running it, I have to edit hls_script.tcl to include the line export_design -format ip_catalog -version 1.1.0 -flow impl, which enables place and route.

Important: Choice of Input Settings

Two things to note about their setup:

  1. Their notion of a PE is slightly different from what I originally thought. Each PE essentially has a "scratch" memory that accumulates its results. Our PE's "scratch" memory is just a single register.
  2. They also have a SIMD instruction, i.e., multiple entries are passed and MAC'ed on each iteration.

Here is an illustration of what's going on.

[Image: Screen Shot 2023-10-30 at 1 09 41 PM]
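The two differences listed above can also be sketched in C. This is only a software model under my own assumptions, not AutoSA's generated code or Calyx's actual PE: the names scratch_pe_t, register_pe_t, SCRATCH_ROWS, SCRATCH_COLS, and SIMD_LANES are hypothetical, and the sizes are illustrative.

```c
#include <assert.h>
#include <stddef.h>

typedef int data_t;

/* Illustrative sizes only (assumptions, not AutoSA defaults). */
#define SCRATCH_ROWS 4
#define SCRATCH_COLS 4
#define SIMD_LANES   2

/* Model of a PE with a local "scratch" memory of accumulators. */
typedef struct {
    data_t acc[SCRATCH_ROWS][SCRATCH_COLS];
} scratch_pe_t;

/* One step: multiply SIMD_LANES pairs and accumulate them into one
 * scratch entry, i.e. several MACs per iteration. */
static void scratch_pe_step(scratch_pe_t *pe, size_t row, size_t col,
                            const data_t a[SIMD_LANES],
                            const data_t b[SIMD_LANES])
{
    assert(row < SCRATCH_ROWS && col < SCRATCH_COLS);
    for (size_t lane = 0; lane < SIMD_LANES; lane++) {
        pe->acc[row][col] += a[lane] * b[lane];
    }
}

/* Model of a PE whose only state is a single accumulator register. */
typedef struct {
    data_t acc;
} register_pe_t;

/* One step: a single MAC into the register. */
static void register_pe_step(register_pe_t *pe, data_t a, data_t b)
{
    pe->acc += a * b;
}
```

In this model, shrinking the scratch array to 1x1 and setting SIMD_LANES to 1 makes scratch_pe_step do the same work per call as register_pe_step.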

What I think Should Happen

For (1), I've tried to make this "scratch" memory a 1x1 memory (to mimic what we do), but it gives me an error for some reason. For (2), I've gotten it to work with SIMD disabled (i.e., by setting the SIMD dimension to 1, so that each iteration no longer operates on multiple data).

I'm running some tests right now with a few settings I've experimented with.