Large size MM input-stationary SA codegen issue

I am using the latest Docker image. I want to generate an input-stationary systolic array (SA) for a large matrix multiplication (MM). Here is the command I ran:

./autosa ./autosa_tests/large/mm/kernel.c --config=./autosa_config/autosa_config.json \
    --target=autosa_hls_c --output-dir=./autosa.tmp/output \
    --sa-sizes="{kernel[]->space_time[2];kernel[]->array_part[260,256,512];kernel[]->latency[20,16];kernel[]->simd[1]}" \
    --simd-info=./autosa_tests/large/mm/simd_info.json \
    --host-serialize --local-reduce --reduce-op="+" --simd-touch-space --no-isl-sink;

# Error in HLS synthesis
===>The following messages were generated while  performing high-level synthesis for kernel: kernel0 Log file: /scratch/users/sx233/FPGA-test/gemm.autosa/temp.autosa.large.mm/_x/kernel0.hw/kernel0/vitis_hls.log :
ERROR: [v++ 214-124] use of undeclared identifier 'fifo_C_1_serialize': /scratch/users/sx233/FPGA-test/gemm.autosa/temp.autosa.large.mm/src/kernel_kernel.cpp:931
ERROR: [v++ 60-300] Failed to build kernel(ip) kernel0, see log for details: /scratch/users/sx233/FPGA-test/gemm.autosa/temp.autosa.large.mm/_x/kernel0.hw/kernel0/vitis_hls.log

I also have a question about the PE dimension of the input-stationary SA. Suppose I have a 1024x1024 MatMul kernel with array partitioning factors [256,256,512], latency hiding factors [32,32], and a SIMD factor of 8. Since the reduction loop is selected as the space loop, I expected the SA size to be 512/8 = 64, but I can only see 32 PEs in the generated code. Is 512/8 the correct way to calculate the PE dimension here?
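
To make the expectation in the question above concrete, here is a minimal Python sketch of the arithmetic. The function name `expected_pe_count` is hypothetical, and the formula (array partitioning factor of the space loop divided by the SIMD factor) is only my assumption, not necessarily how AutoSA actually derives the PE count.

```python
def expected_pe_count(array_part_factor: int, simd_factor: int) -> int:
    """PEs along the space loop, ASSUMING count = tile size / SIMD factor.

    This is the asker's guess, not a confirmed AutoSA rule.
    """
    if simd_factor <= 0 or array_part_factor % simd_factor != 0:
        raise ValueError("SIMD factor must be positive and divide the tile size")
    return array_part_factor // simd_factor


# Values from the question: the reduction loop (tile size 512) is the space loop.
expected = expected_pe_count(array_part_factor=512, simd_factor=8)
observed = 32  # number of PEs seen in the generated code

print(f"expected PEs: {expected}, observed PEs: {observed}")
print(f"ratio expected/observed: {expected // observed}")
```

The expected count is twice the observed one, so some other factor seems to affect the PE dimension besides the SIMD factor.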
