UCLA-VAST / AutoSA

AutoSA: Polyhedral-Based Systolic Array Compiler
MIT License

Error: unsupported TPExpr function type: NULL #9

Closed hecmay closed 3 years ago

hecmay commented 3 years ago

Hi,

I am trying to build a toy two-layer MLP example using AutoSA (for an FCCM workshop demo). The last dense layer is implemented as a systolic array. I was able to generate an SA with the previous commit (the one I pulled two weeks ago). However, after updating to the latest commit, I got the following errors:

[AutoSA] No candidate loops found!
[AutoSA] Apply communication management.
[AutoSA] Error: unsupported TPExpr function type: NULL
[AutoSA] Error: Exit abnormally!

The input c code:

#include <stdio.h>
int main(int argc, char **argv) {

      float L2[1][10];
      float FL[1][64];
      float w2[64][10];
#pragma scop
      for (int j1 = 0; j1 < 10; ++j1) {
        L2[0][j1] = 0.000000e+00f;
        for (int k1 = 0; k1 < 64; ++k1) {
          L2[0][j1] = (L2[0][j1] + (FL[0][k1] * w2[k1][j1]));
        }
      }
#pragma endscop
      printf("%f", L2[0][0]);
      printf("%f", FL[0][0]);
      printf("%f", w2[0][0]);
}

The command I used:

cd /usr/src/docker_autosa; ./autosa /usr/src/heterocl/samples/mlp/hcl_autosa_tmp_inst1.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes="{kernel[]->space_time[2];kernel[]->array_part[10,8];kernel[]->latency[2,8];kernel[]->simd[2]}" --simd-info=./autosa_tests/mm_hcl/simd_info.json --hls --hcl --no-data-pack --local-reduce --reduce-op="+" --simd-touch-space --host-serialize

hecmay commented 3 years ago

I was able to generate an SA using this commit: 15c877b9fbb206f1a15b58e71428eea58998d889. However, the generated code has some minor issues. For example, the function below uses a union variable u that is never declared:

void L2_1_IO_L3_out_serialize(L2_t2 *L2, hls::stream<float> &fifo_L2_1_local_in) {
#pragma HLS INLINE OFF
  /* Variable Declaration */
  /* Variable Declaration */

  for (ap_uint<4> i = 0; i < 5; i++) {
  #pragma HLS PIPELINE II=1
    L2_t1 fifo_data;
    L2_t2 mem_data;
    ap_uint<32> mem_data_split[2];
    #pragma HLS ARRAY_PARTITION variable=mem_data_split complete
    for (ap_uint<2> p = 0; p < 2; p++) {
      fifo_data = fifo_L2_1_local_in.read();
      u.ut = fifo_data;
      mem_data_split[n] = ap_uint<32>(u.ui);
    }
    mem_data = (mem_data_split[1], mem_data_split[0]);
    L2[i] = mem_data;
  }
}

I guess I can just revert to that commit and fix those issues manually.
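
For reference, a minimal sketch of what the manual fix amounts to, written in plain C without the HLS types so it can be compiled anywhere. It assumes the missing declaration is a union that overlays a float with a 32-bit unsigned integer; the names float_bits_t and pack_two_floats are hypothetical and do not come from AutoSA. It also indexes mem_data_split with the loop variable p, since the n used in the function above is not declared there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the missing declaration: a union that lets the
   bits of a float be read back as a 32-bit unsigned integer. */
typedef union {
  uint32_t ui;
  float ut;
} float_bits_t;

/* Pack two floats into one 64-bit word, element 0 in the low half.
   This mirrors (mem_data_split[1], mem_data_split[0]) in the generated
   code, where the first operand of the concatenation lands in the high bits. */
uint64_t pack_two_floats(const float in[2]) {
  /* The reinterpretation only makes sense if float is 32 bits wide. */
  assert(sizeof(float) == sizeof(uint32_t));

  uint32_t mem_data_split[2];
  for (int p = 0; p < 2; p++) {
    float_bits_t u;            /* the declaration absent from the generated code */
    u.ut = in[p];
    mem_data_split[p] = u.ui;  /* indexed by p rather than the undeclared n */
  }
  return ((uint64_t)mem_data_split[1] << 32) | (uint64_t)mem_data_split[0];
}
```

In the HLS function itself, the equivalent change would presumably be to declare the union before the inner loop and replace n with p; that has not been tested against the generated design.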

hecmay commented 3 years ago

Another issue is that --no-data-pack does not seem to take effect. I also tried to restrict the maximum packing size with data-pack-sizes, but that does not seem to work either. The interface tensors are still packed even when I set the upper limit to 4 bytes.

One last question: is it possible to enable host serialization only for some ports of an SA module? Sometimes we need the SA module to write to an on-chip buffer that is read later by another function. In that case, a de-serialization function is not needed.

Sorry for posting so many questions here. I spent some time hacking the generated HLS code, but it seems to require non-trivial effort. I just want to check whether this is easier to fix from your side. If not, I can continue hacking the generated HLS code and try to make it work.

hecmay commented 3 years ago

I tried turning the two-level nested loop GEMM example into a three-level nested loop (with the outermost loop's trip count set to 1). The array partition factors and the space_time mapping factor were also changed accordingly. I guessed this would not help much, but I still gave it a try.
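
A sketch of what that three-level variant looks like, reconstructed from the two-level input in the description and not copied from the actual test file. The loop variable names i1, j1 and k1 follow the statement S_1[i1, j1, k1] in the AutoSA log; the function names and the test data are hypothetical, and whether AutoSA accepts this exact form is not verified here.

```c
#include <assert.h>

/* Two-level loop nest, as in the input posted in the description. */
void dense_2loop(float L2[1][10], float FL[1][64], float w2[64][10]) {
  for (int j1 = 0; j1 < 10; ++j1) {
    L2[0][j1] = 0.0f;
    for (int k1 = 0; k1 < 64; ++k1) {
      L2[0][j1] = L2[0][j1] + FL[0][k1] * w2[k1][j1];
    }
  }
}

/* Hypothetical three-level variant: an outer i1 loop with trip count 1
   replaces the constant row index 0. */
void dense_3loop(float L2[1][10], float FL[1][64], float w2[64][10]) {
#pragma scop
  for (int i1 = 0; i1 < 1; ++i1) {
    for (int j1 = 0; j1 < 10; ++j1) {
      L2[i1][j1] = 0.0f;
      for (int k1 = 0; k1 < 64; ++k1) {
        L2[i1][j1] = L2[i1][j1] + FL[i1][k1] * w2[k1][j1];
      }
    }
  }
#pragma endscop
}

/* Deterministic test data whose sums are exactly representable as floats. */
void fill_inputs(float FL[1][64], float w2[64][10]) {
  for (int k = 0; k < 64; ++k) {
    FL[0][k] = 0.5f * (float)k;
    for (int j = 0; j < 10; ++j) {
      w2[k][j] = (float)(k + j);
    }
  }
}

/* Check that both loop nests compute the same dense layer output. */
void check_variants_agree(void) {
  float FL[1][64], w2[64][10], a[1][10], b[1][10];
  fill_inputs(FL, w2);
  dense_2loop(a, FL, w2);
  dense_3loop(b, FL, w2);
  for (int j = 0; j < 10; ++j) {
    assert(a[0][j] == b[0][j]);
  }
}
```

The two versions are functionally identical, so any difference in AutoSA's behavior would come from how the extra loop dimension changes the candidate space loops, not from the computation.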

With the latest Docker image, I still got the following error. Here is the complete error message:

root@346fa78bad51:/usr/src/docker_autosa# cd /usr/src/docker_autosa; ./autosa /usr/src/heterocl/samples/mlp/hcl_autosa_tmp_inst1.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes="{kernel[]->space_time[2];kernel[]->array_part[10,8];kernel[]->latency[2,8];kernel[]->simd[1]}" --simd-info=./autosa_tests/mm_hcl/simd_info.json --hls --hcl --no-data-pack --local-reduce --reduce-op="+" --simd-touch-space --host-serialize
[AutoSA] Extract RAR dep for the array access: { [S_1[i1, j1, k1] -> __pet_ref_4[]] -> w2[o0, o1] }
[AutoSA] Candidate 0: [1,0,0]
[AutoSA] Extract RAR dep for the array access: { [S_1[i1, j1, k1] -> __pet_ref_3[]] -> FL[o0, o1] }
[AutoSA] Candidate 0: [0,1,0]
[AutoSA] Candidate 1: [1,0,0]
[AutoSA] Found more than one legal RAR deps. Candidate 1 is used by default.
[AutoSA] 3 systolic arrays generated.
[AutoSA] Appy compute management.
[AutoSA] Apply array partitioning.
[AutoSA] Apply latency hiding.
[AutoSA] Apply SIMD vectorization.
[AutoSA] -----------------------------------------------
[AutoSA] Current band member position: 0
[AutoSA] -----------------------------------------------
[AutoSA] Detecting the reduction loop.
[AutoSA] Band member position: 1
[AutoSA] Reduction property: y
[AutoSA] -----------------------------------------------
[AutoSA] Current band member position: 1
[AutoSA] -----------------------------------------------
[AutoSA] Array reference (R): { S_1[i1, j1, k1] -> w2[k1, j1] }
[AutoSA] Layout transform: Permute dim (0) to the innermost
[AutoSA] -----------------------------------------------
[AutoSA] The loop is legal to be vectorized with score: 13.000000
[AutoSA] Layout transformation is required to proceed.
[AutoSA] -----------------------------------------------
[AutoSA] No legal SIMD loop is fonud. SIMD vectorization is skipped.
[AutoSA] Apply communication management.
[AutoSA] Error: Unsupported TPExpr function type: NULL
[AutoSA] Error: Exit abnormally!

whbldhwj commented 3 years ago

Could you also post your test input code each time so that I can reproduce the problem?

hecmay commented 3 years ago

@whbldhwj It is already in the description. I also included the command I used, so the issue can be reproduced.