cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0

HCL Smith-Waterman Example: HLS code failed synthesis #437

Open hecmay opened 2 years ago

hecmay commented 2 years ago

I'm trying to reproduce some of the performance numbers mentioned in the HCL paper on AWS F1, and high-level synthesis fails with the following log:

===>The following messages were generated while  performing high-level synthesis for kernel: default_function Log file: /heterocl/samples/smith_waterman/aws/_x.hw.xilinx_aws-vu9p-f1_shell-v04261818_201920_2/kernel/default_function/vitis_hls.log :
ERROR: [v++ 200-1471] Stop unrolling loop 'VITIS_LOOP_30_7' (/heterocl/samples/smith_waterman/aws/kernel.cpp:11) in function 'default_function' because it may cause large runtime and excessive memory usage due to increase in code size. Please avoid unrolling the loop or form sub-functions for code in the loop body.\

ERROR: [v++ 200-70] Pre-synthesis failed.
ERROR: [v++ 60-300] Failed to build kernel(ip) default_function, see log for details: /heterocl/samples/smith_waterman/aws/_x.hw.xilinx_aws-vu9p-f1_shell-v04261818_201920_2/kernel/default_function/vitis_hls.log
ERROR: [v++ 60-773] In '/heterocl/samples/smith_waterman/aws/_x.hw.xilinx_aws-vu9p-f1_shell-v04261818_201920_2/kernel/default_function/vitis_hls.log', caught Tcl error: ERROR: [HLS 200-1471] Stop unrolling loop 'VITIS_LOOP_30_7' (/heterocl/samples/smith_waterman/aws/kernel.cpp:11) in function 'default_function' because it may cause large runtime and excessive memory usage due to increase in code size. Please avoid unrolling the loop or form sub-functions for code in the loop body.\
ERROR: [v++ 60-773] In '/heterocl/samples/smith_waterman/aws/_x.hw.xilinx_aws-vu9p-f1_shell-v04261818_201920_2/kernel/default_function/vitis_hls.log', caught Tcl error: ERROR: [HLS 200-70] Pre-synthesis failed.
ERROR: [v++ 60-599] Kernel compilation failed to complete
ERROR: [v++ 60-592] Failed to finish compilation
INFO: [v++ 60-1653] Closing dispatch client.
make: *** [_x.hw.xilinx_aws-vu9p-f1_shell-v04261818_201920_2/kernel.xo] Error 1

Here is part of the HLS code generated by HCL. The root cause, as indicated in the error log, is that the body of the second loop is too large to be unrolled. We probably need function outlining for the large loop body to make it synthesizable.


void default_function(ap_uint<3> seqAs[1024][128], ap_uint<3> seqBs[1024][128], ap_uint<3> outAs[1024][256], ap_uint<3> outBs[1024][256]) {
  ap_int<32> B;
  for (ap_int<32> t_outer = 0; t_outer < 32; ++t_outer) {
  #pragma HLS pipeline
    for (ap_int<32> t_inner = 0; t_inner < 32; ++t_inner) {
    #pragma HLS unroll
      ap_int<32> maxtrix_max;
      maxtrix_max = 0;
      ap_int<32> i_max;
      i_max = 0;
      ap_int<32> j_max;
      j_max = 0;
      ap_int<16> matrix[129][129];
      for (ap_int<32> x = 0; x < 129; ++x) {
        for (ap_int<32> y = 0; y < 129; ++y) {
          matrix[x][y] = (ap_int<16>)0;
        }
      }
      // ... other code inside the loop body omitted
      // there are many other loop nests inside the second loop's body
    }
  }
}
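
To make the function outlining idea concrete, here is a minimal sketch in plain C++ rather than the actual generated code. The names `align_pair` and `align_all`, the reduced pair count `kPairs`, and the scoring values (+2 match, -1 mismatch, -1 gap) are all assumptions for illustration; only the 128-element sequences and the 129 x 129 matrix come from the snippet above. In a Vitis HLS flow one would presumably also mark the sub-function with `#pragma HLS inline off` so the tool does not fold it back into the caller, but that has not been tried here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sizes: kLen and kDim mirror the snippet above, kPairs is reduced.
constexpr int kLen = 128;        // sequence length, as in seqAs[1024][128]
constexpr int kDim = kLen + 1;   // score matrix is 129 x 129
constexpr int kPairs = 4;        // number of sequence pairs (illustrative)

// Outlined loop body: scores one pair of sequences and returns the best local
// alignment score. Scoring values are assumptions, not taken from the sample.
// (An HLS flow would likely place "#pragma HLS inline off" in this function.)
int16_t align_pair(const uint8_t a[kLen], const uint8_t b[kLen]) {
    int16_t matrix[kDim][kDim] = {};  // row 0 and column 0 stay zero
    int best = 0;
    for (int i = 1; i < kDim; ++i) {
        for (int j = 1; j < kDim; ++j) {
            int diag = matrix[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 2 : -1);
            int up = matrix[i - 1][j] - 1;
            int left = matrix[i][j - 1] - 1;
            int score = 0;  // local alignment never drops below zero
            if (diag > score) score = diag;
            if (up > score) score = up;
            if (left > score) score = left;
            matrix[i][j] = static_cast<int16_t>(score);
            if (score > best) best = score;
        }
    }
    return static_cast<int16_t>(best);
}

// Top-level loop: its body is now a single call, so the code that the tool
// has to replicate or schedule per iteration is one function instance.
void align_all(uint8_t seq_a[kPairs][kLen], uint8_t seq_b[kPairs][kLen],
               int16_t best[kPairs]) {
    for (int t = 0; t < kPairs; ++t) {
        best[t] = align_pair(seq_a[t], seq_b[t]);
    }
}
```

With this shape, parallelism across pairs would come from instantiating `align_pair` several times instead of from flattening every loop nest into one body.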

seanlatias commented 2 years ago

We shouldn't unroll the loop. We should just pipeline it. Do we specify that in the HCL code?

seanlatias commented 2 years ago

We should just move the initialization outside.

seanlatias commented 2 years ago

It's ok to modify the HCL code as long as it makes sense and is still functional.

hecmay commented 2 years ago

We shouldn't unroll the loop. We should just pipeline it. Do we specify that in the HCL code?

In the HCL code, the outer loop is pipelined and the inner loop is optimized with parallel(). I am using the pre-generated HLS code inside the smith_waterman folder.

So these loops should be moved outside of the top-level loop nests, right?

      ap_int<16> matrix[129][129];
      for (ap_int<32> x = 0; x < 129; ++x) {
        for (ap_int<32> y = 0; y < 129; ++y) {
          matrix[x][y] = (ap_int<16>)0;
        }
      }
      ap_int<16> action[129][129];
      for (ap_int<32> x1 = 0; x1 < 129; ++x1) {
        for (ap_int<32> y1 = 0; y1 < 129; ++y1) {
          action[x1][y1] = (ap_int<16>)3;
        }
      }

I will modify the HLS code first and update the HCL code once I have confirmed that the modified HLS code actually works.
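
As a rough sketch of what hoisting the initialization could look like, the plain C++ below compares a buffer that is zeroed inside every iteration with one that is zeroed once before the loop. Everything except the 129 x 129 shape is hypothetical: `fill_and_sum` is a stand-in for the real per-iteration work, and `kIters` is an arbitrary small trip count. The transformation only preserves results if each iteration overwrites every cell before reading it and leaves the border row and column untouched, which is an assumption about the real kernel that would need checking. A single shared buffer also ties the iterations together, so it may not fit if the inner loop is meant to run in parallel.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kDim = 129;   // matrix shape from the snippet above
constexpr int kIters = 4;   // hypothetical trip count for the enclosing loop

// Stand-in for the per-iteration work. It writes every interior cell before
// any later read of that cell and never writes row 0 or column 0.
int fill_and_sum(int16_t m[kDim][kDim], int seed) {
    int sum = 0;
    for (int i = 1; i < kDim; ++i) {
        for (int j = 1; j < kDim; ++j) {
            m[i][j] = static_cast<int16_t>((m[i - 1][j] + m[i][j - 1] + seed) % 7);
            sum += m[i][j];
        }
    }
    return sum;
}

// Original shape: the buffer is declared and zeroed inside the loop body.
int run_init_inside() {
    int total = 0;
    for (int t = 0; t < kIters; ++t) {
        int16_t matrix[kDim][kDim];
        for (int x = 0; x < kDim; ++x)
            for (int y = 0; y < kDim; ++y)
                matrix[x][y] = 0;
        total += fill_and_sum(matrix, t + 1);
    }
    return total;
}

// Hoisted shape: the buffer is zeroed once, outside the loop.
int run_init_hoisted() {
    int16_t matrix[kDim][kDim];
    for (int x = 0; x < kDim; ++x)
        for (int y = 0; y < kDim; ++y)
            matrix[x][y] = 0;
    int total = 0;
    for (int t = 0; t < kIters; ++t) {
        total += fill_and_sum(matrix, t + 1);
    }
    return total;
}
```

Comparing `run_init_inside()` against `run_init_hoisted()` on a host build is a cheap way to check that the hoisted version is still functional before spending time on synthesis.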

hecmay commented 2 years ago

The log from Vitis HLS 2020 is a bit misleading: it points to an unrelated line number and complains that "the loop" on that line cannot be unrolled.

After I switched to Vitis 2019, I figured out that Vitis HLS actually has difficulty unrolling one of the loop nests inside the body of the inner loop, which is annotated in the snippet below.

  for (ap_int<32> t_outer = 0; t_outer < 32; ++t_outer) {
    for (ap_int<32> t_inner = 0; t_inner < 32; ++t_inner) {
      #pragma HLS pipeline
      ap_int<32> mutate3;

      for (ap_int<32> i = 0; i < 129; ++i) { // THIS CANNOT BE UNROLLED
        for (ap_int<32> j = 0; j < 129; ++j) {
          ap_int<32> trace_back[4];
          for (ap_int<32> x2 = 0; x2 < 4; ++x2) {
            trace_back[x2] = 0;
          }
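
For a sense of scale, the arithmetic below estimates how much code the unrolling implies. It relies on the documented Vitis HLS behavior that pipelining a loop unrolls all loops nested inside it; the trip counts are taken from the snippets in this thread, while the helper names are made up for illustration and the counts ignore everything in the elided parts of the loop bodies.

```cpp
#include <cassert>

// Trip counts taken from the snippets in this thread.
constexpr long kTInner = 32;     // t_inner iterations
constexpr long kDim = 129;       // i, j, x, y iterations
constexpr long kTraceBack = 4;   // x2 iterations

// First snippet: t_outer is pipelined, so t_inner and the 129 x 129
// initialization nest below it are flattened into one body.
constexpr long matrix_init_stores() { return kTInner * kDim * kDim; }

// Second snippet: t_inner is pipelined, so the i/j nest is replicated in full.
constexpr long ij_body_copies() { return kDim * kDim; }

// Each copy of the i/j body also carries the 4-element trace_back reset.
constexpr long trace_back_stores() { return ij_body_copies() * kTraceBack; }

static_assert(matrix_init_stores() == 532512, "32 * 129 * 129");
static_assert(ij_body_copies() == 16641, "129 * 129");
static_assert(trace_back_stores() == 66564, "129 * 129 * 4");
```

Tens of thousands of replicated loop bodies is consistent with the tool refusing to unroll because of the increase in code size.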

seanlatias commented 2 years ago

Do you have the complete HLS generated code?

hecmay commented 2 years ago

I am using this one: https://github.com/cornell-zhang/heterocl/blob/master/samples/smith_waterman/smith_vhls.cl @seanlatias