dillonhuff / clockwork

A polyhedral compiler for hardware accelerators

Mobilenet unrolled too much? #126

Closed: jeffsetter closed this issue 4 years ago

jeffsetter commented 4 years ago

I tried getting FPGA power numbers for the mobilenet_unrolled code, but I hit an error saying the design uses too many I/O ports:

ERROR: [Place 30-415] IO Placement failed due to overutilization. This design contains 416 I/O ports while the target device: 7z020 package: clg484, contains only 330 available user I/O. The target device has 330 usable I/O pins of which 0 are already occupied by user-locked I/Os.

Should we unroll it less?
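
To put a number on "less", the gap can be read straight off the error message. The sketch below is only back-of-the-envelope arithmetic: the two constants are copied from the log, while `ports_to_roll` and the 16-bit bus width used in the usage note are hypothetical, and it assumes each bit of a top-level bus takes one pin.

```cpp
#include <cassert>

// Numbers copied from the Place 30-415 message.
constexpr int kDesignIoPorts = 416;  // I/O ports in the design
constexpr int kDeviceUserIo = 330;   // usable I/O on the 7z020 clg484

// How many I/O ports must go away before placement can fit (0 if it already fits).
inline int io_overage(int design_ports, int device_io) {
  assert(design_ports >= 0 && device_io >= 0);
  return design_ports > device_io ? design_ports - device_io : 0;
}

// Hypothetical helper: how many buses of `bits_per_bus` pins each would have to be
// removed to cover the overage, assuming one pin per bus bit (an assumption).
inline int ports_to_roll(int overage, int bits_per_bus) {
  assert(overage >= 0 && bits_per_bus > 0);
  return (overage + bits_per_bus - 1) / bits_per_bus;  // ceiling division
}
```

With the logged numbers, `io_overage(kDesignIoPorts, kDeviceUserIo)` is 86, so under the assumed 16-bit width at least 6 top-level buses would need to disappear.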

dillonhuff commented 4 years ago

@jeffsetter I don't think the issue is the unrolling of the design itself. It is just the unrolling of the statements that load the 3x3 filter values, similar to the Harris weight loading issue. For example, when I pretty_print mobilenet I see the following long sequence of assignments:

  op_hcompute_hw_filter_pw_global_wrapper_stencil: hw_filter_pw_global_wrapper_stencil[0, 0] = hcompute_hw_filter_pw_global_wrapper_stencil(hw_filter_pw_stencil[0, 0])
  op_hcompute_hw_filter_pw_global_wrapper_stencil_1: hw_filter_pw_global_wrapper_stencil[0, 1] = hcompute_hw_filter_pw_global_wrapper_stencil_1(hw_filter_pw_stencil[0, 1])
  op_hcompute_hw_filter_pw_global_wrapper_stencil_2: hw_filter_pw_global_wrapper_stencil[0, 2] = hcompute_hw_filter_pw_global_wrapper_stencil_2(hw_filter_pw_stencil[0, 2])
  op_hcompute_hw_filter_pw_global_wrapper_stencil_3: hw_filter_pw_global_wrapper_stencil[1, 0] = hcompute_hw_filter_pw_global_wrapper_stencil_3(hw_filter_pw_stencil[1, 0])
  op_hcompute_hw_filter_pw_global_wrapper_stencil_4: hw_filter_pw_global_wrapper_stencil[1, 1] = hcompute_hw_filter_pw_global_wrapper_stencil_4(hw_filter_pw_stencil[1, 1])
  op_hcompute_hw_filter_pw_global_wrapper_stencil_5: hw_filter_pw_global_wrapper_stencil[1, 2] = hcompute_hw_filter_pw_global_wrapper_stencil_5(hw_filter_pw_stencil[1, 2])
  op_hcompute_hw_filter_pw_global_wrapper_stencil_6: hw_filter_pw_global_wrapper_stencil[2, 0] = hcompute_hw_filter_pw_global_wrapper_stencil_6(hw_filter_pw_stencil[2, 0])
  op_hcompute_hw_filter_pw_global_wrapper_stencil_7: hw_filter_pw_global_wrapper_stencil[2, 1] = hcompute_hw_filter_pw_global_wrapper_stencil_7(hw_filter_pw_stencil[2, 1])
  op_hcompute_hw_filter_pw_global_wrapper_stencil_8: hw_filter_pw_global_wrapper_stencil[2, 2] = hcompute_hw_filter_pw_global_wrapper_stencil_8(hw_filter_pw_stencil[2, 2])
  op_hcompute_hw_filter_pw_global_wrapper_stencil_9: hw_filter_pw_global_wrapper_stencil[3, 0] = hcompute_hw_filter_pw_global_wrapper_stencil_9(hw_filter_pw_stencil[3, 0])
  op_hcompute_hw_filter_pw_global_wrapper_stencil_10: hw_filter_pw_global_wrapper_stencil[3, 1] = hcompute_hw_filter_pw_global_wrapper_stencil_10(hw_filter_pw_stencil[3, 1])
  op_hcompute_hw_filter_pw_global_wrapper_stencil_11: hw_filter_pw_global_wrapper_stencil[3, 2] = hcompute_hw_filter_pw_global_wrapper_stencil_11(hw_filter_pw_stencil[3, 2])
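
For contrast, here is a minimal sketch of what the rolled-up form of those statements amounts to: one assignment inside a loop nest instead of 12 separately named ops. This is plain C++ for illustration, not clockwork or Halide API. The 4x3 extents are read off the indices in the listing, and `copy_weight` is a hypothetical stand-in for the `hcompute_*` functions, assumed here to be a plain copy.

```cpp
#include <array>
#include <cassert>

constexpr int kRows = 4;  // first index runs 0..3 in the listing
constexpr int kCols = 3;  // second index runs 0..2 in the listing

using Stencil = std::array<std::array<int, kCols>, kRows>;

// Hypothetical stand-in for hcompute_hw_filter_pw_global_wrapper_stencil*;
// assumed (not confirmed by the listing) to forward the value unchanged.
inline int copy_weight(int w) { return w; }

// Rolled form: a single statement executed kRows * kCols times, so only one
// weight is read per iteration instead of all 12 at once.
inline Stencil load_weights_rolled(const Stencil& hw_filter_pw_stencil) {
  Stencil hw_filter_pw_global_wrapper_stencil{};
  for (int i = 0; i < kRows; ++i) {
    for (int j = 0; j < kCols; ++j) {
      hw_filter_pw_global_wrapper_stencil[i][j] =
          copy_weight(hw_filter_pw_stencil[i][j]);
    }
  }
  return hw_filter_pw_global_wrapper_stencil;
}
```

The result is the same as the unrolled sequence; what changes is that the weights arrive one at a time through a single input rather than through 12 parallel ones.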

jeffsetter commented 4 years ago

Okay, let me try rolling the IOs related to the weight initialization.

dillonhuff commented 4 years ago

@jeffsetter I was also curious what the CGRA implementation of this loading should be. Should these values be loaded into datapath registers at configuration time?

jeffsetter commented 4 years ago

I was expecting the weights to be loaded into memory tiles. If we were able to control the write_enable signals on the registers, you could load them into datapath registers, but I don't think our CGRA is able to do that.

dillonhuff commented 4 years ago

From looking at the code, every weight in the filter needs to be read simultaneously, so I assume each memory bank will hold only one entry that is actually used?

dillonhuff commented 4 years ago

@jeffsetter sorry I meant to say "each memory tile will store only one filter value"

jeffsetter commented 4 years ago

Yes, each memory would only store 1 value. If you wanted, you could store 2 and then output 2, since they are multiported, but that's just a tiny optimization.
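
As a rough sketch of that trade-off, the tile count is just a ceiling division of the number of weights by the number of values each tile can serve per cycle. The function name is hypothetical, and it assumes a tile can output as many values per cycle as it stores for this purpose. The count of 12 weights in the usage note comes from the 12 statements in the pretty_print listing earlier in the thread.

```cpp
#include <cassert>

// Hypothetical helper: memory tiles needed so that every weight can be read in
// the same cycle, given how many values one tile can output at once.
inline int tiles_needed(int num_weights, int values_per_tile) {
  assert(num_weights >= 0 && values_per_tile > 0);
  return (num_weights + values_per_tile - 1) / values_per_tile;  // ceiling division
}
```

For the 12 pointwise weights this gives `tiles_needed(12, 1) == 12` with one value per tile and `tiles_needed(12, 2) == 6` if both ports are used.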

dillonhuff commented 4 years ago

@jeffsetter makes sense!

dillonhuff commented 4 years ago

@jeffsetter with the most recent PR merged, hw_filter_pw_global_wrapper_stencil is still fully unrolled. Was that intentional?

jeffsetter commented 4 years ago

Oh, I'm sorry. I commented out the wrong unrolls. We want hw_filter_pw_global_wrapper_stencil and hw_filter_dw_global_wrapper_stencil to be rolled up. I'll fix that.

jeffsetter commented 4 years ago

Hm, something weird is happening with my testbench when I don't unroll dw, so I'll leave that unrolled. I think rolling up pw is the important part.

dillonhuff commented 4 years ago

@jeffsetter I have updated mobilenet here: https://github.com/dillonhuff/clockwork/blob/master/soda_codes/mobilenet_unrolled/our_code/mobilenet_unrolled.cpp

Please run it through the power flow ASAP and let me know whether it gets through. There is not much time to make additional changes, so I really need to know soon.

jeffsetter commented 4 years ago

will do

jeffsetter commented 4 years ago

This worked, and I was able to get the FPGA power. Thanks!