Closed by jeffsetter 4 years ago
@jeffsetter I don't think the issue is the unrolling of the design itself; it is just the unrolling of the statements that load the 3x3 filter values, similar to the Harris weight loading issue. For example, when I pretty_print mobilenet I see the following long sequence of assignments:
op_hcompute_hw_filter_pw_global_wrapper_stencil: hw_filter_pw_global_wrapper_stencil[0, 0] = hcompute_hw_filter_pw_global_wrapper_stencil(hw_filter_pw_stencil[0, 0])
op_hcompute_hw_filter_pw_global_wrapper_stencil_1: hw_filter_pw_global_wrapper_stencil[0, 1] = hcompute_hw_filter_pw_global_wrapper_stencil_1(hw_filter_pw_stencil[0, 1])
op_hcompute_hw_filter_pw_global_wrapper_stencil_2: hw_filter_pw_global_wrapper_stencil[0, 2] = hcompute_hw_filter_pw_global_wrapper_stencil_2(hw_filter_pw_stencil[0, 2])
op_hcompute_hw_filter_pw_global_wrapper_stencil_3: hw_filter_pw_global_wrapper_stencil[1, 0] = hcompute_hw_filter_pw_global_wrapper_stencil_3(hw_filter_pw_stencil[1, 0])
op_hcompute_hw_filter_pw_global_wrapper_stencil_4: hw_filter_pw_global_wrapper_stencil[1, 1] = hcompute_hw_filter_pw_global_wrapper_stencil_4(hw_filter_pw_stencil[1, 1])
op_hcompute_hw_filter_pw_global_wrapper_stencil_5: hw_filter_pw_global_wrapper_stencil[1, 2] = hcompute_hw_filter_pw_global_wrapper_stencil_5(hw_filter_pw_stencil[1, 2])
op_hcompute_hw_filter_pw_global_wrapper_stencil_6: hw_filter_pw_global_wrapper_stencil[2, 0] = hcompute_hw_filter_pw_global_wrapper_stencil_6(hw_filter_pw_stencil[2, 0])
op_hcompute_hw_filter_pw_global_wrapper_stencil_7: hw_filter_pw_global_wrapper_stencil[2, 1] = hcompute_hw_filter_pw_global_wrapper_stencil_7(hw_filter_pw_stencil[2, 1])
op_hcompute_hw_filter_pw_global_wrapper_stencil_8: hw_filter_pw_global_wrapper_stencil[2, 2] = hcompute_hw_filter_pw_global_wrapper_stencil_8(hw_filter_pw_stencil[2, 2])
op_hcompute_hw_filter_pw_global_wrapper_stencil_9: hw_filter_pw_global_wrapper_stencil[3, 0] = hcompute_hw_filter_pw_global_wrapper_stencil_9(hw_filter_pw_stencil[3, 0])
op_hcompute_hw_filter_pw_global_wrapper_stencil_10: hw_filter_pw_global_wrapper_stencil[3, 1] = hcompute_hw_filter_pw_global_wrapper_stencil_10(hw_filter_pw_stencil[3, 1])
op_hcompute_hw_filter_pw_global_wrapper_stencil_11: hw_filter_pw_global_wrapper_stencil[3, 2] = hcompute_hw_filter_pw_global_wrapper_stencil_11(hw_filter_pw_stencil[3, 2])
Okay, let me try rolling the IOs related to the weight initialization.
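For reference, here is a minimal sketch of what "rolling" these statements means. It is not the Halide/clockwork code itself: `Stencil`, `copy_unrolled`, and `copy_rolled` are hypothetical names, the 4x3 extent is read off the indices in the listing above, and the per-element `hcompute_*` calls are assumed to be plain copies.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the two stencils in the listing above.
// The 4x3 extent comes from the printed indices [0..3, 0..2].
constexpr int kRows = 4;
constexpr int kCols = 3;
using Stencil = std::array<std::array<std::int16_t, kCols>, kRows>;

// Unrolled form: one statement (and so one op and one IO) per weight.
void copy_unrolled(const Stencil& src, Stencil& dst) {
    dst[0][0] = src[0][0]; dst[0][1] = src[0][1]; dst[0][2] = src[0][2];
    dst[1][0] = src[1][0]; dst[1][1] = src[1][1]; dst[1][2] = src[1][2];
    dst[2][0] = src[2][0]; dst[2][1] = src[2][1]; dst[2][2] = src[2][2];
    dst[3][0] = src[3][0]; dst[3][1] = src[3][1]; dst[3][2] = src[3][2];
}

// Rolled form: a single statement inside a loop nest, same result.
void copy_rolled(const Stencil& src, Stencil& dst) {
    for (int r = 0; r < kRows; ++r) {
        for (int c = 0; c < kCols; ++c) {
            dst[r][c] = src[r][c];
        }
    }
}
```

Both functions fill `dst` identically. The difference is that the rolled version exposes one statement to the compiler instead of twelve.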
@jeffsetter I was also curious what the CGRA implementation of this loading should be. Should these values be loaded into datapath registers at configuration time?
I was expecting the weights to be loaded into memory tiles. If we were able to control the write_enable signals on the registers, you could load them into datapath registers, but I don't think our CGRA is able to do that.
From looking at the code, every weight in the filter needs to be read simultaneously, so I assume that means each memory bank will need to have only one entry that is used?
@jeffsetter sorry, I meant to say "each memory tile will store only one filter value"
Yes, each memory would only store 1 value. If you wanted, you could store 2 and then output 2, since they are multiported, but that's just a tiny optimization.
@jeffsetter makes sense!
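As a rough sanity check on the tile count discussed above, here is a small sketch. It assumes all weights must be readable in the same cycle, and takes the 12 weights from the pretty_print listing. `tiles_needed` and `reads_per_tile` are hypothetical names, not part of the CGRA tooling.

```cpp
#include <cstddef>

// Hypothetical helper: number of memory tiles needed if every weight must be
// read simultaneously and each tile can output `reads_per_tile` values per cycle.
constexpr std::size_t tiles_needed(std::size_t num_weights,
                                   std::size_t reads_per_tile) {
    // Ceiling division: a partially used tile still counts as a whole tile.
    return (num_weights + reads_per_tile - 1) / reads_per_tile;
}

// One value per tile: 12 weights need 12 tiles.
static_assert(tiles_needed(12, 1) == 12, "one weight per tile");
// Two values per tile (using both ports): 12 weights need 6 tiles.
static_assert(tiles_needed(12, 2) == 6, "two weights per tile");
```

Under these assumptions, storing two values per tile halves the tile count, which is the small optimization mentioned above.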
@jeffsetter with the most recent PR merged, hw_filter_pw_global_wrapper_stencil is still fully unrolled. Was that intentional?
Oh, I'm sorry. I commented out the wrong unrolls. We want hw_filter_pw_global_wrapper_stencil and hw_filter_dw_global_wrapper_stencil to be rolled up. I'll fix that.
Hm, something weird is happening with my testbench when I don't unroll dw, so I'll leave that unrolled. I think rolling up pw is the important part.
@jeffsetter I have updated mobilenet here: https://github.com/dillonhuff/clockwork/blob/master/soda_codes/mobilenet_unrolled/our_code/mobilenet_unrolled.cpp
Please run it through the power flow and LMK if it gets through ASAP. There is not much time to make additional changes, so I really need to know soon if the app gets through.
will do
This worked, and I was able to get the FPGA power. Thanks!
I tried getting FPGA power numbers for the mobilenet_unrolled code, but I encountered an error saying there is too much IO:
Should we unroll it less?