In prog: harris_sch2_fourbuf compute unit: hcompute_lgxx_stencil_1 has a mismatch between C++ and coreir

dillonhuff / clockwork

A polyhedral compiler for hardware accelerators

53 stars 11 forks source link

In prog: harris_sch2_fourbuf compute unit: hcompute_lgxx_stencil_1 has a mismatch between C++ and coreir #108

Open dillonhuff opened 3 years ago

dillonhuff commented 3 years ago

@jeffsetter another compute unit inconsistency in a different harris variant:

ERROR in compute unit: hcompute_lgxx_stencil_1
in0_lgxx_stencil[0] -> 236
in0_lgxx_stencil[0] -> 0000000011101100
in1_padded16_global_wrapper_stencil[0] -> 41
in1_padded16_global_wrapper_stencil[0] -> 0000000000101001
in1_padded16_global_wrapper_stencil[1] -> 205
in1_padded16_global_wrapper_stencil[1] -> 0000000011001101
in1_padded16_global_wrapper_stencil[2] -> 186
in1_padded16_global_wrapper_stencil[2] -> 0000000010111010
in1_padded16_global_wrapper_stencil[3] -> 171
in1_padded16_global_wrapper_stencil[3] -> 0000000010101011
in1_padded16_global_wrapper_stencil[4] -> 242
in1_padded16_global_wrapper_stencil[4] -> 0000000011110010
in1_padded16_global_wrapper_stencil[5] -> 251
in1_padded16_global_wrapper_stencil[5] -> 0000000011111011
    coreir_result: 744
    cpp_result   : 232

Could you please reproduce this on your end and fix it?

dillonhuff commented 3 years ago

@jeffsetter the testbench is here: https://github.com/dillonhuff/clockwork/tree/master/compute_unit_regressions/harris_sch2_fourbuf/hcompute_lgxx_stencil_1

jeffsetter commented 3 years ago

I think I figured out the issue with the compute unit, which has to do with signed numbers and shifting. For this application, I never intended numbers to overflow during the multiplication, so I will change the clamp from 255 to 180.

Regarding your testbench, the cpp result is accounting for the overflow, while your coreir result is not. I believe the verilog is incorrect, resulting in the bitshift occurring on an unsigned rather than a signed number. To fix this, I changed the output assignment to: assign out_lgxx_stencil = 16'($signed(in0_lgxx_stencil[0]) + (($signed(smax_289_290_291_out * smax_289_290_291_out)) >>> 16'h0007));

Note the $signed cast on the input.

jeffsetter commented 3 years ago

Another solution could be using wire signed when applicable.

dillonhuff commented 3 years ago

@jeffsetter that makes sense. LMK when you have updated the compute units and I'll retry the regression.

jeffsetter commented 3 years ago

To clarify, the reason why it is failing right now is that the verilog generation is not correct. I would fix that first before we change the clamp values. I suspect this is also why sch1 is not working.

dillonhuff commented 3 years ago

@jeffsetter I cannot fix the verilog generation since that is done by coreir.

@rdaly525 can you take a look at this example?

rdaly525 commented 3 years ago

@dillonhuff Just ran the testbench locally and I cannot reproduce. The randomized verilator test seems to pass

rdaly525 commented 3 years ago

Also the generated coreir and verilog visually looks correct to me.