dillonhuff / clockwork

A polyhedral compiler for hardware accelerators
55 stars 11 forks source link

ROM codegen is wrong? #103

Open dillonhuff opened 3 years ago

dillonhuff commented 3 years ago

@jeffsetter it seems that the ROM outputs for the 1 pix / 3 cycles and 1 pix / 9 cycles coreir output is still not correct:

%Error: harris_sch4_1pp3c.v:3137:98: Too many digits for 16 bit number: 16'd18446744073709551615
 3137 |     .init({16'd1,16'd0,16'd18446744073709551615,16'd2,16'd0,16'd18446744073709551614,16'd1,16'd0,16'd18446744073709551615})
      |                                                                                                  ^~~~~~~~~~~~~~~~~~~~~~~~
%Error: Exiting due to 18 error(s)

In addition the interaction of a 1 cycle delay in the compute unit with reduction will make scheduling and codegen more complicated.

Could you modify the harris variants 3 and 4 to load the kernels as inputs at the start of the accelerator instead of using ROMs?

jeffsetter commented 3 years ago

The init values for sch3 and sch4 include negative values. I think that is why you are seeing those values, but you can also check the .json to see if those large values appear (I don't see them).

jeffsetter commented 3 years ago

I recommend reducing the values to a register, which shouldn't have a cycle delay. I don't think there should be hardware limitations on this schedule.

I'm not sure how loading the kernels as inputs would differ from a ROM.

dillonhuff commented 3 years ago

Loading the kernels as inputs would make the array that holds the values of the kernels visible to the scheduler (which is important for reduction) and would get around this coreir generation bug.

Could you please change the schedules for sch3 and sch4 to load the kernel values explicitly?

jeffsetter commented 3 years ago

I changed sch3 and sch4 with memories instead of roms. I think this makes it visible to the schedule. Since the roms do not behave like the other inputs, they can't as easily be added as inputs. Fixed in https://github.com/dillonhuff/clockwork/pull/104.

dillonhuff commented 3 years ago

@jeffsetter thanks!

I pulled and now I get an error finding compute units:

cmd: g++ -fstack-protector-all -std=c++11 regression_tb_unoptimized_harris_sch3_1pp9c.cpp unoptimized_harris_sch3_1pp9c.cpp
ERROR: Could not find Module in namespace!
  Module: hcompute_kernel_x_stencil_1
  Namespace: global
dillonhuff commented 3 years ago

@jeffsetter if it is easier for you in codegen another solution would be to break up compute units that read from ROMs into two statements: one statement that reads from the ROM and stores the value to a temporary array, and another that reads the value out of the temporary and then performs the purely combinational part of the computation.

jeffsetter commented 3 years ago

That sounds like what I have done with this latest design. The jsons were not being generated due to a timestamp error that was confusing the Makefile. The jsons should be populated in https://github.com/dillonhuff/clockwork/pull/105.