Open dillonhuff opened 3 years ago
The init values for sch3 and sch4 include negative values. I think that is why you are seeing those values, but you can also check the .json to see if those large values appear (I don't see them).
I recommend reducing the values to a register, which shouldn't have a cycle delay. I don't think there should be hardware limitations on this schedule.
I'm not sure how loading the kernels as inputs would differ from a ROM.
Loading the kernels as inputs would make the array that holds the values of the kernels visible to the scheduler (which is important for reduction) and would get around this coreir generation bug.
Could you please change the schedules for sch3 and sch4 to load the kernel values explicitly?
I changed sch3 and sch4 with memories instead of roms. I think this makes it visible to the schedule. Since the roms do not behave like the other inputs, they can't as easily be added as inputs. Fixed in https://github.com/dillonhuff/clockwork/pull/104.
@jeffsetter thanks!
I pulled and now I get an error finding compute units:
cmd: g++ -fstack-protector-all -std=c++11 regression_tb_unoptimized_harris_sch3_1pp9c.cpp unoptimized_harris_sch3_1pp9c.cpp
ERROR: Could not find Module in namespace!
Module: hcompute_kernel_x_stencil_1
Namespace: global
@jeffsetter if it is easier for you in codegen another solution would be to break up compute units that read from ROMs into two statements: one statement that reads from the ROM and stores the value to a temporary array, and another that reads the value out of the temporary and then performs the purely combinational part of the computation.
That sounds like what I have done with this latest design. The jsons were not being generated due to a timestamp error that was confusing the Makefile. The jsons should be populated in https://github.com/dillonhuff/clockwork/pull/105.
@jeffsetter it seems that the ROM outputs for the 1 pix / 3 cycles and 1 pix / 9 cycles coreir output is still not correct:
In addition the interaction of a 1 cycle delay in the compute unit with reduction will make scheduling and codegen more complicated.
Could you modify the harris variants 3 and 4 to load the kernels as inputs at the start of the accelerator instead of using ROMs?