UIUC-ChenLab / scalehls

A scalable High-Level Synthesis framework on MLIR

"target-spec.ini" missing "reg/ff"? #33

Closed: eeriecl closed this issue 2 years ago

eeriecl commented 2 years ago

I see that "target-spec.ini" has the following configuration, but curiously it is missing "reg/ff"?

[specification]
frequency=100MHz
dsp=220
bram=280
lut=13300

zslwyuan commented 2 years ago

Hi @eeriecl, I guess the reason is that the number of LUTs (i.e., 5-input LUTs) is the same as the number of FFs on Xilinx UltraScale/UltraScale+ devices.

eeriecl commented 2 years ago

Absolutely not~ For Xilinx devices after 45 nm, the FF:LUT ratio is 2:1, so this is the "pre-allocated" hardware resource ratio. Furthermore, this is not the true ratio for any particular application. And there is no LUT-5; there are only LUT-4 and LUT-6.
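If an FF budget were wanted in the spec, one rough way to fill it would be to derive it from the LUT count using the 2:1 FF:LUT device ratio quoted above. The sketch below is only an illustration under that assumption: the `ff` key and the `load_spec` helper are hypothetical, ScaleHLS is not known to read such a key, and, as the comment above points out, a real design will not use FFs and LUTs in that exact proportion.

```python
import configparser

# The spec as quoted in this issue; note that no "ff" key is present.
SPEC_TEXT = """
[specification]
frequency=100MHz
dsp=220
bram=280
lut=13300
"""

# Assumption: the 2:1 FF:LUT device ratio quoted in the thread.
FF_PER_LUT = 2


def load_spec(text: str) -> dict:
    """Parse [specification]; fill a hypothetical "ff" budget if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["specification"]
    spec = {key: section[key] for key in section}
    if "ff" not in spec:
        # Device-level ratio only: a real design uses FFs and LUTs
        # in whatever proportion its logic requires.
        spec["ff"] = str(int(spec["lut"]) * FF_PER_LUT)
    return spec


spec = load_spec(SPEC_TEXT)
print(spec["ff"])  # 26600
```

An explicit `ff` entry, if one were present, is kept as-is rather than overwritten by the derived value.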

hanchenye commented 2 years ago

Hi @eeriecl, the QoR estimator currently only supports estimating DSP and BRAM utilization.

eeriecl commented 2 years ago

So why is there a LUT specification/estimation?

hanchenye commented 2 years ago

I don't think the QoR estimator consumes the LUT specification or reports LUT utilization. LUT estimation is still an open topic: several existing papers have investigated different approaches to improving the accuracy of Vivado HLS's LUT estimates, but because downstream tools such as Vivado optimize the design further during synthesis, the estimates are still not very accurate :( We are also gradually improving the QoR estimator, but if you are interested in adding a LUT estimation feature, please let me know how I can help!
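To make that scope concrete, a feasibility check that, like the estimator described above, compares only DSP and BRAM usage against the budget might look like the sketch below. The `check_utilization` helper and the usage numbers are hypothetical and not ScaleHLS's actual API; the LUT entry is parsed from the spec but deliberately left unchecked.

```python
import configparser

SPEC_TEXT = """
[specification]
frequency=100MHz
dsp=220
bram=280
lut=13300
"""

# Only DSP and BRAM are checked, mirroring the estimator scope described above.
CHECKED_RESOURCES = ("dsp", "bram")


def check_utilization(spec_text: str, usage: dict) -> dict:
    """Hypothetical helper: return usage/budget ratios for checked resources.

    Keys in 'usage' outside CHECKED_RESOURCES (e.g. "lut") are ignored.
    """
    parser = configparser.ConfigParser()
    parser.read_string(spec_text)
    budget = parser["specification"]
    return {name: usage[name] / int(budget[name]) for name in CHECKED_RESOURCES}


# Made-up estimates for one design point; "lut" is present but never checked.
ratios = check_utilization(SPEC_TEXT, {"dsp": 110, "bram": 70, "lut": 9000})
print(ratios)                                  # {'dsp': 0.5, 'bram': 0.25}
print(all(r <= 1.0 for r in ratios.values()))  # True
```

A design point whose ratios all stay at or below 1.0 would fit the DSP/BRAM budget, regardless of how many LUTs it actually needs.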

Oxygen-Chu commented 2 years ago

What about feeding the ScaleHLS-generated C++ forward to Vitis HLS, running high-level synthesis there, and then back-annotating ScaleHLS's report? That way the ScaleHLS estimator would not be needed, and the report would be relatively accurate.

hanchenye commented 2 years ago

Hi @Oxygen-Chu, this is definitely possible! Actually, another paper from our group triggers Vitis HLS to evaluate discovered design points. The advantage of this approach is that Vitis HLS is more accurate and comprehensive. The disadvantage is that Vitis HLS takes at least 1-2 minutes to compile each design point (some complicated ones take longer), while our estimator only needs a few seconds.

We stated this in the ScaleHLS paper and chose this trade-off for ScaleHLS. However, I believe triggering Vitis HLS to guide the design space exploration of ScaleHLS is a valuable topic to investigate (worth a new paper) 😄

Oxygen-Chu commented 2 years ago

I've tried ScaleHLS, AutoSA, AutoBridge, Merlin Compiler, and some other polyhedral compilers in recent months, and found a lot of room for improvement. Simply put, letting a particular HLS tool do everything alone is far less accurate than having it work together with Vitis HLS. You mention waiting 1-2 minutes, but I think it's absolutely worth it, since Vitis HLS knows the FPGA architecture far better than any home-made estimator.

hanchenye commented 2 years ago

As we stated in the ScaleHLS paper:

The RTL generation downstream tools, such as Vivado HLS, can take minutes to hours to complete the compilation and to report the synthesis results, which (1) limits the total number of design points that can be evaluated during DSE, thus results in sub-optimal solutions and (2) significantly increases the DSE time to up to tens of hours.

Take a GEMM32 kernel as an example: ScaleHLS can open a design space containing about 5,000 design points. A waiting time of 1-2 minutes per point means the DSE can explore fewer than 60 of them per hour, which easily leads to either an incomplete DSE or a long DSE time. We have some ongoing HLS projects in MLIR, such as CIRCT-HLS, which should help reduce the compilation time to RTL in the future, making it more feasible to use in the DSE.
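The numbers above can be checked with a quick back-of-the-envelope calculation. This sketch assumes a fixed compile time per point and purely serial evaluation. The 5,000-point space and the 1-2 minute figures come from the thread; the 2-second estimator time is only an illustrative stand-in for "seconds".

```python
def serial_dse_hours(num_points: int, seconds_per_point: float) -> float:
    """Wall-clock hours to evaluate every design point one after another."""
    return num_points * seconds_per_point / 3600


GEMM32_POINTS = 5000  # approximate design-space size quoted above

cases = [
    ("Vitis HLS, 1 min/point", 60),
    ("Vitis HLS, 2 min/point", 120),
    ("estimator, 2 s/point (assumed)", 2),
]
for label, secs in cases:
    print(f"{label}: {serial_dse_hours(GEMM32_POINTS, secs):.1f} h")
# Vitis HLS, 1 min/point: 83.3 h
# Vitis HLS, 2 min/point: 166.7 h
# estimator, 2 s/point (assumed): 2.8 h
```

At one minute per point, a single serial run covers at most 60 points per hour, so exhausting the whole space would take more than three days.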

Oxygen-Chu commented 2 years ago

The DSE engine can use multiple cores and multithreading, since each design point in the search space can be evaluated completely independently. And CPUs, DDR memory, and SSDs keep getting more powerful and cheaper. My workstation has a Xeon W-3375, 1 TB of DDR4, and a 4 TB 980 PRO, which fully supports multi-task design space searching.
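Because each design point can be evaluated independently, the evaluation loop parallelizes naturally. Below is a minimal sketch using Python's `concurrent.futures`. It assumes a hypothetical `evaluate_design_point` function that, in a real flow, would launch one Vitis HLS run per point and parse its report. Here it only returns a placeholder latency so the sketch runs on its own, and the unroll/tile design space is made up.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def evaluate_design_point(point: dict) -> tuple:
    """Hypothetical evaluator returning (point id, latency).

    In a real flow this might launch one Vitis HLS run for the point
    (e.g. via subprocess) and parse its report; here a placeholder
    latency is computed so the sketch runs on its own.
    """
    latency = 1000 // (point["unroll"] * point["tile"])
    return point["id"], latency


def parallel_dse(points: list, workers: int) -> list:
    """Evaluate independent design points concurrently, keeping input order."""
    # Threads are enough when each job mostly waits on an external tool process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate_design_point, points))


# Hypothetical 6-point space: unroll factor x tile size.
points = [
    {"id": i, "unroll": u, "tile": t}
    for i, (u, t) in enumerate(itertools.product((1, 2, 4), (1, 2)))
]
results = parallel_dse(points, workers=4)
best = min(results, key=lambda r: r[1])
print(best)  # (5, 125)
```

With N concurrent runs, the serial wall-clock time shrinks to roughly 1/N of its value. For example, 5,000 points at one minute each on 16 concurrent runs would take about 5.2 h, assuming the workstation can actually sustain that many Vitis HLS jobs at once.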

hanchenye commented 2 years ago

@Oxygen-Chu Please see my earlier post:

This is definitely possible! Actually another paper from our group is triggering Vitis HLS to evaluate discovered design points ... I believe triggering Vitis HLS to guide the design space exploration of ScaleHLS is a valuable topic to investigate.

We did enable multithreading in recent development of that project (which has not been open-sourced). As I mentioned, I believe cooperating with Vitis HLS is a valuable direction. Please try this approach if you are interested; I'd love to chat more about it.

Oxygen-Chu commented 2 years ago

OK, I'd like to try more after resolving the issues opened by eeriecl (actually me :-> ). If you have updates, please let me know.