Open sheldonz7 opened 2 weeks ago
A first quick comment: --debug 4 does not work very well, so please use --debug-classes instead. This option raises the debug level for specific classes (steps). For example, --debug-classes=parametric_list_based,cdfc_module_binding raises the debug level to maximum verbosity for the list-based scheduling and module binding steps.
Bambu enables resource sharing by default (-C='*'). Sharing resources may create timing issues. One solution is to register the function inputs: adding --registered-inputs=yes fixed the timing issue. Another is to increase the number of resources. In your case, the critical path goes through the function float_adde8m23b_127nih, so you could pass -C=float_adde8m23b_127nih=2 to use two fp adders instead of one. The first solution increases latency, while the second increases area usage.
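To make the sharing trade-off concrete, here is a minimal C sketch. The function sum4 is hypothetical and is not taken from k2mm.c. It assumes each float addition maps to one call of the fp adder unit: with a single shared adder the three additions would have to be issued one after another, while allowing two adders (as with -C=float_adde8m23b_127nih=2) would let the scheduler place the two independent additions in the same step. Whether it actually does so depends on the rest of the design.

```c
/* Hypothetical example, not from the original k2mm.c. */
float sum4(float a, float b, float c, float d)
{
    /* These two additions do not depend on each other, so they are
       candidates for parallel execution if two adders are allowed. */
    float left = a + b;
    float right = c + d;

    /* This addition depends on both results and must come last. */
    return left + right;
}
```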
Regarding the -s option, it matters only if you have if statements since it activates the code motion/speculation sdc-based scheduling.
Hi, thank you for the clarification! Regarding resource sharing, I'm already using --disable-function-proxy, which should permit more than one functional unit for operations like fadd, right? Are you suggesting using -C to enforce the number of specific functional units generated in the design, rather than relying on module binding (which I believe generates the minimum number of functional units necessary)?
Hi, --disable-function-proxy and -C are two separate options not to be confused.
Function proxies are used to share a hardware function between multiple callers. As an example, say you have three functions A, B, and C, where A and B both call C: with function proxies enabled, the tool will generate a single instance of C and use it for both A and B, while using --disable-function-proxy will result in two dedicated instances of C, one for A and one for B.
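The A/B/C situation can be sketched in C as follows. All names here (func_a, func_b, func_c, top) are hypothetical and chosen only to mirror the description. The sketch also assumes the tool keeps func_c as a separate hardware function instead of inlining it.

```c
/* Hypothetical names that mirror functions A, B and C in the text. */

/* Function C: the callee used by both callers. */
float func_c(float acc, float x, float y)
{
    return acc + x * y;
}

/* Function A: calls C three times to form a dot product. */
float func_a(const float u[3], const float v[3])
{
    float acc = 0.0f;
    for (int i = 0; i < 3; ++i) {
        acc = func_c(acc, u[i], v[i]);
    }
    return acc;
}

/* Function B: calls C once. */
float func_b(float x, float k)
{
    return func_c(0.0f, x, k);
}

/* Top-level function: calls both A and B. */
float top(const float u[3], const float v[3], float k)
{
    return func_a(u, v) + func_b(u[0], k);
}
```

With function proxies enabled, func_a and func_b would both reach the same func_c instance. With --disable-function-proxy, each caller would get its own copy.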
Setting the number of functional units for a given operation means that you are allowing the module binding to use a given number of functional units instead of a single one.
Dear Bambu team, when using RTL designs generated by Bambu, I consistently get very tight timing after Vivado implementation (post-route timing report), much worse than what Bambu estimated during its backend run. Setting cprf to less than 1 helps with the Bambu estimate but not the implementation result.
With a clock period constraint of 10 ns, this is one sample result I get:
cprf = 1: Bambu estimate 9.836 ns, Vivado implementation report 10.469 ns
cprf = 0.7: Bambu estimate 5.275 ns, Vivado implementation report 9.604 ns
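As a quick way to read these numbers, the following C sketch computes the slack against the clock constraint and the ratio between the post-route figure and the Bambu estimate. The helper names are hypothetical, and the sketch assumes the reported values are achieved critical-path delays in ns.

```c
/* Hypothetical helpers; they assume all arguments are delays in ns. */

/* Slack against the clock constraint: negative means timing is violated. */
double slack_ns(double clock_period_ns, double achieved_delay_ns)
{
    return clock_period_ns - achieved_delay_ns;
}

/* How much slower the post-route result is than the HLS estimate. */
double estimate_ratio(double bambu_estimate_ns, double vivado_delay_ns)
{
    return vivado_delay_ns / bambu_estimate_ns;
}
```

For the figures quoted here, slack_ns(10.0, 10.469) gives about -0.469 ns (constraint missed) and slack_ns(10.0, 9.604) gives about 0.396 ns (constraint met), while the post-route delay is roughly 1.06 times the estimate at cprf = 1 and roughly 1.82 times the estimate at cprf = 0.7.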
Is this expected, or is there some way to bring it down? I tried using pipelined floating-point units, which doesn't seem to help.
I tried switching to speculative SDC scheduling by including the -s option, but I receive errors during the run, as shown in this file: stderr.txt
For these experiments, this is the command I use:
This is the C code I used: k2mm.c.txt k2mm.h.txt