chili-chips-ba closed this issue 2 months ago.
Due to mapping and various optimizations during implementation in P&R, it is not possible to keep all signals and names for cross-referencing. However, registers remain identical and can be found in the *.crf output. The file is generated automatically after P&R if the +crf flag is set.
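Here is an example. You can find the critical path information, with its start and end highlighted, in the P&R log (screenshot: https://github.com/chili-chips-ba/openCologne/assets/14027986/a0c46fa0-62ce-4606-b3d9-a4926fafbcfa).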
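The CPE names are made up of the component number (_a) and the CPE part (/1) or (/2). In this example, the starting flip-flop has component number 110, part 2 (110/2 or _a110/OUT2). You find it in the CRF file as follows (screenshot: https://github.com/chili-chips-ba/openCologne/assets/14027986/1a44d24b-7768-446b-8344-389665e2fe2e).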
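In the post-synthesis netlist (*_synth.v), it has the instance name _3208_, and you will find the flip-flop with reference to downsampler_inst.generalcounter[15] in your code (screenshot: https://github.com/chili-chips-ba/openCologne/assets/14027986/c15bf2fc-5e71-4ff0-85a6-9957c18f0c1d).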
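Similarly, we also find the target flip-flop (100/1 or _a100/OUT1) in the CRF (screenshot: https://github.com/chili-chips-ba/openCologne/assets/14027986/f7fcb4d4-ce25-406b-bfd6-51db9f7bf303).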
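In the post-synthesis netlist, you find it as instance _3199_ (screenshot: https://github.com/chili-chips-ba/openCologne/assets/14027986/40982851-1cd4-4dd3-a2c7-6454827e8553).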
To optimize your critical path, you could now examine the path between the generalcounter[15] and generalcounter[6] registers in your code and optimize it if necessary.
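To make the cross-referencing steps above repeatable, here is a minimal Python sketch. It splits a CPE name in either spelling used in the example (110/2 or _a110/OUT2) into component number and CPE part, and pulls an instance's port connections out of the post-synthesis netlist, so you can see which RTL register (e.g. downsampler_inst.generalcounter[15]) a flip-flop drives. The CRF lookup in between is not automated, because its file format is not described in this thread. The helper names parse_cpe and find_instance are hypothetical, and the netlist matching assumes Yosys write_verilog-style formatting (cell type or closing parameter list, then the instance name, then .PORT(signal) connections); it is a regex sketch, not a Verilog parser.

```python
import re
from typing import Dict, Optional, Tuple

# Accepts the two CPE spellings used in the P&R log example: "110/2" and "_a110/OUT2".
CPE_RE = re.compile(r"^(?:_a)?(\d+)/(?:OUT)?([12])$")


def parse_cpe(name: str) -> Tuple[int, int]:
    """Split a CPE name into (component number, CPE part)."""
    m = CPE_RE.match(name.strip())
    if m is None:
        raise ValueError(f"unrecognized CPE name: {name!r}")
    return int(m.group(1)), int(m.group(2))


def find_instance(netlist: str, inst: str) -> Optional[Dict[str, str]]:
    """Return {port: connected signal} for instance `inst`, or None if absent.

    Assumes Yosys write_verilog-style text, where the instance name follows
    the cell type or a closing parameter list, e.g. ") _3208_ (".
    """
    header = re.compile(r"[)\w]\s+" + re.escape(inst) + r"\s*\((.*?)\);", re.DOTALL)
    m = header.search(netlist)
    if m is None:
        return None
    # Each connection looks like .PORT(signal); escaped names such as
    # "\downsampler_inst.generalcounter [15]" contain no parentheses.
    return {port: sig.strip() for port, sig in re.findall(r"\.(\w+)\(([^()]*)\)", m.group(1))}
```

For example, parse_cpe("_a110/OUT2") returns (110, 2). For a flip-flop cell whose output port is named Q, find_instance(netlist_text, "_3208_") returns a dict whose "Q" entry is the register signal it drives.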
Thanks very much for your reply.
(Replying to Patrick Urban's comment of Thu, Jul 11, 2024, 2:19 AM.)
@pu-cc good tips 💯
Still, how do we run arbitrary timing queries (such as report_timing) in the CologneChip framework?
Is there a document that describes the scripts and procedures to use, given that they are not based on the de facto industry-standard SDC?!
@pu-cc - How do we go about specifying timing constraints for GateMate?
From earlier experience (see this), nextpnr is not very timing-savvy, if at all (@MikeReznikov for additional comment).
While we expect the proprietary CologneChip P_R tool to be more timing-aware than nextpnr, we are seeking additional information on that topic.
@pu-cc, @DadoCCAG: your answers to the above questions have become uber-critical at this point!
We are seeing that PicoRV32, the essential element of our TetriSaraj application, does not work properly at 100MHz. There are timing violations in hardware. They are not reported, which is expected, since we currently have no clock constraints in the build.
While we have blindly reduced the PicoRV32 clock to 10MHz to "make it work" (or at least to appear so) without any timing constraints, we don't know for a fact whether that is sufficiently slow.
Builds without timing constraints are not acceptable in the long run. Moreover, inability to specify timing constraints is simply a showstopper for commercial / professional projects and settings.
I went through the GateMate documentation and found this line:
Furthermore, the netlist is passed to the Place & Route tool for architecture-specific implementation and bitstream generation. A netlist converter generates a generic netlist from the Yosys or legacy netlist. The first steps of Place & Route comprise procedures for speed or area optimization before mapping. After placement and routing, the static timing analysis (STA) might lead to further optimization steps and makes the Place & Route software an iterative process of constraint-driven re-placement and re-routing steps to finally achieve user requirements.
It says that an STA runs after P&R, but pages 80-86 of the GateMate FPGA Datasheet offer no option for specifying a clock constraint, as other FPGA vendors do. Nor is a clock constraint mentioned in their workflow diagram.
This raises some questions:
1) What criteria are used for *constraints-driven placement* and *constraints-driven routing* when even the elementary clock period cannot be specified?!
2) What is the scope of the *STA implementation step* in this context, without any timing constraints whatsoever?
@pu-cc, it's interesting that your own PicoRV32 constraints for GateMate also allude to a 10MHz clock. Granted, even your CCF has it only as a comment, as opposed to an actual clock constraint.
Is it that you simply "feel comfortable" with 10MHz, based on your extensive empirical trial-and-error?! Note that PicoRV32 in both Xilinx and Gowin ports of TetriSaraj runs reliably at 100MHz+.
@DadoCCAG, in order to compare eduBOS5 GateMate timing performance with that of Xilinx and Gowin, we absolutely need a reliable way of specifying timing constraints, i.e. of validating timing closure.
Is it that you simply "feel comfortable" with 10MHz [...]
No, not at all. Let me briefly address the most important points:
Placement takes place using the quadratic placement algorithm. After all signals have been routed, p_r always runs an STA. This can also be seen in the log file:
[...]
Static Timing Analysis
Skew violation report using only 80% delay of data path
[...]
STA takes the current placement as a basis and calculates the maximum achievable frequency for all clocks, as I showed in my first answer. For each clock, it reports the maximum clock frequency and its critical path.
Moreover, STA checks for clock skew and applies measures to reduce it.
Once the STA has finished, you should ensure that each clock in your design runs no faster than the maximum frequency reported for it.
In my experiments, PicoRV32 and VexRiscv reached about 30-50 MHz (worst corner).
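As a rough cross-check of the numbers in this thread, the sketch below converts an STA-reported Fmax into the slack it implies for a target clock. It assumes that Fmax is simply the reciprocal of the worst critical-path delay and ignores any setup, skew, or uncertainty margins that p_r may already fold into its figure. The function names are hypothetical.

```python
from typing import Tuple


def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def check_clock(fmax_mhz: float, target_mhz: float) -> Tuple[bool, float]:
    """Return (met, slack_ns) for a target clock given an STA-reported Fmax.

    Simplification: the critical-path delay is taken as 1 / Fmax, and slack is
    the target period minus that delay; negative slack means a violation.
    """
    slack_ns = period_ns(target_mhz) - period_ns(fmax_mhz)
    return slack_ns >= 0.0, slack_ns
```

With a reported Fmax of 40 MHz (inside the 30-50 MHz range above), check_clock(40.0, 100.0) returns (False, -15.0): a 25 ns path against a 10 ns period. check_clock(40.0, 10.0) returns (True, 75.0), which is consistent with the 10MHz workaround appearing to work.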
@pu-cc, given that the necessary timing information is available in the P_R database, what would it take to bring the flow from its current reactive(*) timing closure methodology up to something that at least on the surface resembles the mainstream proactive approach?!
Here is an idea:
1) allow declaration of the basic clock constraint in the CCF
2) provide a post-processing script that would extract all Fmax reports from the P_R log and compare them to the declared input clock frequencies, flagging violations when below, and displaying the extent of headroom when met
3) in the next phase, build on top of it to add support for generated clocks
4) eventually add the ability to parse the database and support the report_timing command
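Step 2 of the idea could start out as small as the following Python sketch. The Fmax line format it matches ("Fmax <clock>: <value> MHz") is a placeholder invented for illustration, not the actual wording of the p_r log, so FMAX_RE must be adapted to the real log. Likewise, the targets dictionary is a hypothetical stand-in for the clock declarations proposed in step 1, which, per this thread, the CCF does not offer today.

```python
import re
from typing import Dict, List

# HYPOTHETICAL log format for illustration only; adapt to the real p_r log.
# Matches lines such as "Fmax clk_sys: 42.50 MHz".
FMAX_RE = re.compile(r"^\s*Fmax\s+([^\s:]+)\s*:\s*(\d+(?:\.\d+)?)\s*MHz", re.MULTILINE)


def extract_fmax(log_text: str) -> Dict[str, float]:
    """Return {clock: Fmax in MHz}; a later report for a clock overrides an earlier one."""
    fmax: Dict[str, float] = {}
    for m in FMAX_RE.finditer(log_text):
        fmax[m.group(1)] = float(m.group(2))
    return fmax


def compare(fmax: Dict[str, float], targets: Dict[str, float]) -> List[str]:
    """Flag violations and report headroom for each declared target clock."""
    report: List[str] = []
    for clk, target in sorted(targets.items()):
        if clk not in fmax:
            report.append(f"{clk}: no Fmax found in log")
            continue
        headroom = fmax[clk] - target
        status = "MET" if headroom >= 0 else "VIOLATED"
        report.append(
            f"{clk}: {status}, target {target:.2f} MHz, "
            f"Fmax {fmax[clk]:.2f} MHz, headroom {headroom:+.2f} MHz"
        )
    return report
```

Called as print("\n".join(compare(extract_fmax(log_text), {"clk_sys": 10.0}))), it prints one MET/VIOLATED line per declared clock. Reporting headroom in MHz keeps it in the same units as the Fmax the log already prints; a later version could report slack in ns instead.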
(*) The current P_R is apparently not timing-driven. We understand that P_R uses a quadratic placement algorithm.
constraints-driven placement and constraints-driven routing properties?
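To make the question above concrete, here is a textbook one-dimensional quadratic placement sketch in Python. It illustrates the general technique only and is not p_r's implementation. It minimizes the sum of w * (x_a - x_b)**2 over weighted two-pin nets, with I/O pins held fixed, using Gauss-Seidel sweeps that move each cell to the weighted average of its neighbours. Plain quadratic placement optimizes wirelength only; one common way to make it timing-driven is to raise the weights of nets on critical paths, which in turn needs timing targets to derive those weights from. The function name and parameters are hypothetical.

```python
from typing import Dict, List, Tuple


def quadratic_place_1d(
    movable: List[str],
    fixed: Dict[str, float],
    nets: List[Tuple[str, str, float]],
    sweeps: int = 200,
) -> Dict[str, float]:
    """Minimize the sum of w * (x_a - x_b)**2 over weighted two-pin nets.

    Setting the gradient to zero puts every movable cell at the weighted
    average of the cells it connects to; Gauss-Seidel sweeps enforce that
    condition cell by cell until the positions settle.
    """
    pos: Dict[str, float] = {c: 0.0 for c in movable}
    pos.update(fixed)  # fixed cells (e.g. I/O pins) never move
    nbrs: Dict[str, List[Tuple[str, float]]] = {c: [] for c in movable}
    for a, b, w in nets:
        if a in nbrs:
            nbrs[a].append((b, w))
        if b in nbrs:
            nbrs[b].append((a, w))
    for _ in range(sweeps):
        for c in movable:
            total = sum(w for _, w in nbrs[c])
            if total > 0:  # an unconnected cell keeps its initial position
                pos[c] = sum(w * pos[n] for n, w in nbrs[c]) / total
    return {c: pos[c] for c in movable}
```

With pins IN at 0 and OUT at 10 and a chain IN-A-B-OUT of unit weights, A and B land at 10/3 and 20/3. Raising the A-B weight to 4, as a timing-driven placer might for a critical net, pulls them to 40/9 and 50/9, shortening that net from about 3.33 to about 1.11.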
Here is a good question from @TurboVega: