NetFPGA / P4-NetFPGA-public

Meeting timing: How to clock down the P4-generated IP? #25

Open AnotherKamila opened 5 years ago

AnotherKamila commented 5 years ago

Hello,

My project is rather large, and since the P4 toolchain does not give me enough visibility into or control over the generated code, I am having trouble meeting timing despite putting a lot of effort into optimising my design. I therefore need to lower the clock speed. My questions are:

  1. What is the minimum clock speed with which I can still achieve the throughput of 40Gbps? Am I right to think that the SDNet output is engineered for 100Gbps at 200MHz and therefore even going with just a 100MHz clock, it should still be able to handle line rate? (And if that is the case, why do you default to 200MHz?)

  2. How can I decrease the clock speed? Where is it defined? I've seen a flag for the SDNet compiler, but I am not sure how I'll need to change the wrappers to supply a slower clock. Can you please point me to where you supply the clock?

Thank you!

sibanez12 commented 5 years ago

If your design is failing timing, it could be for a number of reasons. The two most common are: (1) your P4 program is too big and consumes too many resources on the FPGA, making it very challenging for Vivado to successfully place & route your design, (2) your extern(s) fail to meet timing.

If you still want the design to process packets at 40Gbps, then you can't actually reduce the clock rate much below 200MHz; maybe you can bring it down to 160MHz. The datapath is 256 bits wide, so running at a clock rate of 160MHz gives 256 bits * 160MHz = 40.96 Gbps, i.e. about 41 Gbps. But I suspect that may not help too much. If you really want to try, you can read about how to configure the clock wizard: https://github.com/NetFPGA/P4-NetFPGA-live/blob/master/contrib-projects/sume-sdnet-switch/projects/switch_calc/simple_sume_switch/hw/tcl/simple_sume_switch.tcl#L112
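The arithmetic above can be checked with a few lines of Python. This is only a sketch of the raw bandwidth calculation: the 256-bit width and the 40 Gbps target come from this thread, the function names are made up for illustration, and the result is an upper bound that ignores per-packet overhead such as a partially filled last word.

```python
DATAPATH_BITS = 256  # datapath width stated in this thread


def throughput_gbps(clock_mhz, width_bits=DATAPATH_BITS):
    """Raw bandwidth of a width_bits-wide bus clocked at clock_mhz."""
    return width_bits * clock_mhz * 1e6 / 1e9


def min_clock_mhz(target_gbps, width_bits=DATAPATH_BITS):
    """Lowest clock that still reaches target_gbps, with every word full."""
    return target_gbps * 1e9 / width_bits / 1e6


if __name__ == "__main__":
    print(f"160 MHz -> {throughput_gbps(160):.2f} Gbps")
    print(f"200 MHz -> {throughput_gbps(200):.2f} Gbps")
    print(f"40 Gbps needs at least {min_clock_mhz(40):.2f} MHz")
```

The theoretical floor for 40 Gbps is 156.25 MHz, so 160 MHz leaves very little headroom once packets do not fill every 256-bit word.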

I would strongly recommend that you first look at the Vivado-generated timing report and the utilization report to try to understand why the design is failing to meet timing. After you run $ cd $NF_DESIGN_DIR && make, the timing report is in the following file: $NF_DESIGN_DIR/hw/project/simple_sume_switch.runs/impl_1/top_timing_summary_postroute_physopted.rpt and there's also a utilization report in the same directory. Feel free to post them here if you'd like some help analyzing them.

AnotherKamila commented 5 years ago

Hi,

my design is failing to meet timing mainly because of the parser, as in the parser I had to include a lot of inefficient workarounds for bugs in p4c-sdnet. The externs are quite efficient, even my AES extern is only responsible for 0.4ns on the critical path. You are right that the routing & placing is quite difficult -- due to my project being rather complex (and due to numerous workarounds for p4c), I am using almost the whole area of the FPGA. However, as I have already optimised my design, clocking down to 160MHz should be sufficient -- my WNS is currently about -0.1ns.

I will try that and see where I get. Thank you for the info!
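As a rough sanity check of that reasoning, the sketch below converts a worst negative slack (WNS) into the clock period the design could currently meet. It assumes the WNS of -0.1 ns was reported against a 200 MHz (5 ns) constraint and that path delays stay the same after the clock change; in practice a new place & route run will shift them. The function names are illustrative only.

```python
def achievable_period_ns(constraint_period_ns, wns_ns):
    """Shortest period the current netlist meets: constraint minus slack."""
    return constraint_period_ns - wns_ns


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


if __name__ == "__main__":
    wns = -0.1  # ns, as reported in this thread
    needed = achievable_period_ns(period_ns(200), wns)
    print(f"critical path needs {needed:.2f} ns ({1000.0 / needed:.1f} MHz)")
    # Slack if the same paths were constrained at 160 MHz instead
    print(f"slack at 160 MHz: {period_ns(160) - needed:.2f} ns")
```

Under these assumptions the design would already close timing at roughly 196 MHz, and 160 MHz would leave about 1.15 ns of positive slack.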

AnotherKamila commented 5 years ago

Hi,

I would like to clock down my design, but I haven't quite figured out the meaning of the parameters in the line you linked to (https://github.com/NetFPGA/P4-NetFPGA-live/blob/master/contrib-projects/sume-sdnet-switch/projects/switch_calc/simple_sume_switch/hw/tcl/simple_sume_switch.tcl#L112). Is it sufficient to just replace all the 200 with 160, or do the other parameters need to be changed as well?

Thank you very much for the help (and sorry for my ignorance on this, I am not a hardware engineer).
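For background on what such parameters usually mean, here is a hedged sketch assuming the clock wizard is backed by a Xilinx 7-series MMCM, where the output frequency is f_out = f_in * M / (D * O), with M the feedback multiplier, D the input divider and O the output divider. The 200 MHz input clock and the 600 to 1200 MHz VCO range are assumptions (the real limits depend on the device and speed grade), and the function is hypothetical rather than taken from the NetFPGA scripts. It does not say which values the linked Tcl line should contain.

```python
def find_mmcm_settings(f_in_mhz, f_out_mhz, vco_min=600.0, vco_max=1200.0):
    """Search integer (M, D, O) such that f_in * M / (D * O) == f_out.

    Returns (M, D, O, vco_mhz) for the first match, or None.
    The VCO range is an assumed default, not read from any datasheet here.
    """
    for d in range(1, 11):            # input divider D
        for m in range(2, 65):        # feedback multiplier M
            vco = f_in_mhz * m / d
            if not (vco_min <= vco <= vco_max):
                continue              # VCO outside the assumed legal range
            for o in range(1, 129):   # output divider O
                if abs(vco / o - f_out_mhz) < 1e-9:
                    return m, d, o, vco
    return None


if __name__ == "__main__":
    print(find_mmcm_settings(200.0, 200.0))
    print(find_mmcm_settings(200.0, 160.0))
```

If the assumptions hold, this suggests the multiplier and divider values move together with the requested frequency, so they are worth checking rather than leaving at their 200 MHz values.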