NetFPGA / P4-NetFPGA-public

P4-NetFPGA wiki

How to generate large size lookup engine #32

Open gsankara opened 5 years ago

gsankara commented 5 years ago

Hi,

The direct lookup engine provided by SDNet 2018.1 is limited to a depth of 64k entries. I need an engine large enough to hold around a million entries.

Any pointers?

regs Ganesh

ralfkundel commented 5 years ago

Hi, a table (or any lookup) is implemented in on-chip memory (block RAM). If you want 1 million entries, each holding ONLY a single IPv4 address (4 bytes) with no overhead, the total memory requirement is 4 MB. The NetFPGA-SUME is based on a Virtex-7 FPGA whose block RAMs hold about 4 KB each. You would therefore need many of them, and that is not feasible for the synthesis tool because they must all be accessible in a single clock cycle. In total, the FPGA has around 53 Mbit = 6.6 MB of on-chip memory: https://www.xilinx.com/products/silicon-devices/fpga/virtex-7.html#productTable (VX690T). To summarize: 1 million is definitely not possible.
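The arithmetic above can be checked with a short sketch. It uses only the figures quoted in this comment (4 bytes per entry, roughly 4 KB per block RAM, roughly 53 Mbit on chip); the variable names are illustrative, and real tables carry extra overhead per entry that is ignored here.

```python
import math

ENTRIES = 1_000_000          # table depth asked for in the thread
ENTRY_BYTES = 4              # one IPv4 address per entry, no overhead
BRAM_BYTES = 4 * 1024        # ~4 KB per block RAM, as quoted above
ON_CHIP_BYTES = 53e6 / 8     # ~53 Mbit of on-chip memory (VX690T), as quoted above

table_bytes = ENTRIES * ENTRY_BYTES                   # raw table size in bytes
brams_needed = math.ceil(table_bytes / BRAM_BYTES)    # block RAMs for the data alone
fraction_of_chip = table_bytes / ON_CHIP_BYTES        # share of all on-chip memory

print(f"table size:       {table_bytes / 1e6:.1f} MB")
print(f"block RAMs:       {brams_needed}")
print(f"on-chip fraction: {fraction_of_chip:.0%}")
```

Under these assumptions the raw data would fit on paper, but it would span close to a thousand block RAMs that all have to answer in the same clock cycle, which is the part that makes synthesis impractical.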

A workaround might be to write your own lookup table backed by external DRAM in Verilog/VHDL and integrate it as a P4 extern function. However, this will limit the lookup rate (lookups/s), because external memory has high latency and low bandwidth (at least for table lookups).
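To get a feeling for that limit, here is a rough back-of-the-envelope sketch. The latency and the number of requests in flight are hypothetical placeholders, not measured NetFPGA-SUME values; the 10 Gbit/s line rate and the 20 bytes of per-frame Ethernet overhead (preamble plus inter-frame gap) are assumptions for the comparison only.

```python
def lookups_per_second(latency_ns: float, outstanding: int) -> float:
    """Sustained lookup rate with `outstanding` requests in flight (Little's law)."""
    return outstanding / (latency_ns * 1e-9)


def min_size_packets_per_second(line_rate_bps: float) -> float:
    """Packet rate for 64-byte Ethernet frames plus 20 bytes of preamble and gap."""
    return line_rate_bps / ((64 + 20) * 8)


# Hypothetical numbers, for illustration only.
LATENCY_NS = 100.0      # assumed round-trip latency of one external memory read
OUTSTANDING = 1         # assumed: one blocking lookup at a time

rate = lookups_per_second(LATENCY_NS, OUTSTANDING)
needed = min_size_packets_per_second(10e9)
print(f"lookup rate: {rate / 1e6:.1f} M lookups/s")
print(f"needed at 10G with minimum-size packets: {needed / 1e6:.2f} Mpps")
```

With a single blocking lookup the table becomes the bottleneck under these assumed numbers; a custom design would need to keep several requests in flight to hide the latency.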

gsankara commented 5 years ago

Hi Ralf,

Thanks for your reply.

I was hoping to use the 72 Mb QDR II SRAM listed in the NetFPGA specs. Is that usable for lookups?

regs Ganesh

ralfkundel commented 5 years ago

Yes, in theory you can use any memory. The only question is how fast or slow it will be. QDRII memory generally has lower access latency and is better suited to lookups.
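As a quick capacity check, here is a minimal sketch using the 72 Mb figure from the question. It assumes that figure means 72 × 2^20 bits and that the table is a direct lookup indexed by a dense key; both are assumptions, not details confirmed in this thread.

```python
import math

QDR_BITS = 72 * 2**20        # 72 Mbit, assumed to be binary megabits
ENTRIES = 1_000_000          # table depth asked for in the thread
IPV4_BITS = 32               # one IPv4 address per entry

bits_per_entry = QDR_BITS // ENTRIES            # budget per entry if all entries fit
index_bits = math.ceil(math.log2(ENTRIES))      # address width for a direct lookup
spare_bits = bits_per_entry - IPV4_BITS         # left over for action data or flags

print(f"bits available per entry: {bits_per_entry}")
print(f"index width:              {index_bits} bits")
print(f"spare bits per entry:     {spare_bits}")
```

So capacity is not the obstacle for a million small entries; the open questions are access rate and how the memory is wired into the pipeline.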

However, I think you will have to implement something on your own, because I believe (though I am not certain) that SDNet does not support HLS for external QDRII memory.

regards, Ralf