google / CFU-Playground

Want a faster ML processor? Do it yourself! -- A framework for playing with custom opcodes to accelerate TensorFlow Lite for Microcontrollers (TFLM). Online tutorial: https://google.github.io/CFU-Playground/ For reference docs, see the link below.
http://cfu-playground.rtfd.io/
Apache License 2.0

Use of L2 in FPGA setups #790

Open bala122 opened 1 year ago

bala122 commented 1 year ago

Hi @alanvgreen and @tcal-x , I would appreciate it if you could clarify a simple question soon, since I have a deadline: what is the purpose of the L2 cache in the CFU framework, especially on FPGA setups? As I understand it, the L2 is built from the same memory structure as the L1 at the device level, i.e., single-cycle BRAM units. So does it make sense to have an L2 in the hierarchy at all, or is there some inherently different memory architecture used for the L2 that consumes fewer resources per byte? Thanks, Bala.

tcal-x commented 1 year ago

Hi @bala122 , this question is more about the LiteX SoC architecture, so it might be worth asking on the #litex IRC channel (or on their Discord, these days).

But I can take a guess: even with the same building blocks, the BRAMs can be organized differently. I expect the L2 cache transfers wide data lines to/from the SDRAM by operating multiple BRAMs in parallel, and then connects to the 32-bit Wishbone bus. The Wishbone system bus sits between the L2 cache and the VexRiscv's L1 caches, and the L1 caches are tightly integrated with the VexRiscv pipeline. So each level serves a specific purpose; eliminating either the L1 or the L2 and giving all of its resources to the other would likely hurt performance.
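To make the "wide lines" point concrete, here is a back-of-the-envelope sketch. All of the numbers (line size, datapath widths) are made-up assumptions for illustration, not values taken from an actual CFU-Playground or LiteX build:

```python
# Rough model of why a wide L2 refill path helps: the L2 can fill a
# whole cache line from SDRAM over a wide datapath built from several
# BRAMs side by side, while the CPU-facing system bus is only 32 bits.

def beats(line_bytes: int, path_bits: int) -> int:
    """Number of transfer beats to move one cache line over a datapath."""
    path_bytes = path_bits // 8
    return (line_bytes + path_bytes - 1) // path_bytes  # ceiling division

LINE_BYTES = 64          # hypothetical L2 line size
SDRAM_PATH_BITS = 128    # e.g. four 32-bit BRAMs operated in parallel
WISHBONE_BITS = 32       # width of the system bus seen by the L1 caches

sdram_beats = beats(LINE_BYTES, SDRAM_PATH_BITS)
bus_beats = beats(LINE_BYTES, WISHBONE_BITS)

print(f"L2 <-> SDRAM refill:            {sdram_beats} beats")
print(f"Same line over 32-bit Wishbone: {bus_beats} beats")
```

With these assumed widths the wide path moves a line in a quarter of the beats, which is the kind of asymmetry that makes a separate L2 worthwhile even when both levels are built from the same BRAM primitives.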