google / CFU-Playground

Want a faster ML processor? Do it yourself! -- A framework for playing with custom opcodes to accelerate TensorFlow Lite for Microcontrollers (TFLM). Online tutorial: https://google.github.io/CFU-Playground/ For reference docs, see the link below.
http://cfu-playground.rtfd.io/
Apache License 2.0

Keyword Spotting doesn't fit on Fomu with current Yosys due to memory R/W check (but there's a fix) #628

Open tcal-x opened 2 years ago

tcal-x commented 2 years ago

After the recent dependency bump (#619), the KWS project no longer fit on Fomu (over the limit by ~80 LCs), and at first I thought the bump itself was the cause. But then I was puzzled that the Fomu CI job was still passing.

I found that the actual difference was which Yosys was being used. In CI (and also when running locally after a normal setup), the build uses a directly downloaded Yosys v0.14 binary (this is being removed: #623). During my testing, however, I had removed the local Yosys v0.14, so the build fell back to the Conda-provided Yosys v0.19, which produced a significantly higher LC count.

The increase in LC count is not because of worse optimization; it's because of a new check for simultaneous writes/reads to a memory block, which adds extra logic to ensure proper semantics. There is some discussion about it here: https://github.com/YosysHQ/yosys/pull/3351. If the check and the extra logic are not needed, because you know that the design will never read and write the same address during the same cycle, you can add the attribute (* no_rw_check *) to the memory.

When I add this attribute to the regfile and ICache memory blocks in VexRiscv, the LC count drops to 100 LCs under the number available. I will check with Charles that this is OK, and if so, how to get the attribute added to VexRiscv's Verilog generation.
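
For anyone who wants to try the same experiment, here is a minimal sketch of one way to add the attribute by post-processing the generated Verilog. It is not how VexRiscv or CFU-Playground does it. The helper name `add_no_rw_check`, the memory names in the example (`regfile_mem`, `scratch`), and the assumption that each memory is declared on a single line as `reg [W:0] name [0:N];` are all hypothetical, so check them against the actual generated file.

```python
import re

# Matches a single-line Verilog memory (array) declaration, e.g.
#   reg [31:0] regfile_mem [0:31];
# Assumption: the declaration starts the line (after indentation) and has
# no attribute in front of it yet.
MEM_DECL = re.compile(
    r"^(?P<indent>[ \t]*)"
    r"(?P<decl>reg\b\s*(?:\[[^\]]+\]\s*)?(?P<name>\w+)\s*\[[^\]]+\]\s*;)",
    re.MULTILINE,
)


def add_no_rw_check(verilog, memory_names):
    """Prefix the named memory declarations with (* no_rw_check *).

    Only memories listed in memory_names are touched; everything else,
    including plain (non-array) regs, is returned unchanged.
    """
    names = set(memory_names)

    def annotate(match):
        if match.group("name") not in names:
            return match.group(0)
        return f"{match.group('indent')}(* no_rw_check *) {match.group('decl')}"

    return MEM_DECL.sub(annotate, verilog)


if __name__ == "__main__":
    # Hypothetical snippet standing in for generated Verilog.
    source = (
        "  reg [31:0] regfile_mem [0:31];\n"
        "  reg [31:0] scratch [0:7];\n"
        "  reg [7:0] counter;\n"
    )
    print(add_no_rw_check(source, {"regfile_mem"}), end="")
```

Because a declaration that already carries an attribute no longer starts with `reg`, running the helper twice leaves the text unchanged. The attribute should only go on memories where the design is known never to read and write the same address in the same cycle, as described above.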

tcal-x commented 2 years ago

More discussion about the Yosys change here: https://github.com/YosysHQ/yosys/issues/3370