YosysHQ / apicula

Project Apicula 🐝: bitstream documentation for Gowin FPGAs
MIT License

Implement the DSP primitive. #239

Closed yrabbit closed 3 months ago

yrabbit commented 3 months ago

For chips that have DSP capabilities, a DSP implementation has been added covering all the primitives described in the Gowin documentation (UG287-1.3.3E, Gowin Digital Signal Processing (DSP) User Guide).

The most complex but also the most useful primitive is MULTADDALU18X18: it makes it easy to build a typical FIR filter, and all connections between these primitives in a chain are implemented with direct fixed wires with minimal delay.
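As a rough illustration of why a multiply-add primitive with a cascade input suits FIR filters, here is a minimal behavioral sketch in Python. It is not the apicula or Gowin implementation: the function names `multadd_stage` and `fir_chain` are hypothetical, the stage is assumed to compute two products plus a cascade input, and operand and accumulator bit widths are not modeled.

```python
from typing import List, Sequence


def multadd_stage(a0: int, b0: int, a1: int, b1: int, casi: int = 0) -> int:
    """Hypothetical model of one stage: two products plus the cascade input."""
    return a0 * b0 + a1 * b1 + casi


def fir_chain(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Direct-form FIR where each stage handles two taps and feeds the next stage."""
    if len(coeffs) % 2 != 0:
        raise ValueError("this sketch expects an even number of taps")
    history = [0] * len(coeffs)  # history[k] holds x[n-k]
    out = []
    for x in samples:
        history = [x] + history[:-1]  # newest sample enters at index 0
        acc = 0  # cascade value entering the first stage
        for k in range(0, len(coeffs), 2):
            acc = multadd_stage(
                history[k], coeffs[k], history[k + 1], coeffs[k + 1], acc
            )
        out.append(acc)
    return out
```

Feeding a unit impulse through `fir_chain` returns the coefficients in order, which is a quick sanity check that the stages are chained correctly.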

MULT36X36 primitives are not combined into chains, but they serve a different purpose: this primitive can be found in Linux SoCs.

Examples have been added (in the examples/himbaechel directory) that are based on a tiny RISC-V core and demonstrate calculations over UART. Only the TXD pin is used (it can be found in the board-specific .CST file), so on the host computer side only GND and RXD need to be connected. Port settings: 115200 baud, no parity, 8 data bits, 1 stop bit, linefeed-only line endings.

Picocom launch example:

picocom -l --imap lfcrlf -b 115200 /dev/ttyU0

The source code for the RISC-V test programs is provided along with instructions for building them, but the programs are not built when the examples are compiled because that would require additional compilers.

Primitives can now be combined into chains using the CASO-CASI and SO(A, B)-SI(A, B) wires, as well as SBO-SBI for PADD.
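The two kinds of chain wiring mentioned above can be pictured with a small Python model. This is only a sketch under assumptions: the class `ChainedBlock` and the function `run_chain` are hypothetical names, each block is assumed to hold one registered operand that it hands to the next block (the SO-SI style link) and to add its product to a value arriving from the previous block (the CASO-CASI style link). Real blocks have more registers, modes and fixed bit widths than shown here.

```python
from typing import List, Sequence


class ChainedBlock:
    """Hypothetical DSP block with one registered operand and a fixed coefficient."""

    def __init__(self, coeff: int) -> None:
        self.coeff = coeff
        self.a_reg = 0  # value visible on the shift output (SO-like)

    def clock(self, si: int) -> int:
        """Load the shift input and return the previous register content."""
        so, self.a_reg = self.a_reg, si
        return so

    def compute(self, casi: int) -> int:
        """Product plus cascade input; the result goes to the CASO-like output."""
        return self.a_reg * self.coeff + casi


def run_chain(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Clock a chain of blocks once per sample and collect the last cascade output."""
    blocks = [ChainedBlock(c) for c in coeffs]
    results = []
    for x in samples:
        shift = x
        for blk in blocks:
            shift = blk.clock(shift)  # SO of one block drives SI of the next
        cas = 0
        for blk in blocks:
            cas = blk.compute(cas)  # CASO of one block drives CASI of the next
        results.append(cas)
    return results
```

In this model the operand moves one block further down the chain on every clock, while the sum is collected along the cascade path, so no general-purpose routing is needed between neighbouring blocks.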

yrabbit commented 3 months ago

This is not a simple feature, and the number of combinations of these building blocks is quite large. I foresee the need to correct errors easily as they are found, so the code deliberately contains some repeated pieces.

Non-optimality: as can be seen from the set of primitives, Gowin already provides indivisible combinations (for example Mult and Alu) that are well packed and connected internally. However, there is one exception: pre-adders. PADD18 and PADD9 exist as separate primitives, and that is how we currently implement them, but a pre-adder is actually a small piece inside the DSP block. Gowin packs them together with other primitives; we currently do not.

So keep in mind that when using pre-adders, additional delay may occur, and more DSP blocks will definitely be occupied than with the Gowin tools.
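For readers unfamiliar with pre-adders, the following Python sketch shows the arithmetic role they play in front of a multiplier. It is an illustration only: the helper names are hypothetical, and the assumption that the pre-adder result wraps to the operand width (18 bits by default, 9 for the narrower variant) is mine rather than something stated in this thread.

```python
def wrap_signed(value: int, width: int) -> int:
    """Wrap an integer into a signed two's-complement range of `width` bits."""
    mask = (1 << width) - 1
    value &= mask
    if value >= 1 << (width - 1):
        value -= 1 << width
    return value


def pre_add(a: int, b: int, width: int = 18, subtract: bool = False) -> int:
    """Hypothetical pre-adder: a + b or a - b, wrapped to `width` bits (assumed)."""
    return wrap_signed(a - b if subtract else a + b, width)


def pre_add_multiply(a: int, b: int, coeff: int, width: int = 18) -> int:
    """Pre-adder result fed into one multiplier operand."""
    return pre_add(a, b, width) * coeff
```

When both steps are packed into one DSP block, the sum reaches the multiplier over internal wiring. With the pre-adder implemented as a separate primitive, as described above, the same computation is spread across more blocks.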

We will address this in the next version of the DSP support with a more sophisticated packing algorithm, once we are convinced that the primitives work correctly in principle.

pepijndevos commented 3 months ago

Super exciting! Hopefully I can find some time to test and review soon.

pepijndevos commented 3 months ago

Another fun challenge is going to be to teach Yosys about these cells. For reference, this seems to be how ecp5 handles it https://github.com/YosysHQ/yosys/blob/b9d3bffda5abcbc5356936a7192c4a3c2b427c3e/techlibs/ecp5/synth_ecp5.cc#L298-L302

What's a bit weird to me is why they seem to map $mul cells and leave $macc cells alone, while I would think you want to map $macc cells to, well, multiply-accumulate cells. But maybe the idea is that the MUL18 is the fundamental building block and the rest is just macro cells?

But I don't think our ALU pass knows about PADD and ALU54 cells, so for stuff like FIR filters, generating some MULTALU cell sounds more effective? wdyt @gatecat ?