Identified bottleneck:
The VexRiscv SMP cluster is directly connected to LiteDRAM through 2x 128-bit Instruction/Data native LiteDRAM ports.
While doing initial tests with VexRiscv SMP, a bottleneck on LiteDRAM's crossbar has been identified:
- The CPUs of the cluster share the same LiteDRAM interface and may need to access different banks of the DRAM.
- A port can currently only access one bank at a time and has to wait for the command to be emitted on the DRAM bus before switching to another bank (since the BankMachine locks the port).
The current BankMachine lock mechanism provides a simple way to avoid data buffering in the crossbar while also ensuring the order of the transactions, but it is now limiting performance and should be improved.
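To make the cost of this lock concrete, here is a rough, purely behavioral cost model in plain Python (not LiteDRAM code): the 8-cycle command-emission latency, the access pattern and the function names are made up for illustration only.

```python
# Toy cost model of the port/bank lock described above (illustrative only:
# the latency value is made up and everything but the lock is ignored).

CMD_EMIT_LATENCY = 8  # assumed cycles between accepting a command and
                      # emitting it on the DRAM bus

def cycles_with_lock(bank_sequence):
    """Current behavior: switching banks stalls the port until the previous
    command has been emitted on the DRAM bus (BankMachine lock)."""
    cycles, locked_bank = 0, None
    for bank in bank_sequence:
        if locked_bank is not None and bank != locked_bank:
            cycles += CMD_EMIT_LATENCY  # wait for the lock to be released
        locked_bank = bank
        cycles += 1                     # present the command
    return cycles

def cycles_without_lock(bank_sequence):
    """Proposed behavior: up to N commands stay in flight, so (assuming N
    covers the latency) the port streams one command per cycle once primed."""
    return len(bank_sequence) + CMD_EMIT_LATENCY

# Two CPUs interleaving accesses to two different banks through a shared port:
seq = [0, 1] * 8
print("with lock   :", cycles_with_lock(seq), "cycles")     # 16 + 15*8 = 136
print("without lock:", cycles_without_lock(seq), "cycles")  # 16 + 8    = 24
```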
Reproducing the issue with VexRiscv SMP:
In https://github.com/enjoy-digital/litex_vexriscv_smp, apply the following patch to crt0.S and run the simulation with traces enabled (--trace); the bottleneck can then be observed by looking at the native LiteDRAM port between the VexRiscv SMP cluster and LiteDRAM.
Proposed solution:
To remove this bottleneck, the lock mechanism should probably be removed and other mechanisms introduced for writes and reads:
Write path:
For the write path, each port could maintain cmd_idx and pending_xfers values (with a configurable maximum N) and, for each write, proceed as follows (a behavioral sketch follows these steps):
- Send the command to the BankMachine along with the cmd_idx if pending_xfers < N, otherwise wait until the condition is satisfied.
- Store the write data in a data-width*N memory at the cmd_idx location.
- Increment cmd_idx (modulo N) and pending_xfers.
- Let the BankMachine retrieve the data from the cmd_idx that was passed to it and decrement pending_xfers when the BankMachine accesses the data memory.
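A minimal behavioral sketch of this write-path bookkeeping is shown below, in plain Python rather than Migen; WritePortModel, BankMachineModel and their methods are illustrative names, not existing LiteDRAM interfaces.

```python
# Behavioral sketch of the proposed write-path bookkeeping (plain Python,
# illustrative names only, not LiteDRAM/Migen code).

class BankMachineModel:
    """Stand-in for a BankMachine: queues commands, later pulls the write data."""
    def __init__(self):
        self.queue = []

    def push_cmd(self, cmd, cmd_idx, port):
        self.queue.append((cmd, cmd_idx, port))  # the command travels with its index

    def service_one(self):
        """Issue the oldest command to the DRAM and fetch its buffered write data."""
        cmd, cmd_idx, port = self.queue.pop(0)
        return cmd, port.fetch_data(cmd_idx)


class WritePortModel:
    def __init__(self, n):
        self.n = n                   # configurable depth N
        self.cmd_idx = 0             # next buffer slot to use
        self.pending_xfers = 0       # commands sent whose data is not yet consumed
        self.data_mem = [None] * n   # the data-width * N memory

    def can_issue(self):
        return self.pending_xfers < self.n

    def issue_write(self, bank_machine, cmd, data):
        """Send command + cmd_idx, buffer the data, bump the index and counter."""
        assert self.can_issue(), "port must wait: pending_xfers == N"
        bank_machine.push_cmd(cmd, self.cmd_idx, self)
        self.data_mem[self.cmd_idx] = data
        self.cmd_idx = (self.cmd_idx + 1) % self.n
        self.pending_xfers += 1

    def fetch_data(self, cmd_idx):
        """Called by the BankMachine when it accesses the data memory."""
        self.pending_xfers -= 1      # the slot can now be reused
        return self.data_mem[cmd_idx]


# Example: six writes through a port with N=4; the port only stalls when
# pending_xfers reaches N, and resumes as soon as the BankMachine drains data.
bm, port = BankMachineModel(), WritePortModel(n=4)
for i in range(6):
    while not port.can_issue():
        bm.service_one()
    port.issue_write(bm, ("WRITE", 0x100 + i), data=i)
```

The key point is that the cmd_idx tag lets the BankMachine fetch the buffered data whenever it actually issues the command, so the port never stalls on a bank switch, only when N transfers are already pending.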
Read path:
For the read path, each port could maintain cmd_idx, return_idx and pending_xfers values (with a configurable maximum N) and, for each read, proceed as follows (a behavioral sketch follows these steps):
- Send the command to the BankMachine along with the cmd_idx if pending_xfers < N, otherwise wait until the condition is satisfied.
- Increment cmd_idx (modulo N) and pending_xfers.
- Let the BankMachine return the read data along with the cmd_idx; the data will be written to the returned cmd_idx location.
- Return the read data to the port once the memory has valid data at the return_idx location. When the data has been presented and accepted, the return_idx memory location should be invalidated, return_idx incremented (modulo N) and pending_xfers decremented.
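A matching behavioral sketch of the read-path bookkeeping, with the same caveats (plain Python, illustrative names only). The example at the bottom shows data coming back out of order from the BankMachine and still being returned to the port in issue order thanks to return_idx.

```python
# Behavioral sketch of the proposed read-path reordering (plain Python,
# illustrative names only, not LiteDRAM/Migen code).

class ReadPortModel:
    def __init__(self, n):
        self.n = n                   # configurable depth N
        self.cmd_idx = 0             # index tagged onto the next command
        self.return_idx = 0          # next slot the port expects to pop
        self.pending_xfers = 0
        self.data_mem = [None] * n   # read reorder buffer (data-width * N)
        self.valid = [False] * n

    def can_issue(self):
        return self.pending_xfers < self.n

    def issue_read(self, bank_machine, cmd):
        """Send the command with its cmd_idx and bump the index and counter."""
        assert self.can_issue(), "port must wait: pending_xfers == N"
        bank_machine.push_cmd(cmd, self.cmd_idx, self)
        self.cmd_idx = (self.cmd_idx + 1) % self.n
        self.pending_xfers += 1

    def return_data(self, cmd_idx, data):
        """Called by the BankMachine, possibly out of issue order."""
        self.data_mem[cmd_idx] = data
        self.valid[cmd_idx] = True

    def pop_data(self):
        """Hand data back to the port in issue order, or None if not ready yet."""
        if not self.valid[self.return_idx]:
            return None
        data = self.data_mem[self.return_idx]
        self.valid[self.return_idx] = False           # invalidate the slot
        self.return_idx = (self.return_idx + 1) % self.n
        self.pending_xfers -= 1
        return data


# Example: three reads whose data comes back out of order but is handed back
# to the port in issue order.
class _BankMachineStub:
    def __init__(self):
        self.issued = []
    def push_cmd(self, cmd, cmd_idx, port):
        self.issued.append((cmd, cmd_idx, port))

bm, port = _BankMachineStub(), ReadPortModel(n=4)
for addr in (0x00, 0x10, 0x20):
    port.issue_read(bm, ("READ", addr))
# The BankMachine completes the second read first:
for (cmd, cmd_idx, p), data in zip((bm.issued[1], bm.issued[0], bm.issued[2]),
                                   ("data-1", "data-0", "data-2")):
    p.return_data(cmd_idx, data)
print([port.pop_data() for _ in range(3)])  # -> ['data-0', 'data-1', 'data-2']
```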
cc @jedrzejboczar, @dolu1990, @kgugala.