Improve memory access latency with some form of caching, similar to how L1 and L2 caches in CPUs work.
Some ideas:
In case of larger-than-32b memory channels, keep the contents of the DataIn signal. Then if subsequent reads are issued to cells that fall under the same physical address, no actual read needs to happen; instead, another slice of the already-fetched DataIn content (which is e.g. 512b wide) can be served.
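A minimal sketch of this idea, assuming a 512-bit memory channel and 32-bit SimpleMemory cells (class and member names here are hypothetical, not part of the actual SimpleMemory API):

```python
LINE_BYTES = 64  # 512-bit memory channel width, as assumed above
CELL_BYTES = 4   # 32-bit SimpleMemory cell

class LineCachingMemory:
    """Keeps the last fetched 512-bit line and serves cell reads
    falling into the same line from it, without a physical read."""

    def __init__(self, backing: bytes):
        self._backing = backing
        self._cached_line_index = None
        self._cached_line = b""
        self.physical_reads = 0  # counts actual memory channel reads

    def read_cell(self, cell_index: int) -> bytes:
        byte_offset = cell_index * CELL_BYTES
        line_index = byte_offset // LINE_BYTES
        if line_index != self._cached_line_index:
            # Cache miss: fetch the whole 512-bit line from memory.
            start = line_index * LINE_BYTES
            self._cached_line = self._backing[start:start + LINE_BYTES]
            self._cached_line_index = line_index
            self.physical_reads += 1
        offset_in_line = byte_offset % LINE_BYTES
        return self._cached_line[offset_in_line:offset_in_line + CELL_BYTES]
```

With this, reading cells 0–15 sequentially triggers only a single physical read, since all sixteen 32-bit cells fit into one 512-bit line.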
Speculatively prefetch the next few cells, so if they are indeed used they will already be available. What makes this complicated is that while such a prefetch is executing, any explicit memory operation needs to wait for it to finish; so prefetching should only happen when it is certain that no other memory operation will be issued in the meantime.
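The timing constraint above can be illustrated with a small sketch (all names are hypothetical, and a background thread stands in for the hardware memory channel): a prefetch is only started when the channel is idle, and an explicit read must first wait for any in-flight prefetch.

```python
import threading
import time

class PrefetchingMemory:
    """Sketch of speculative prefetching with the caveat that explicit
    memory operations must wait for an in-flight prefetch to finish."""

    def __init__(self, backing):
        self._backing = backing
        self._cache = {}
        self._prefetch_thread = None

    def _fetch(self, index):
        time.sleep(0.01)  # simulated memory channel latency
        self._cache[index] = self._backing[index]

    def prefetch(self, index):
        # Only start a prefetch when the channel is idle, i.e. when it
        # is certain no other operation is currently using it.
        if self._prefetch_thread is None or not self._prefetch_thread.is_alive():
            self._prefetch_thread = threading.Thread(
                target=self._fetch, args=(index,))
            self._prefetch_thread.start()

    def read(self, index):
        # An explicit operation must wait for any in-flight prefetch.
        if self._prefetch_thread is not None:
            self._prefetch_thread.join()
        if index not in self._cache:
            self._fetch(index)  # miss: pay the full latency now
        return self._cache[index]
```

If the speculation was correct, the later `read` finds the value already cached and pays no latency; if not, nothing is lost beyond the idle-time channel usage.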
Have async versions of SimpleMemory operations. This way e.g. a memory read can be started as soon as possible and awaited only when the result is actually needed. The operation can thus happen in the background while something else is executing, so no waiting is needed. However, still only one such operation would be possible at a time.
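The start-early/await-late pattern could look like the following sketch, here in Python's asyncio for illustration (the `AsyncSimpleMemory` class and its members are hypothetical, not the actual SimpleMemory API; the one-operation-at-a-time limit is modeled with a busy flag):

```python
import asyncio

class AsyncSimpleMemory:
    """Hypothetical async variant of a SimpleMemory read."""

    def __init__(self, backing):
        self._backing = backing
        self._busy = False  # only one operation may be in flight at a time

    async def read_async(self, index):
        if self._busy:
            raise RuntimeError("Only one memory operation may be in flight.")
        self._busy = True
        try:
            await asyncio.sleep(0.01)  # simulated memory latency
            return self._backing[index]
        finally:
            self._busy = False

async def kernel(memory):
    # Start the read as early as possible...
    read_task = asyncio.create_task(memory.read_async(2))
    # ...do unrelated computation while the read runs in the background...
    partial = sum(range(100))
    # ...and await the result only when it is actually needed.
    value = await read_task
    return partial + value

result = asyncio.run(kernel(AsyncSimpleMemory([1, 2, 3, 4])))
```

The compute work overlaps with the memory latency, so the total time approaches the longer of the two instead of their sum.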
This is already done for Vitis.