RAPcores / rapcores

Robotic Application Processor
http://rapcores.org/rapcores/
ISC License

Register Map #205

Closed sjkelly closed 3 years ago

sjkelly commented 3 years ago

With the recent work to use parametric-style Verilog, I believe there is a stronger case for https://github.com/RAPcores/rapcores/pull/154. The root issue of that approach is that most data was made redundant. Only certain things are true "memories" in RAPcores, namely the DDA buffer and the microstep tables. Everything else is registers, so efficient memory packing and extraction should not save any gates.

However, I suspect that this memory-packed approach is still correct and may be generalized to a net benefit. To achieve this benefit, the state machine and protocol will likely need to be changed over to a true "register" model that uses this packed memory structure. The key savings will be a slimmed-down FSM that simply writes to a memory address. I've been nibbling around the edges of this idea for a few months now under the auspices of an "async" state machine. The reality is that we are already async, but I realize now that we may be able to rework the protocol to maintain the same latency and bandwidth balance with greatly reduced FSM complexity.

The approximate approach is as follows: Header Modifications:

This is more similar to SPI flash interfaces.

Next we will have to:

The combination of these should make the main communication state machine only require a few states, rather than the "state-per-command" as implemented currently. Moreover, values that do not require registers (such as the aforementioned offsets, version numbers, configuration data, etc.) will also be elided as memory accesses (rather than wasting LUTs like we do currently).

Of course, it's hard to completely eliminate the complexity; we just shove it somewhere else. Librapcores will need to handle these offsets. Fortunately, though, I think the CPU is a much better place to do this than gateware, where we do it now.
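As a rough illustration of the host-side bookkeeping, here is a minimal Python sketch. The register names, sizes, and ordering are hypothetical and are not the actual RAPcores map; the sketch only shows the library computing packed offsets so that the gateware would see nothing but a flat address.

```python
# Hypothetical register list: (name, length in words). Not the real RAPcores map.
REGISTERS = [
    ("version", 1),
    ("config", 2),
    ("dda_buffer", 8),
    ("microstep_table", 4),
]

def build_offsets(registers):
    """Pack registers back to back and return {name: (offset, length)}."""
    offsets = {}
    cursor = 0
    for name, length in registers:
        offsets[name] = (cursor, length)
        cursor += length
    return offsets

def address_of(offsets, name, index=0):
    """Resolve a register name and word index to a flat memory address."""
    base, length = offsets[name]
    if not 0 <= index < length:
        raise IndexError(f"{name}[{index}] is outside its {length}-word span")
    return base + index

OFFSETS = build_offsets(REGISTERS)
```

With a table like this, a host call such as `address_of(OFFSETS, "dda_buffer", 3)` resolves to a plain address, and the device never needs to know the register's name or command code.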

sjkelly commented 3 years ago

One issue not explicitly addressed in this proposal is the asymmetric communication, as is done with the fixed point receiving DDA params and transmitting encoder data, two different categories of data. This may be solved by adding separate register offsets for the transmission and return channels.

sjkelly commented 3 years ago

I did some more research on this over the weekend and was pretty inspired by the Friday OpenFPGA presentations. The Wishbone bus transaction state (or that of any SoC bus) is close to what we want for minimal footprint and device utilization (Address, Data In, Data Out, Read/Write flag, ACK, etc.). The difference is that we have different event timing with SPI and are missing a dedicated address line. However, an internal Wishbone bus is likely the best long-term pathway here for future integration. For example, QuickLogic is using OpenFPGA for all their new products, and we would have a simple pathway to becoming HardIP on an eFPGA if we support Wishbone (disclaimer: I'm a QuickLogic shareholder). Same story for PicoSoC, Caravel, etc.

So the objective is multifold:

Internally, it seems like Wishbone is likely the lowest common denominator that is easily achievable and addresses the above. We could use our own internal bus and treat Wishbone/SPI/etc. non-preferentially.

However, a one-to-one mapping of the Wishbone bus to SPI will not be a good outcome for latency and bandwidth. We will therefore likely need to implement some form of compressed protocol on any non-Wishbone bus (something like what I described in my prior comments would work nicely for this). This should have minimal impact on SPI operations; however, we will need to put more logic in the SPI layers. Most of the SPI protocol layer should consist of simple offset and read/write handling to the bus based on the header packet, so the complexity should be manageable.
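To make the "header packet drives offset and read/write handling" idea concrete, here is a small behavioral sketch in Python. The header layout (bit 31 as the write flag, low 16 bits as the base address) and the `RegisterBus` class are assumptions made purely for illustration; they are not the RAPcores protocol or a full Wishbone model, only a toy bus with address, data in, data out, a write flag, and an ack.

```python
from dataclasses import dataclass, field

@dataclass
class RegisterBus:
    """Toy stand-in for a Wishbone-like slave (hypothetical, not a real core)."""
    memory: list = field(default_factory=lambda: [0] * 16)

    def transact(self, address, write, data_in=0):
        """Run one bus cycle and return (ack, data_out)."""
        if not 0 <= address < len(self.memory):
            return False, 0  # no ack for an unmapped address
        if write:
            self.memory[address] = data_in
        return True, self.memory[address]

WRITE_FLAG = 1 << 31    # assumed header layout, for illustration only
ADDRESS_MASK = 0xFFFF   # assumed: low 16 bits carry the base address

def handle_packet(bus, words):
    """Decode the header word, then map each payload word to one bus cycle."""
    header, payload = words[0], words[1:]
    write = bool(header & WRITE_FLAG)
    base = header & ADDRESS_MASK
    replies = []
    for i, word in enumerate(payload):
        ack, data_out = bus.transact(base + i, write, word)
        replies.append(data_out if ack else None)  # None marks a missing ack
    return replies
```

In this sketch one header is amortized over the whole payload, which is the sense in which the SPI side is "compressed" relative to issuing a full address for every word. For a read, the payload words are dummies that only clock data out.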

Some possible consequences of SPI interfacing with Wishbone:

Proposed evolution:

sjkelly commented 3 years ago

@tonokip suggests that we make the mode like a DMA on the SPI bus. Rather than specifying a length, we do an implicit address increment on each successive word. The address increment is performed until CS is deasserted.
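A minimal behavioral sketch of that idea in Python follows. It assumes the first word after CS asserts is the start address and that the address wraps at the end of memory; both are illustrative assumptions and not what #212 or the gateware actually does.

```python
class AutoIncrementPort:
    """Toy SPI-side model of an auto-incrementing write port (hypothetical)."""

    def __init__(self, size=16):
        self.memory = [0] * size
        self.selected = False
        self.address = None

    def set_cs(self, asserted):
        """Any CS edge starts a fresh transfer."""
        self.selected = asserted
        self.address = None

    def shift_word(self, word):
        """Accept one word from the bus."""
        if not self.selected:
            return  # ignore traffic while CS is deasserted
        if self.address is None:
            self.address = word  # assumed: first word selects the start address
            return
        # Wrap-around at the end of memory is an assumption of this sketch.
        self.memory[self.address % len(self.memory)] = word
        self.address += 1  # implicit increment, no length field needed
```

The transfer length is defined entirely by how long the host holds CS, so the header never has to carry a count.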

sjkelly commented 3 years ago

We are heading into murky territory here. This is a good article that echoes much of what I've researched so far: https://olofkindgren.blogspot.com/2016/11/ip-xact-good-bad-and-outright-madness.html

There are some tools to show what is possible: https://airhdl.com

Though AirHDL is AXI, not Wishbone.

From a project perspective, I am not seeing many opportunities to automate register maps. For example, few open source SoCs have automated register mapping (e.g. generating a C header, Verilog, and documentation). I read that as "there are no good open source tools for this", and from what I've read, "there are no good tools for this" also seems like an accurate statement.

Three things need to be kept accurate:

Doing this manually for now is likely an acceptable practice so we can get this development moving and at least feel somewhat comfortable with the foundation here. Long term, we will likely have to see how this plays out. I will open up a new issue to track automating this while we move forward with a manual implementation.
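For the follow-up automation issue, the starting point could be as small as one shared description rendered three ways. The sketch below is only that, a sketch: the register names, addresses, and the `RAPCORES_REG_` / `REG_` prefixes are hypothetical and do not reflect the real map or any existing tool.

```python
# Hypothetical register map: name -> (address, description). Not the real layout.
REGISTER_MAP = {
    "VERSION": (0x00, "Gateware version"),
    "CONFIG": (0x01, "Motor configuration"),
    "DDA_BUFFER": (0x02, "Start of the DDA move buffer"),
}

def c_header(regs):
    """Render the map as C preprocessor defines."""
    return "\n".join(
        f"#define RAPCORES_REG_{name} 0x{addr:02X}" for name, (addr, _) in regs.items()
    )

def verilog_params(regs):
    """Render the map as Verilog localparams with 8-bit addresses."""
    return "\n".join(
        f"localparam REG_{name} = 8'h{addr:02X};" for name, (addr, _) in regs.items()
    )

def markdown_table(regs):
    """Render the map as a Markdown table for the documentation."""
    rows = ["| Register | Address | Description |", "| --- | --- | --- |"]
    rows += [f"| {name} | 0x{addr:02X} | {desc} |" for name, (addr, desc) in regs.items()]
    return "\n".join(rows)
```

Because all three outputs derive from `REGISTER_MAP`, an address change cannot leave the header, the gateware, and the docs disagreeing with each other.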

sjkelly commented 3 years ago

It may also be good to think about how strided access patterns could help here. #212 implements something DMA-like, but the addressing is a mess. WB natively supports "granularity", which is like a stride, but for a single transaction.
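For clarity on what "strided" means here, a generic Python sketch: it is not the addressing used in #212 and does not model Wishbone's granularity semantics. The function names and the flat-memory layout are assumptions for illustration.

```python
def strided_addresses(base, stride, count):
    """Addresses touched by a burst that advances `stride` words per transfer."""
    return [base + i * stride for i in range(count)]

def strided_read(memory, base, stride, count):
    """Gather `count` words from a flat memory using a fixed stride."""
    return [memory[address] for address in strided_addresses(base, stride, count)]
```

For example, if each channel occupied a 4-word block, a stride of 4 would read the same field from every channel in one burst instead of one transaction per channel.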

sjkelly commented 3 years ago

With #212 merged (and improved since my last comments) I think we have a decent handle on this and can move towards Wishbone/SoC deployments.