RAPcores / rapcores

Robotic Application Processor
http://rapcores.org/rapcores/
ISC License

Register Layouts and Parametrics: Blocked Regions #218

Open sjkelly opened 3 years ago

sjkelly commented 3 years ago

With the base register map work in main, there is still much to do on the documentation and bindings for this work. One issue that seems nice in theory but quickly becomes a headache in practice is parametric register layouts.

Parametric register layouts do not help interoperability toward any end right now. In theory there is an argument about memory savings and so forth, but in practice I think the advantage is minimal, as I will describe below.

For simplicity we add a new parameter for max_channels and cap it at, for example, 16 or 32. This way we "block" out memory space for up to a maximum supported number of motor channels. This still allows some degree of parametrics, but it is far simpler, and the cap can likely be a constant for a given version series. For different and custom deployments it can easily be changed, assuming headers and so forth are regenerated. Blocking out channel memory should have minimal effect on LUT count, and may only start posing an issue on 32-bit systems close to 4 GB of RAM (at which point most systems go 64-bit, so it is a rare case to bend over backwards for).
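To illustrate what "blocking" buys us, here is a minimal sketch of a blocked layout in C. The register names, per-channel fields, and the MAX_CHANNELS value are hypothetical placeholders, not the actual rapcores register map; the point is only that every channel slot gets a fixed offset under the cap, so generated headers stay valid for any build below it.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANNELS 32 /* blocked upper bound, a constant per version series */

/* Hypothetical per-channel register block: the layout is identical for
 * every channel slot, whether or not a motor is populated. */
typedef struct {
    uint32_t status;
    uint32_t config;
    uint32_t command;
    uint32_t telemetry;
} channel_regs_t;

/* Full map: channel n always lives at a fixed offset, so headers
 * generated for one deployment work on any deployment under the cap. */
typedef struct {
    uint32_t global_status;
    uint32_t global_config;
    channel_regs_t channel[MAX_CHANNELS];
} register_map_t;

/* Fixed byte offset of channel n's status register, independent of how
 * many motors a given deployment actually enables. */
static inline uint32_t channel_status_offset(uint32_t n)
{
    return (uint32_t)offsetof(register_map_t, channel)
         + n * (uint32_t)sizeof(channel_regs_t)
         + (uint32_t)offsetof(channel_regs_t, status);
}
```

A binding generator only needs the cap, not the per-build motor count, to emit these offsets.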

Argument as to why the memory savings argument is wrong: take the statically compiled case. If we set e.g. four motors as the configuration, we can still achieve memory savings even in the blocked case, by only allocating arrays for our motor subset. The only potential issue is that we no longer have a one-to-one memory mapping, but I suspect 99.9% of transactions will happen in the command channel, so it is completely moot. We still achieve the memory savings and so forth.
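The statically compiled case above can be sketched as follows. This is an assumed firmware-side pattern, not code from the repo: the blocked address map reserves 32 slots, but a four-motor build backs only four of them with storage, trading the one-to-one map for the memory savings.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CHANNELS 32 /* blocked address space */
#define MOTOR_COUNT   4 /* channels actually populated in this build */

/* Compact backing store: MOTOR_COUNT entries, not MAX_CHANNELS. */
static uint32_t channel_status[MOTOR_COUNT];

/* Translate a blocked channel index into the compact store. Reads of
 * reserved-but-unpopulated slots fail cleanly; the one-to-one memory
 * map is lost, but the small build keeps its memory savings. */
static bool read_channel_status(uint32_t ch, uint32_t *out)
{
    if (ch >= MOTOR_COUNT)
        return false; /* valid address under the cap, no motor behind it */
    *out = channel_status[ch];
    return true;
}
```

The translation cost lands only on the (rare) status/config path, not on the command channel.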

In deployments with a lot of system RAM (on the order of gigabytes), we are only asking for around 2 MB for a one-to-one memory map, which is not a lot.

In reality this is likely a question of where we want to punt the complexity. Right now we spread it over the whole stack. If we block out multi-channel regions, we move it out of the hardware and into the software (where desired). For SBCs, Python, and ROS we don't have to do anything, and have made our lives easier. For embedded deployments there is some lost optimization from one-to-one memory maps. This is a classic memory/compute tradeoff. Fortunately, the amounts of data handled here are so small that I think we will be fine in the long run.

Addendum: The complexity in SPI embedded systems is mostly in the status and config registers, due to losing the one-to-one map when trying to save memory. Systems where we are a soft core, an eFPGA SoC, etc., have no worries here, because we share the memory space, and any issues would only be noticed on the 32-bit/4 GB systems mentioned above. As for address width, we may run past 8 bits for the CSRs with a block approach. So far the only system I have seen with 8-bit address and 32-bit value lines is the QuickLogic. I suspect this is not common, nor something that will pose a general issue when trying to target eFPGA SoCs.
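The address-width point can be made concrete with a small back-of-the-envelope helper. The per-channel register count here is an assumption for illustration (the real CSR count comes from the register map): with 32 blocked channels and, say, 8 registers each, the CSR space exactly fills 8 address bits, and any global registers on top push it to 9.

```c
#include <stdint.h>

#define MAX_CHANNELS     32
#define REGS_PER_CHANNEL  8 /* assumed per-channel CSR count, for illustration */

/* Smallest address width that can index n_regs registers. */
static unsigned addr_bits(uint32_t n_regs)
{
    unsigned bits = 0;
    while ((1u << bits) < n_regs)
        bits++;
    return bits;
}
```

So a part fixed at an 8-bit CSR address line, like the QuickLogic case above, leaves no headroom once global registers are added.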

So the ask is: What is an acceptable upper bound on motors to support?

sjkelly commented 3 years ago

I am calling these "blocks", but "regions" is also a common term.

sjkelly commented 3 years ago

It seems the consensus was that 32 motors should be a sufficient upper bound for motor channel memory regions. This is not a hard limit, but an attempt at ensuring status and config registers can be common for any device under this threshold. Commands and telemetry will still be resized based on the motor count. cc @tonokip @johnnyr

johnnyr commented 3 years ago

Yes. That is what I heard also.

sjkelly commented 3 years ago

Update: 32 motors and 64 encoders, allowing two encoders per channel for series elastic actuators or the "shaft"/"effector" dichotomy.

sjkelly commented 3 years ago

Some other verbiage that may have helped my initial explanation: virtual vs. physical memory. By growing our register allocations regardless of use, we need more virtual memory space, not more physical memory.