WellDone / MoMo-Firmware

Embedded PIC Firmware and hardware designs for the WellDone Mobile Monitor (MoMo).
welldone.org

MIB Extension Proposal #142

Open amcgee opened 10 years ago

amcgee commented 10 years ago

I would like to propose the following extension to the MIB protocol. This extension is intended to support two additional features:

- Arbitrary-length messages (parameter and return buffers larger than the current 31-byte limit)
- Asynchronous RPCs (handlers that can return a "pending" status and deliver their result via a callback later)

There are a few important considerations in this protocol extension, namely:

Message Size Limitations

Since some chips have very limited RAM, supporting arbitrarily large message sizes can be challenging. To accommodate this limitation, I propose that during the initial RPC handshake the two parties should agree on a maximum "chunk size" to be used when sending the buffer. The caller then sends the message in chunks of this size, waiting for the callee to process each chunk before proceeding.

This could theoretically be done asynchronously as well so as not to lock the bus during message transmission, but this significantly increases the protocol overhead and I think it is more reasonable to expect developers to design the module APIs in a way that minimizes data chunk processing time.

On another note, supporting arbitrary buffer sizes would be problematic with the existing RPC Queue feature on the controller. I propose having a "pointer"-style queue, where each queued RPC's data buffer is saved to flash (if it's long enough), with the first 16 or so bytes actually stored in the in-memory queue structure. The controller can load each subsequent chunk of the stored buffer into memory while the slave is processing the previous one.
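A "pointer"-style queue entry might look like the following sketch. The struct layout, the 16-byte inline threshold, and the field names are all assumptions for illustration, not the current firmware's data structures:

```c
#include <stdint.h>

#define MIB_INLINE_BYTES 16  /* first bytes kept in the in-memory queue (assumed value) */

/* Hypothetical queue entry: the first MIB_INLINE_BYTES of the parameter
 * buffer live in RAM; anything longer is spilled to flash and referenced
 * by address, to be streamed out chunk by chunk. */
typedef struct {
    uint8_t  address, feature, command;
    uint8_t  length;                       /* total buffer length (0..255) */
    uint8_t  inline_data[MIB_INLINE_BYTES];
    uint32_t flash_addr;                   /* valid when length > MIB_INLINE_BYTES */
} mib_queued_rpc;

/* Number of bytes that must still be fetched from flash for this entry. */
static uint8_t mib_flash_bytes(const mib_queued_rpc *e)
{
    return e->length > MIB_INLINE_BYTES ? (uint8_t)(e->length - MIB_INLINE_BYTES) : 0;
}
```

The controller can then page `mib_flash_bytes(e)` bytes out of flash one chunk at a time while the slave processes the previous chunk.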

In the existing MIB protocol, the param_spec byte can specify up to 3 integer parameters and a buffer parameter of size up to 31 bytes. This is insufficient for our purposes, and I don't think the parameter typing is particularly useful (we already "hack" it in multiple places to pass more than 3 integers). I think the callback definition API can grab "typed" arguments even if the parameters are untyped at the MIB protocol layer. Very basic type-checking can be done by just confirming that the RPC's parameter buffer size matches what the callback expects (i.e. 4 bytes for 2 integer parameters). Removing this protocol-layer type enforcement means we have a full 8 bits to specify message length, bumping our maximum to 255 bytes, which seems reasonable.
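The size-based check described above could be as small as the following sketch. The function name and the idea of a minimum trailing-buffer length are illustrative assumptions; the only grounded rule is 2 bytes per 16-bit integer parameter:

```c
#include <stdint.h>
#include <stdbool.h>

/* Untyped-parameter sanity check: with protocol-layer typing removed, the
 * callee can only verify that the received buffer is at least as large as
 * the callback's declared shape (2 bytes per 16-bit integer, plus however
 * much trailing buffer the handler requires). */
static bool mib_params_valid(uint8_t received_length,
                             uint8_t num_integers,
                             uint8_t min_buffer_length)
{
    uint16_t expected_min = (uint16_t)num_integers * 2 + min_buffer_length;
    return received_length >= expected_min;
}
```

For example, a handler expecting 2 integer parameters and no buffer would require exactly 4 bytes, matching the example in the text.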

Callback state

Supporting asynchronous RPCs requires state about previous MIB calls to be stored by the individual modules. The callee needs to know what callback address to hit when the handler has finished doing its thing. This should be straightforward on the 16-bit chips because they already have state in the RPC queue (it will just need to support removal of elements that aren't at the "top" of the queue), and I think 8-bit chips should only need to support one callback at a time. The easiest way to implement this should be to reserve a particular feature and command (say 0xFF:0xFF, or something less banal) as the universal "callback address", and have it expect the first byte of the callback data to be the "identifier" of the RPC as specified by the original call.
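For an 8-bit chip supporting one callback at a time, the stored state could be as small as this sketch. The single-slot policy comes from the text above; the struct layout and function names are assumptions:

```c
#include <stdint.h>
#include <stdbool.h>

/* One pending-call slot, as an 8-bit chip might keep it: just enough to
 * address the caller's "universal callback" endpoint later. */
typedef struct {
    bool    in_use;
    uint8_t origin_address;  /* caller's i2c address */
    uint8_t rpc_identifier;  /* ID the caller assigned to the pending call */
} mib_pending_call;

static mib_pending_call pending;  /* 8-bit chips: a single slot */

/* Record the callback coordinates; fails if an async call is already pending. */
static bool mib_store_pending(uint8_t origin, uint8_t id)
{
    if (pending.in_use)
        return false;
    pending = (mib_pending_call){ true, origin, id };
    return true;
}

/* Called when the handler finishes: the callee would now issue the callback
 * RPC (transmission omitted here) and free the slot. */
static void mib_complete_pending(void)
{
    pending.in_use = false;
}
```

The 16-bit chips would replace the single `pending` slot with entries in their existing RPC queue, with support for removing elements that aren't at the top.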

The Heart of the Matter

The "arbitrary message length" protocol extension

  1. The caller (master) sends <start write>+(address, feature, command, message_size)+checksum+<repeated start read> onto the bus
  2. The callee (slave at the specified address) receives this and checks to make sure the feature/command is legit. If this check succeeds, it responds with (max_chunk_size)+checksum, otherwise (0,error_code)+checksum
  3. Now the caller knows the max chunk size supported by the callee. It then starts sending chunks of size min(caller_max_chunk_size, callee_max_chunk_size). A chunk of size n looks like this: <r. start write>+(data[n])+checksum+<r. start read>.
  4. The slave receives this chunk, buffers it in memory, then performs a clock stretch while it is processed. A syntax for defining the "process chunk" handler is proposed below.
  5. When the slave is done processing, it releases the clock and sends an ack/nack byte.
  6. (4) and (5) are repeated until the entire message has been sent. Since the callee knows the entire message size, it should now know that the message phase is complete. If it does not, it can determine as much by noticing that the caller follows the last ack/nack with a <r. start read> (below) instead of <r. start write> (above).
  7. At this point the entire arbitrary-length message has been received and the Async extension (below) takes over.

After the RPC handler has executed, the return logic could also use this same method to support arbitrary return value buffer size.
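The master-side chunking loop in steps 3-6 can be sketched as follows. The `write_chunk` callback stands in for the `<r. start write>+data+checksum` sequence plus the slave's clock-stretched ack/nack; the function names are hypothetical, not the current firmware API:

```c
#include <stdint.h>
#include <stdbool.h>

static uint8_t min_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }

/* Send msg in chunks of the negotiated size, stopping early on a nack.
 * The callee knows msg_len from the handshake, so no explicit
 * end-of-message marker is needed. */
static bool mib_send_message(const uint8_t *msg, uint8_t msg_len,
                             uint8_t caller_max, uint8_t callee_max,
                             bool (*write_chunk)(const uint8_t *, uint8_t))
{
    uint8_t chunk = min_u8(caller_max, callee_max);  /* agreed in step 2 */
    uint8_t sent = 0;

    while (sent < msg_len) {
        uint8_t n = min_u8(chunk, (uint8_t)(msg_len - sent));
        if (!write_chunk(msg + sent, n))
            return false;  /* slave nacked a chunk */
        sent += n;
    }
    return true;
}

/* demo transport for illustration: always acks and counts chunks */
static uint8_t chunks_sent;
static bool demo_write_chunk(const uint8_t *data, uint8_t len)
{
    (void)data; (void)len;
    chunks_sent++;
    return true;
}
```

A 50-byte message with a negotiated 20-byte chunk size would go out as three chunks (20+20+10).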

The "asynchronous RPC" protocol extension

Once the RPC call has been made, it would traditionally be the callee's turn to execute the callback and specify a return status+value. This can still be supported by the extension (advantageous for handlers that run quickly and would suffer from the additional overhead of orchestrating the callback).

  1. After the initial call specifications and parameter buffer have been sent (above), the caller sends a <repeated start read>
  2. The slave can now start executing the RPC handler, but if it is taking a while (or this is predetermined) then it can send a "pending" return status. If the handler finishes, the return status is known and no callback negotiation is necessary. Instead of a bit-packed return status + return value length, the callee should always send a full byte of return status. This allows for more fine-grained status codes (including some reserved as handler-specified error codes) and arbitrarily long response buffers.
  3. If the return status was anything but "pending", the callee now sends the return value size followed by the return value itself, chunked as above (the chunk size is known, so it shouldn't need to be re-negotiated). Otherwise, the following steps are performed:
  4. Now that we know the return value is pending, the caller needs to give the callee a way to notify it of completion later. The caller sends <r. start write>(origin_address, rpc_identifier)+checksum, where origin_address is the caller's i2c address and rpc_identifier is the ID the caller has assigned to this "pending" call.
  5. The callee stores these values and waits for the handler to finish, and the caller releases the bus.
  6. (some time passes, while other RPCs can be executing)
  7. Now roles are reversed, and the former callee makes a new RPC call to the former caller, specifying (origin_address, callback_feature, callback_command) where callback_feature and callback_command are well-known, and specifying the rpc_identifier as the first byte of the data buffer.
  8. The callback handler should never return a "pending" status, though if this were allowed it might be able to support a possibly interesting "shared session" feature. Or maybe not.
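The completion callback in step 7 could be assembled like this. The reserved 0xFF:0xFF values are the placeholder from the text; the `mib_call` struct, the helper name, and packing the return status immediately after the identifier are assumptions for illustration:

```c
#include <stdint.h>

#define CALLBACK_FEATURE 0xFF  /* reserved "universal callback" feature (placeholder) */
#define CALLBACK_COMMAND 0xFF  /* reserved "universal callback" command (placeholder) */

/* Minimal description of an outgoing MIB call, for sketching purposes. */
typedef struct {
    uint8_t address;   /* origin_address stored in step 4 */
    uint8_t feature;
    uint8_t command;
    uint8_t data[2];   /* data[0] must be the rpc_identifier */
    uint8_t data_len;
} mib_call;

/* Build the callback RPC the former callee sends when its handler finishes. */
static mib_call make_callback(uint8_t origin_address, uint8_t rpc_identifier,
                              uint8_t return_status)
{
    mib_call c = { origin_address, CALLBACK_FEATURE, CALLBACK_COMMAND,
                   { rpc_identifier, return_status }, 2 };
    return c;
}
```

The former caller's callback handler would then use `data[0]` to match the completion against the pending call it recorded.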

Handler Definition Syntax

To facilitate processing long messages in small-sized chunks, the following C API could be used (written assuming a 16-bit compiler; it would have to be optimized for 8-bit):

typedef struct {
    uint8 length;
    uint8 *data;
} MIB_BUFFER;

uint16 mib_read_integer();          // Reads 2 bytes from the beginning of the data buffer and increments the data buffer pointer.  If this reads the last byte of cached data, receives a new chunk from the wire.
MIB_BUFFER mib_read_buffer_chunk(); // Returns a pointer to the cached buffer and the length of that buffer.
bool mib_next_buffer_chunk();       // Collects the next chunk of data off the wire.  Returns false if no more chunks exist.
void force_synchronous_handler();   // Otherwise we will automatically go async when the last chunk has been sent.

void mib_handler(void) { // Arguments are not passed directly to the function
    // First, read some integer parameters.  These automatically perform
    // chunking in the background as the cached data is consumed.
    uint16 x = mib_read_integer();
    uint16 y = mib_read_integer();
    while ( mib_next_buffer_chunk() ) {
        MIB_BUFFER buffer = mib_read_buffer_chunk();
        write_to_flash( buffer.data, buffer.length ); // pseudo-function used to exemplify streaming
    }
    // Now we're asynchronous, and this task can run as long as it wants.  Ideally
    // there would be an extension to the API to allow such long-running handlers
    // to, e.g., not block the task loop on pic24, but that's a project for another day.
}

And that's it. Let me know if anything is unclear (I'm sure something is).

amcgee commented 10 years ago

@timburke and I discussed this offline and have agreed there's a simpler way to achieve these goals. Our current thinking is this:

timburke commented 10 years ago

Per discussion with @amcgee today, we decided the following:

uint8 bulk_receive(void *buffer, uint8 length, void (*handler)(void *, uint8))

A master would call a slave endpoint, which would internally prepare a buffer to receive the largest chunk of data it can handle and call bulk_receive with the buffer pointer and size. The MIB executive would internally store the buffer location and length. Subsequent calls to a standard bulk_transfer endpoint would load 20 bytes at a time into the buffer, and once it was full, the handler would be called. The filling would happen without application-code intervention on the slave side. If the slave wanted more data it could call bulk_receive again from the handler code. Since this happens with the clock stretched, the master doesn't need to know anything about how the slave is processing the chunks and can just send chunks until the slave doesn't want any more or it runs out.
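The slave side of this API might look like the sketch below. The `bulk_receive` signature is the one proposed above; here it is only a stub that records the registration the real MIB executive would act on (the 20-byte wire fills and handler invocation are not modeled), and the endpoint and buffer names are invented for illustration:

```c
#include <stdint.h>
#include <stddef.h>

/* registration state the real MIB executive would keep internally */
static void    *registered_buf;
static uint8_t  registered_len;
static void   (*registered_handler)(void *, uint8_t);

/* stub of the proposed API: store buffer location, length, and handler */
static uint8_t bulk_receive(void *buffer, uint8_t length,
                            void (*handler)(void *, uint8_t))
{
    registered_buf = buffer;
    registered_len = length;
    registered_handler = handler;
    return 0;  /* status code; meaning assumed */
}

static uint8_t chunk_buffer[64];

/* Called by the executive once the buffer has been filled 20 bytes at a
 * time.  Process the data, then re-arm if more is wanted. */
static void on_buffer_full(void *buf, uint8_t len)
{
    (void)buf; (void)len;  /* e.g. write_to_flash(buf, len) would go here */
    bulk_receive(chunk_buffer, sizeof chunk_buffer, on_buffer_full);
}

/* Slave endpoint invoked when the master initiates the bulk transfer. */
static uint8_t my_endpoint(void)
{
    return bulk_receive(chunk_buffer, sizeof chunk_buffer, on_buffer_full);
}
```

The key property is that the handler can re-arm itself by calling bulk_receive again, so the master never needs to know how the slave is consuming the data.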

amcgee commented 10 years ago

The other point we didn't discuss was whether or not to discontinue support for the less-capable midrange chips. I think if we continued to support them we would have to reduce the chunk size (since there's additional information in the header, and I think the memory bank storing the MIB data structure is at capacity on those chips; correct me if I'm wrong), while if we don't continue to support them we could possibly make it bigger (if that's useful).

amcgee commented 9 years ago

I'm going to mock up this change this weekend and open a PR