[v3.0.0] Sysex Message Encoding

zfields commented 7 years ago

https://github.com/firmata/ConfigurableFirmata/blob/master/src/Encoder7Bit.cpp

Instead of creating an entire class for encoding and decoding, perhaps we could use a different encoding style that has a limited length, but can be statically encoded and decoded.

The private function FirmataMarshaller::transformByteStreamToMessageBytes could work more or less as it does now, except it would insert two bytes at the beginning of the stream to indicate the length [0-16383] (also I would rename it to (wish I had originally named it) FirmataMarshaller::encodeByteStream). Then you would be able to make an analogous FirmataParser::decodeByteStream that would be able to in-place reconstruct the original message, knowing exactly how many bytes it should evaluate to.

16kB (14bits) seems like plenty for a max sysex message length, whereas 128B is simply not enough (doesn't even cover the capability query for an Uno). I'm not sure how MAX_DATA_BYTES (64) fits into this equation, but I feel like it could be handled outside the marshaller and parser.

@soundanalogous I would love to hear your thoughts. This could be implemented rather quickly given the components already in place, and may alleviate any overhead (size and perf) associated with a dedicated class.

SIDE NOTE: This is the way strings used to be implemented (maybe still are) in RPGLE on the IBM iSeries (AS/400).

soundanalogous commented 7 years ago

MAX_DATA_BYTES is the size of the Arduino serial ring buffer for AVR boards. You wouldn't be able to encode an entire capability query response on an AVR for example. So when sending data from a board, you could never buffer more than 64 bytes of data which is why when forming a capability query response for example you just send the bytes immediately (although these are going into a ring buffer in the board-level layers of the Arduino stack).

However you could probably decode an entire capability query response in a Firmata client application since the client side transport buffer is likely much larger (unless it's a BLE transport).

zfields commented 7 years ago

Does this seem worth prototyping? I may do it either way, but didn't get a good read on your level of interest from your last post.

soundanalogous commented 7 years ago

Sure as long as it doesn't require allocating a large array.

soundanalogous commented 7 years ago

@zfields I think you're latest version is pretty good. Should we close this?

zfields commented 7 years ago

No. The latest version does a good job of laying the foundation, but the bits can be packed WAY tighter.

Right now you are at 50% efficiency, and we can get to 87.5% (optimal)

[01234567] is packed as... [_0123456][_xxxxxx7] the optimal packing can get all the way to [_0123456][_78901234][_5678901]...

It would make a noticeable improvement for any string intensive operation (i.e. I2C).

zfields commented 7 years ago

Based on the recent bug #363 this would definitely break compatibility with existing clients an will need to wait for v3.0.0

MasterRoshan commented 1 year ago

I found this interesting, while decoding a string on a client I ended up with null bytes between characters, and the client has to be sure to discard the null bytes. I think the client also has to be sure to encode the null bytes when sending a message.

It brings into question the function of taking an 8-bit byte and encoding it as two "7-bit" bytes, which effectively doubles the size of the message, because the 7-bit bytes are really 8 bits with only 7-bits of data.

I think the concept of a 7-bit byte is taken from MIDI, where the most significant bit is reserved as a "status indicator" where 1 means a new command, and 0 means to reuse the previous command. This can save bandwidth as you don't need to keep sending command bytes. Again these are really 8-bit bytes with 1 bit reserved and 7 bits of data

However, my understanding is that MIDI sysex messages should be able to make full use of 8 bit bytes when encoding/decoding because there is a start and end byte which lets it know that it should not need to parse a status bit, and probably also to know when to stop buffering.

I think it might be a welcome change to keep the original ascii encoding of string data, as client languages are likely to have a way to decode/encode ascii without needing a custom decoder/encoder. But even arbitrary byte arrays could benefit by having their message size cut in half if the original byte is preserved instead of split into two, and seems like you could double the data that can fit in the buffer.

Although something like this would break current clients as mentioned.

firmata / arduino

[v3.0.0] Sysex Message Encoding #352