adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org
Other
4.14k stars 1.22k forks source link

usb_midi read buffer can overflow without any signal to user code, leading to data corruption. #2245

Open theacodes opened 5 years ago

theacodes commented 5 years ago

In certain cases, the usb_midi receive buffer can overflow which can cause all sorts of unexpected behavior. Presently, there's no way that user can definitively detect and react to this state.

Source of behavior

Currently most boards are configured to have a usb_midi read buf of 128 bytes (source). The buffers are configured as overwritable (source) so if any data comes in once the buffer is full, it will "wrap around" - it will discard the earliest bytes to accommodate the new bytes.

For example, image if you have a buffer of 8 bytes:

0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07

And four more bytes arrive, you'd end up with this buffer:

0x04 0x05 0x06 0x07 0x08 0x09 0x0A 0x0B

dropping the first four bytes in the buffer.

This can be confirmed with this short program that exercises tusb_fifo:

#include <stdio.h>
#include "tusb_fifo.h"

TU_FIFO_DEF(ff, 8, uint8_t, true);

int main(int argv, char** argc) {
    // Push 12 bytes into the 8 byte buffer.
    for(uint8_t i=0; i < 12; i++) tu_fifo_write(&ff, &i);

    for(uint8_t i=0; i < 8; i++){
        uint8_t c;
        tu_fifo_read(&ff, &c);
        printf("%i\n", c);
    }
}

Effect on MIDI streams

When this occurs for MIDI streams, messages will either be skipped or corrupted.

As some context, a MIDI data stream consists of status bytes (bytes that have the high bit set, so byte & 0x80 must be true) and data bytes (0-127, the highest bit must not be set). The status byte determines what the MIDI message is and how many data bytes should follow.

Imagine a stream of MIDI channel pressure messages. These messages are 2 bytes: a single status byte and a single data byte that gives the amount of pressure. The stream is a repetition of 0xD0 0xNN where NN must be between 0-127. So for example:

0xD0 0x01 0xD0 0x02 0xD0 0x03 0xD0 0x04 0xD0 0x05 0xD0 0x06

If a buffer overflow occurs and causes a wrap around, then this stream will be corrupted in one of two ways:

Skip

If a buffer overflow happens to produce a new valid stream, for example if we add 0xD0 0x07 to the stream above it'll cause a skip to happen:

0xD0 0x02 0xD0 0x03 0xD0 0x04 0xD0 0x05 0xD0 0x06 0xD0 0x07

effectively skipping the first message. The user code may never notice this kind of corruption but it can manifest in ways such as missing/stuck notes or jumpy value changes.

Corruption

If the buffer overflow manages to produce a stream with an invalid sequence, such as putting a data byte where a status byte was expected, then outright stream corruption occurs. For example, imagine if a single byte message such as MIDI "START" (0xFA) being added to our original stream. It would shift the first byte in the stream to being a data byte:

0x01 0xD0 0x02 0xD0 0x03 0xD0 0x04 0xD0 0x05 0xD0 0x06 0xFA

The MIDI parser will be expecting a status byte, not a data byte.

Reproduction

Reproducing this only requires letting the buffer fill up sufficiently to trigger one of the cases. You can accomplish this by sleeping between calls to MidiPortIn.receive. It's hard to detect the skip case, but it's straightforward to detect the corruption case.

You can send a stream of equal-sized MIDI messages (such as CC changes) and just assert that the bytes are appropriately data bytes or status bytes. For example:

portin = usb_midi.ports[0]

def blocking_read_n_bytes(port, dest, num_bytes):
    read_buf = bytearray(1)
    while num_bytes:
        if port.readinto(read_buf, 1):
            dest.append(buf[0])
            num_bytes -= 1

midi_bytes = bytearray(3)
while True:
    ## CC messages a 3 bytes long. 1 status, 2 data bytes.
    blocking_read_n_bytes(portin, midi_bytes, 3)
    assert midi_bytes[0] & 0x80
    assert not midi_bytes[1] & 0x80
    assert not midi_bytes[2] & 0x80

    ## Sleeping lets the midi_rx buffer fill up and cause corruption.
    time.sleep(1)

Prevalence and impact

This issue is likely to occur in many cases. The combination of (1) user code taking time to process between receiving MIDI messages and (2) MIDI senders sending large amounts of data will cause this to happen as people build more sophisticated MIDI projects using CircuitPython.

Mitigation

Both Adafruit_CircuitPython_MIDI (source) and Winterbloom_SmolMIDI (source) have means of detecting and communicating the effects of the corruption state. Adafruit's library will emit a MIDIUnknownEvent or a MIDIBadEvent. Winterbloom's will increment error_count and ignore bytes until it finds the next valid status byte.

This works reasonably well for both libraries, but it does nothing for the skip case and the corruption case can look like a valid stream (for example, overflow leading to the data byte of one message appearing for a different message, leading to things like ghost notes and such). It also almost always leave the buffer in a near-full state, so once this occurs once it's more likely to occur again unless the user code somehow "catches up" on the stream.

Another potential way of mitigating this is to have the libraries de-duplicate channel messages as they are the ones that tend to lead to this overflow state and often the program will react based on the latest state anyway. For example, if the buffer has 20 channel pressure messages in it the MIDI library could ignore all but the last. This might work for some cases but still doesn't solve the underlying issue.

Suggestions for better mitigation

While we can't completely avoid this state, we can do a better job of exposing it to the libraries and user code so that it can be handled appropriately.

Ideally, PortIn would have a flag that indicates that overflow has occurred. The MIDI libraries could then read this flag and allow the user to respond in an application-specific way. We should also have a way to reset the buffer. This has precedence elsewhere, such as the Arduino SoftwareSerial library.

So one new property and one new method for usb_midi.PortIn:

# Bool, True if the buffer has overflowed.
portin.buffer_overflow
# Resets the buffer and clears the overflow flag.
portin.reset()

At the library level there's room for lots of approaches: the library could throw an exception for the user code to handle::

while True:
    try:
        msg = midi_in.receive()
    except MidiBufferOverflow:
        stop_all_notes()
        midi_in.reset()

or just expose it as a property:

while True:
    if midi_in.buffer_overflow:
        stop_all_notes()
        midi_in.reset()

    msg = midi_in.receive()
kevinjwalters commented 5 years ago

I may have asked about this when I was testing adafruit_midi library. I agree being able to detect data loss is a useful step. A cumulative counter is another approach here and that wouldn't require a clear method.

theacodes commented 5 years ago

A cumulative counter is another approach here and that wouldn't require a clear method.

Can you expand on this a little? I'd like to get a better idea of what this would look like from the library's perspective.

We could have a way to clear the overflow flag without resetting the buffer, but from experience that means that the overflow state is likely to re-occur quickly.

CedarGroveStudios commented 5 years ago

And if there's a function to clear the buffer, it would be helpful to have an option to clear all up to the latest full message packet to keep things synchronized.

theacodes commented 5 years ago

That's an interesting idea. Currently, usb_midi is functionality just a serial port, it doesn't know anything about the actual data coming in. We could make it clear up until the next status byte without needing it to know too much about the MIDI protocol.

CedarGroveStudios commented 5 years ago

Exactly. Normally, I'd be nervous dropping messages, but given the alternative... Could also issue a MIDI reset message just before resuming, I suppose.

theacodes commented 5 years ago

That's why it's critical to have the application know about this state and determine what to do about it. In my examples above, I used stop_all_notes as an example of one way of the application dealing with this.

CedarGroveStudios commented 5 years ago

I've noticed that resetting notes or issuing any other new overriding message when the buffer fills during a multi-byte message can be problematic, as well.

theacodes commented 5 years ago

Unless there's any objection, I'll likely start working on this in the next couple of weeks. I'll likely have to make changes in tinyusb, first.

kevinjwalters commented 5 years ago

When I was referring to a clear method I was thinking that midi_in.buffer_overflow would need somehow to be cleared but your example code suggests when you read the value and it is true then it auto-clears the buffer_overflow property so the next time it's read it's false?

On the counter front, I was thinking about the classic model seen on networking stacks. E.g. IP has a wide range of counters like Udp: receive buffer errors. If the application isn't that interested in loss it can ignore a cumulative counter, if it is interested then it can read the counter value and look for a change in value. This works for this library because there's only going to be one application reading the data. PortIn would have to expose the counter value for Adafruit_CircuitPython_MIDI library to expose it. This also lets you know how bad the problem is.

For "non critical" applications they can stay as they are since there's no interface change and no unexpected exceptions. For applications which are interested they can consult the counter and compared with previous value if they wish to detect that input has been lost from buffer overruns.

It would be odd to have protocol-specific logic for a general usb device to skip data when there are buffer overruns. For a MIDI specific usb device this seems far more reasonable and the MIDI protocol is perhaps unusual in facilitating this with the high bit differentiating between status and data. It would certainly be more efficient to do it in the C code.

A few other points that might be of interest follow.

In an earlier version of Adafruit_CircuitPython_MIDI I had the messages without a channel and that was an external value. In that version reading data with receive() returned (msg, channel). I had some feedback on returning tuples in https://github.com/adafruit/Adafruit_CircuitPython_MIDI/pull/9#discussion_r271430279

It's important to consider that returning multiple values is triggering a second allocation for the tuple in addition to the note itself. I'd rather have note be a little larger than doing a second allocation.

In one of the Adafruit_CircuitPython_MIDI tests there's a test which checks how the library behaves with large amounts of random junk input. I did this mainly to prove it didn't go into any strange internal loops that would cause it to never return.

I think @tannewt was also working on an alternate implementation of a more efficient buffer in Adafruit_CircuitPython_MIDI (this is not the usb buffer in C code), this is mentioned in https://github.com/adafruit/Adafruit_CircuitPython_MIDI/pull/9#discussion_r271426602 - if the python code is more efficient that would help a little bit to reduce the likelihood of usb buffer overruns.

I tried to keep things (python code and memory use) as small as possible to make sure this library was viable on the M0 processors to let people with Circuit Playground Express boards use it.

I found playback of MIDI files were a good way to test MIDI volumes. Some have a lot of cc or crazy high-res pitch bending. You can see devices falling behind as things start buffering and then you can typically hear it when notes are lost. I played around with disabling parts of the track to see what would and wouldn't playback ok. https://www.youtube.com/watch?v=LFzJfrsVb8A has the Vol1 and Vol2 disabled as it couldn't handle the frequent volume modulation that goes on in the original.

chasburn commented 5 years ago

I would like to be able to handle syysex midi messages. The problem is that they maybe any length. They are normally used to save and load configurations. Usually the first two bytes after the status byte tell how long the message is. If the message is more than twice the length of the buffer you will get more than one overflow.

dhalbert commented 4 years ago

@theacodes Is this still on your radar for for 5.0.0 or should we move it to the 5.x.x or Long Term milestones? Thanks.

theacodes commented 4 years ago

I still want to do this, I've just been busy with Sol's manufacturing plan. Will loop back to it soon.

On Mon, Feb 3, 2020, 9:49 AM Dan Halbert notifications@github.com wrote:

@theacodes https://github.com/theacodes Is this still on your radar for for 5.0.0 or should we move it to the 5.x.x or Long Term milestones? Thanks.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/adafruit/circuitpython/issues/2245?email_source=notifications&email_token=AAB5I42QXHOJEPPA4YNXVHTRBBKMBA5CNFSM4JFKCH4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKUYJAI#issuecomment-581534849, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAB5I47SEV4VIK5IJ3CJ3TLRBBKMBANCNFSM4JFKCH4A .

tannewt commented 4 years ago

I've moved this to 5.x features since we can add it at any point.