AcademySoftwareFoundation / OpenTimelineIO

Open Source API and interchange format for editorial timeline information.
http://opentimeline.io
Apache License 2.0

MIDI Support (?) #829

zamaudio opened this issue 3 years ago

zamaudio commented 3 years ago

Feature Request

Is there a way to include MIDI events currently in the spec?

Description

For this timeline specification to be more complete, and to allow interoperability with Digital Audio Workstations (DAWs) such as Ardour, there should be a way to include MIDI events within the spec. This would allow such information to be preserved across session interchange for formats that include it.

I realise that OTIO has all media-related content stored externally to the session, but MIDI could be a special case where you may wish to include all the events within the session file, since they can be considered individual events that occur at particular times (like markers) and do not take much space at all.

Perhaps there is already a way to include MIDI events as a miscellaneous data type (?)
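
(For what it's worth, every OTIO schema object carries a free-form metadata dictionary, so one unofficial route today would be to attach events to markers. A hypothetical sketch of a serialized marker follows; the "midi" namespace and its field names are invented purely for illustration, not an existing convention:)

    {
        "OTIO_SCHEMA": "Marker.2",
        "name": "note on: middle C",
        "marked_range": {
            "OTIO_SCHEMA": "TimeRange.1",
            "start_time": { "OTIO_SCHEMA": "RationalTime.1", "rate": 480, "value": 960 },
            "duration": { "OTIO_SCHEMA": "RationalTime.1", "rate": 480, "value": 240 }
        },
        "metadata": {
            "midi": { "status": 144, "data1": 60, "data2": 100, "channel": 0 }
        }
    }

Note the cheat in this sketch: a RationalTime rate is in units per second, whereas MIDI ticks are per quarter note, which is exactly the tempo-map problem raised further down the thread.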

meshula commented 3 years ago

I've played with this a little. MIDI is a wire-rendered version of a DAW file. Data size is likely an issue: a MIDI file has roughly a byte per note (actually a decode64 block), so a marker per note would make the OTIO file comparatively immense. The MIDI file format has a bit of a cheat in it whereby it encodes non-note commands individually (things like program changes), and then stores runs of notes as a single data blob. If a correspondence between MIDI and OTIO were to be engineered, perhaps it would be something like that: markers for things that are not notes, and some sort of sampled data block for note runs.

It might be reasonable to devise a MIDI adaptor for OTIO, accepting that the MIDI file would be comparatively enormous. The constraint would be that there would have to be a canonical mapping of MIDI events to OTIO data.

For reference, in case anyone is wondering what is encoded in a MIDI file, here are the meta events:

    enum class Midi_MetaEventType : uint8_t
    {
        SEQUENCE_NUMBER = 0x00,
        TEXT = 0x01,
        COPYRIGHT = 0x02,
        TRACK_NAME = 0x03,
        INSTRUMENT = 0x04,
        LYRIC = 0x05,
        MARKER = 0x06,
        CUE = 0x07,
        PATCH_NAME = 0x08,
        DEVICE_NAME = 0x09,
        MIDI_CHANNEL_PREFIX = 0x20,
        MIDI_PORT = 0x21, // the obsolete "MIDI Port" meta event
        END_OF_TRACK = 0x2F,
        TEMPO_CHANGE = 0x51,
        SMPTE_OFFSET = 0x54,
        TIME_SIGNATURE = 0x58,
        KEY_SIGNATURE = 0x59,
        PROPRIETARY = 0x7F,
        SYSTEM_EXCLUSIVE = 0xF0,
        END_OF_EXCLUSIVE = 0xF7,
        CHANNEL_EVENT = 0xFE,
        UNKNOWN = 0xFF
    };

and the note events:

    enum class MessageType : uint8_t
    {
        INVALID             = 0x0,
        // Standard Message
        NOTE_OFF           = 0x80,
        NOTE_ON            = 0x90,
        POLY_PRESSURE      = 0xA0, // after touch
        CONTROL_CHANGE     = 0xB0,
        PROGRAM_CHANGE     = 0xC0,
        AFTERTOUCH         = 0xD0, // channel pressure
        PITCH_BEND         = 0xE0,

        // System Common Messages
        SYSTEM_EXCLUSIVE   = 0xF0,
        TIME_CODE          = 0xF1,
        SONG_POS_POINTER   = 0xF2,
        SONG_SELECT        = 0xF3,
        RESERVED1          = 0xF4,
        RESERVED2          = 0xF5,
        TUNE_REQUEST       = 0xF6,
        EOX                = 0xF7,

        // System Realtime Messages
        TIME_CLOCK         = 0xF8,
        RESERVED3          = 0xF9,
        START              = 0xFA,
        CONTINUE           = 0xFB,
        STOP               = 0xFC,
        RESERVED4          = 0xFD,
        ACTIVE_SENSING     = 0xFE,
        SYSTEM_RESET       = 0xFF
    };
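
As an aside for anyone building the mapping: for the channel voice messages (status bytes 0x80 through 0xEF), the high nibble of the status byte is the message type and the low nibble is the channel, so decoding is just two masks. A minimal sketch, assuming the MessageType enum above is in scope:

    #include <cstdint>
    #include <cstdio>

    // Channel voice messages pack the type in the high nibble and the
    // channel (0-15) in the low nibble of the status byte.
    void decode_status(uint8_t status)
    {
        MessageType type = static_cast<MessageType>(status & 0xF0);
        int channel = status & 0x0F;
        if (type == MessageType::NOTE_ON)
            std::printf("note-on on channel %d\n", channel);
    }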

pauldavisthefirst commented 3 years ago

This is not an entirely accurate view of the topic. MIDI defines both a wire format and a file format ("Standard MIDI File"). There is a lot of overlap between the contents, but they are not the same. Most programs that allow the user to create, manipulate and store MIDI will ultimately use SMF to do so (certainly if the data is to be visible to other applications). But SMF needs to be interpreted and converted before it represents the wire format. The meta-events listed above are part of the file format; they do not exist in the wire format. Also, the "running status" hack that is incompletely described above is optional and increasingly unused now that wire speeds for MIDI transmission have become much faster. It does not cover just notes, but any series of messages where the status byte remains the same. Finally, the wire format is more like 6 bytes per note, not 1 (reducible to 5 with running status).
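
To make those byte counts concrete, here is one note on the wire, first fully spelled out and then with running status (using the common note-on-with-velocity-0 idiom as a note-off). A minimal sketch:

    #include <cstdint>

    // One note, fully spelled out: 6 bytes.
    const uint8_t plain[6] = {
        0x90, 60, 100, // note-on,  channel 0, middle C, velocity 100
        0x80, 60, 0    // note-off, channel 0, middle C, velocity 0
    };

    // With running status the second status byte is omitted: a note-on
    // with velocity 0 is conventionally a note-off, so the 0x90 status
    // still applies to the trailing pair of bytes: 5 bytes per note.
    const uint8_t running[5] = {
        0x90, 60, 100, // note-on, channel 0, middle C, velocity 100
        60, 0          // (running status) note-on, velocity 0 == note-off
    };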

meshula commented 3 years ago

Thanks for the clarification and corrections, @pauldavisthefirst; the fundamental split between SMF and the wire-rendered form had slipped my mind when I wrote that first sentence. The one byte per note I mentioned was indeed the SMF running status case, and one byte per note is not enough anyway, as every note byte must of course be followed by a velocity byte.

Coming back to the question at hand, my opinion is that a MIDI adaptor for OTIO is a reasonable idea, and the key to such an endeavor would be coming up with a canonical mapping of MIDI events with standardized names to OTIO data. The OTIO file will, as I mentioned, be very large in comparison with an SMF file, because JSON is verbose.

I have some starter questions. This isn't exhaustive, just the issues I encountered in my own MIDI-OTIO prototype, and for which I didn't come up with answers that made me particularly happy. Paul, I notice that you work on Ardour; Ardour must already have strong answers for these, and you might have further points we would need to consider?

pauldavisthefirst commented 3 years ago

Timing is deeply problematic when viewed in the largest sense. I'm not going to get into it here, but I will say that Robin Gareus and I had a long discussion online because we wanted to use OTIO's RationalTime model, but eventually realized that it cannot work, because the mapping between musical time (as specified by MIDI) and OTIO's seconds-based time is defined by an external entity (typically called a Tempo Map), and this concept does not exist in OTIO. This mapping is monotonic, non-linear and also potentially stationary.
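
To illustrate the mapping with a minimal sketch: SMF can only express discrete tempo changes (see below), so a piecewise-constant tempo map suffices for converting beats to seconds. The names and segment layout here are purely illustrative:

    #include <cstddef>
    #include <vector>

    // One segment of an illustrative tempo map: from `beat` onward the
    // tempo is `bpm`, until the next segment begins.
    struct TempoSegment { double beat; double bpm; };

    // Walk the map (sorted by beat, starting at beat 0) and accumulate
    // seconds-per-beat (60 / bpm) over each span. Monotonic, non-linear.
    double beats_to_seconds(const std::vector<TempoSegment>& map, double beat)
    {
        double seconds = 0.0;
        for (std::size_t i = 0; i < map.size(); ++i) {
            double end = (i + 1 < map.size()) ? map[i + 1].beat : beat;
            if (end > beat) end = beat;
            if (end > map[i].beat)
                seconds += (end - map[i].beat) * 60.0 / map[i].bpm;
        }
        return seconds;
    }

    // Example: 120 bpm for 8 beats, then 90 bpm:
    // beats_to_seconds({{0, 120}, {8, 90}}, 12.0) == 8*(60/120) + 4*(60/90)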

meshula commented 3 years ago

The Wikipedia article https://en.wikipedia.org/wiki/Tempo_map has a nice example of a tempo map.

Yes, an OTIO file currently represents a linear time domain. We are currently working on the next revision of OTIO, which formalizes time coordinate systems and transformations between them. A tempo map can be explicitly represented in that formulation.

We would welcome you to join the discussion on the ASWF slack, http://academysoftwarefdn.slack.com/, on the #opentimelineio channel if you are interested in becoming involved in the conversation about next steps.

Until temporal coordinate systems are available in the library and file format, the tempo map would seem to be a blocker on a useful mapping between MIDI and OTIO, unless the MIDI file is fully rendered, and has therefore lost an important part of what it represents.

pauldavisthefirst commented 3 years ago

I have spent many, many months working on the issues of representing musical and audio/wallclock time in the context of Ardour (https://ardour.org/). This work is still ongoing. It is incredibly challenging to do right and I regret I have no time or mental bandwidth to get involved in the discussion once more. I did leave a few notes on the fundamental problems here: https://ardour.org/timing.html

Currently, a development branch of Ardour uses a 62-bit "superclock" that ticks at 508032000 Hz to represent audio/video/wallclock time, and the same data type to represent musical time. There is a flag bit to indicate that the value represents musical time.
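
An illustrative sketch of that representation (not Ardour's actual type): a 62-bit tick count in a 64-bit word, with one bit reserved to flag the value as musical rather than audio/wallclock time:

    #include <cstdint>

    struct TimePos {
        // Bit 62 marks the value as musical time; bits 0-61 hold ticks.
        static constexpr uint64_t MUSICAL_FLAG = 1ULL << 62;
        uint64_t bits;

        bool is_musical() const { return (bits & MUSICAL_FLAG) != 0; }
        uint64_t ticks() const { return bits & (MUSICAL_FLAG - 1); }
    };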

The most challenging part of the problem concerns tempo ramps, which would typically end up using floating point in some way, and doing so naively (and perhaps doing so at all) ends up breaking things. We are considering using a Padé approximant as one approach to this, but currently the problem is still unsolved. However, this is not a problem when dealing with SMF data, because SMF does not provide any way to represent tempo ramps. You just have to insert discrete changes in tempo into the file.
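
For reference, the discrete tempo change in SMF is the Set Tempo meta event (FF 51 03), whose payload is microseconds per quarter note as a 24-bit big-endian integer. A small encoding sketch:

    #include <cstdint>

    // Encode a Set Tempo meta event: FF 51 03 tt tt tt, where tttttt is
    // microseconds per quarter note, big-endian.
    // e.g. 120 bpm -> 60000000 / 120 = 500000 us -> 07 A1 20.
    void encode_tempo(uint8_t out[6], double bpm)
    {
        uint32_t uspqn = static_cast<uint32_t>(60000000.0 / bpm);
        out[0] = 0xFF; out[1] = 0x51; out[2] = 0x03;
        out[3] = (uspqn >> 16) & 0xFF;
        out[4] = (uspqn >> 8) & 0xFF;
        out[5] = uspqn & 0xFF;
    }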

meshula commented 3 years ago

Time computation is indeed complex and difficult to get right; one of the primary motivations of this project is to work through these issues so that computations are transportable and reproducible between NLEs. I appreciate the notes you've left here for us, thanks!

jminor commented 3 years ago

@pauldavisthefirst thanks for writing https://ardour.org/timing.html, which clearly explains the difficulty encountered with loss of precision when round-tripping data through a function that requires snapping. Is it fair to say that the "superclock" approach is simply to use such a high clock rate that the loss due to snapping/rounding is minimized?

I don't have a musical background, but it sounds like tempo maps and tempo ramps are quite similar to the time warp effects (e.g. speed up, slow down, and non-linear speed ramps) used in video effects. Could they be treated mathematically the same?

pauldavisthefirst commented 3 years ago

I'm not familiar with how those sorts of video effects would be implemented, but I suspect they are similar.

Musical tempo ramps are (almost) always exponential: humans perceive that as a smooth speed-up. Imagine speeding up over 4 bars of music: you increase by 25% in the first bar, then another 25% in the next bar, etc. Because the changes "stack" on each other, this creates an exponential rather than a linear ramp. It feels natural to use floating point for this when coding, but it leads to inaccurate results.
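
A tiny numeric sketch of that stacking effect:

    #include <cstdio>

    int main()
    {
        // Multiplying the tempo by 1.25 each bar compounds: after n bars
        // the tempo is 100 * 1.25^n, an exponential (not linear) ramp.
        double bpm = 100.0;
        for (int bar = 0; bar < 4; ++bar) {
            std::printf("bar %d: %.4f bpm\n", bar, bpm); // 100, 125, 156.25, 195.3125
            bpm *= 1.25;
        }
        // Accumulating positions through such a ramp in floating point
        // is where the rounding trouble described above creeps in.
    }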

The high clock rate has two functions. One is as you mentioned; the other is based on it having many, many factors, so that division by most small-ish integers results in an integer. The number cited has factors at least to 16, IIRC.
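
That divisibility is easy to check: 508032000 = 2^10 * 3^4 * 5^3 * 7^2, so it divides evenly by the common audio sample rates and video frame rates. A minimal sketch:

    #include <cstdio>

    int main()
    {
        const long long superclock = 508032000; // 2^10 * 3^4 * 5^3 * 7^2
        const long long rates[] = { 44100, 48000, 88200, 96000, 192000,
                                    24, 25, 30, 60 };
        // Each of these divides the superclock rate exactly, so samples
        // and frames at these rates land on integer superclock counts.
        for (long long r : rates)
            std::printf("%lld %% %lld = %lld\n", superclock, r, superclock % r);
    }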
