RustAudio / cpal

Cross-platform audio I/O library in pure Rust
Apache License 2.0

Proposal for minimal timestamp API to allow for synchronising media with CPAL streams #363

Closed by mitchmindtree 4 years ago

mitchmindtree commented 4 years ago

This is a proposal to begin addressing #279 with the most minimal API necessary.

Background

The most accurate and thorough research I could find on this topic appears to be Ross Bencina's excellent paper "PortAudio and Media Synchronisation - It's All in the Timing". It gives an overview of the media synchronisation problem, with example scenarios and diagrams that make it more intuitive.

http://www.portaudio.com/docs/portaudio_sync_acmc2003.pdf

The first few sections of the paper describe some hypothetical scenarios and different techniques for synchronising audio with some other kind of media. A MIDI clock is the primary example used in the paper, but the same techniques apply to presenting frames of graphics and other forms of media sync.
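As a rough illustration of the kind of technique the paper describes, the sketch below maps an event's scheduled time onto a frame offset inside the buffer currently being filled. It assumes the host can tell us, in seconds on a shared monotonic clock, when the first frame of the buffer will be played back. The function name and its parameters are hypothetical and are not part of PortAudio or CPAL.

```rust
/// Hypothetical helper: find the frame within the current buffer at which an
/// event scheduled for `event_secs` should start sounding.
///
/// `buffer_playback_secs` is the (assumed) time at which the first frame of
/// the buffer reaches the output, on the same clock as `event_secs`.
/// Returns `None` if the event falls before or after this buffer.
fn event_frame_offset(
    buffer_playback_secs: f64,
    event_secs: f64,
    sample_rate: f64,
    frames_in_buffer: usize,
) -> Option<usize> {
    let delta_secs = event_secs - buffer_playback_secs;
    if delta_secs < 0.0 {
        // The event was due before this buffer starts playing.
        return None;
    }
    // Convert the time difference to a whole number of frames.
    let frame = (delta_secs * sample_rate).round() as usize;
    if frame < frames_in_buffer {
        Some(frame)
    } else {
        // The event belongs to a later buffer.
        None
    }
}
```

For example, with a 1024-frame buffer at 48000 Hz whose first frame plays at 10.0 s, an event due at 10.01 s lands 480 frames into the buffer, while an event due at 10.5 s is left for a later callback.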

Section 6 describes the minimal set of information necessary in order to make these synchronisation techniques possible:

PortAudio decided to provide this monotonic time in seconds using a double-precision floating-point data type:

The double data type was chosen after considerable deliberation because it provides sufficient resolution to represent time with high-precision, may be manipulated using numerical operators, and is a standard part of the C and C++ languages.

Section 7 also describes implementation issues. They can be roughly summed up as follows:

Proposal

I propose that we add the following:

I've been doing some research into the way that timing information is provided by each of the different hosts supported by CPAL. I'll add a follow-up comment soon with the relevant info for some more context for those interested and for myself to refer back to during implementation.

The transport API discussed within #279 has been intentionally omitted in the hope that it can be implemented on top of the proposed timestamp API. In the case that it cannot, this is likely best left to be addressed in a future PR either way.

mitchmindtree commented 4 years ago

CPAL Timing API Research

ASIO

Timing information is updated via the bufferSwitch() callback which is called by the ASIO driver implementation:

From the ASIO SDK 2.3 docs:

In order to provide proper media synchronization information to the host application a driver should fetch, at the occurrence of the bufferSwitch() or bufferSwitchTimeInfo() callback invocation event (interrupt or timed event), the current system time and sample position of the first sample of the audio buffer, which will be past to the callback. The host application retrieves this information during the bufferSwitch() callback with ASIOGetSamplePosition() or in the case of the bufferSwitchTimeInfo() callback this information is part or the parameters to the callback.

The following example is provided for a stream with a buffer size of 1024 samples, sample rate of 44100 Hz and a SystemTime start of 2000 ms:

Callback No:        0     1     2     3     4     5     6
BufferIndex:        0     1     0     1     0     1     0
SystemTime(ms):  2000  2000  2023  2046  2069  2092  2116
SamplePosition:     0  1024  2048  3072  4096  5120  6144

The docs also note that initially, the callback will be called multiple times at the same system time in order to prepare the buffers. This can also be seen in the table above. This is another interesting motivation for ensuring that applications consider the callback's provided system time for synchronisation rather than simply trying to count samples.
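The numbers in the table can be reproduced from the stream parameters. The sketch below is my reading of the table rather than anything the SDK states: it assumes callbacks 0 and 1 both fire at the start time and that system times are truncated to whole milliseconds. Both functions are hypothetical helpers, not ASIO or CPAL functions.

```rust
/// Hypothetical helper reproducing the SystemTime row of the table above.
///
/// Assumes the first two callbacks both fire at `start_ms` (buffer
/// preparation) and that times are truncated to whole milliseconds.
fn expected_system_time_ms(
    callback_no: u64,
    start_ms: u64,
    buffer_size: u64,
    sample_rate: u64,
) -> u64 {
    // Samples that have elapsed in real time before this callback fires.
    let elapsed_samples = callback_no.saturating_sub(1) * buffer_size;
    // Integer division truncates to whole milliseconds.
    start_ms + elapsed_samples * 1000 / sample_rate
}

/// Position of the first sample of the buffer passed to the given callback.
fn sample_position(callback_no: u64, buffer_size: u64) -> u64 {
    callback_no * buffer_size
}
```

Note how the sample position advances by 1024 on every callback while the system time does not advance between callbacks 0 and 1, which is why counting samples alone gives a skewed picture of when audio is actually presented.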

CoreAudio

CoreAudio provides an AudioTimeStamp as an argument to the stream data callback.

https://developer.apple.com/documentation/coreaudiotypes/audiotimestamp

First, the AudioTimeStampFlags must be checked to determine which of the contained timestamp representations are actually valid. The following members seem most relevant to us:
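A minimal sketch of the flag check might look like the following. It uses a simplified local stand-in for AudioTimeStamp rather than real bindings, and it assumes the sample time and host time are the representations of interest. The struct, its field names, and the method names are hypothetical; the two constants mirror the values of kAudioTimeStampSampleTimeValid and kAudioTimeStampHostTimeValid. The numer and denom arguments stand for the timebase ratio that converts host ticks to nanoseconds (as reported by mach_timebase_info on Apple platforms).

```rust
// Local mirrors of the CoreAudio flag bits; real code would take these from
// the platform bindings instead of redefining them.
const SAMPLE_TIME_VALID: u32 = 1 << 0; // kAudioTimeStampSampleTimeValid
const HOST_TIME_VALID: u32 = 1 << 1; // kAudioTimeStampHostTimeValid

/// Simplified, hypothetical stand-in for the `AudioTimeStamp` fields used here.
struct TimeStamp {
    sample_time: f64,
    host_time: u64,
    flags: u32,
}

impl TimeStamp {
    /// The sample time, but only if the flags mark it as valid.
    fn sample_time(&self) -> Option<f64> {
        if self.flags & SAMPLE_TIME_VALID != 0 {
            Some(self.sample_time)
        } else {
            None
        }
    }

    /// The host time converted from ticks to nanoseconds, but only if the
    /// flags mark it as valid and the conversion does not overflow.
    fn host_time_nanos(&self, numer: u32, denom: u32) -> Option<u64> {
        if self.flags & HOST_TIME_VALID == 0 || denom == 0 {
            return None;
        }
        // Widen to u128 so the multiplication cannot overflow.
        let nanos = self.host_time as u128 * numer as u128 / denom as u128;
        u64::try_from(nanos).ok()
    }
}
```

Returning `Option` forces callers to handle the case where a representation is not valid instead of silently reading a stale or zeroed field.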

ALSA

https://www.kernel.org/doc/html/latest/sound/designs/timestamping.html

The ALSA API can provide two different system timestamps:

Also provides the following:

These are provided along with a snapshot of system time. Options for snapshot are:

We definitely want one of the MONOTONIC options for CPAL, but it's unclear to me whether or not we want NTP corrections. It would be nice to clarify what kind of corrections are applied, e.g. are the corrections a subtle skewing of the rate? Can it jump forwards in time by large steps? Until we can answer these questions, I'm intuitively inclined to use the raw timestamp for potentially more consistent clock behaviour.

An audio_tstamp is also provided containing the timing of the different stages. Useful diagram:

--------------------------------------------------------------> time
  ^               ^              ^                ^           ^
  |               |              |                |           |
 analog         link            dma              app       FullBuffer
 time           time           time              time        time
  |               |              |                |           |
  |< codec delay >|<--hw delay-->|<queued samples>|<---avail->|
  |<----------------- delay---------------------->|           |
                                 |<----ring buffer length---->|
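The relationships in the diagram can be written down as simple arithmetic. The sketch below assumes all quantities are measured in frames and reads the diagram as: delay is the sum of the codec delay, the hardware delay and the queued samples, and the ring buffer length is the queued samples plus the available space. The struct and its field names are hypothetical and do not correspond to ALSA types.

```rust
/// Hypothetical snapshot of the quantities shown in the diagram, in frames.
struct DelaySnapshot {
    codec_delay: u64,
    hw_delay: u64,
    queued: u64,
    avail: u64,
}

impl DelaySnapshot {
    /// Total delay between app time and analog time, in frames.
    fn delay(&self) -> u64 {
        self.codec_delay + self.hw_delay + self.queued
    }

    /// Ring buffer length: queued samples plus the space still available.
    fn ring_buffer_len(&self) -> u64 {
        self.queued + self.avail
    }

    /// Total delay converted to seconds at the given sample rate.
    fn delay_secs(&self, sample_rate: u32) -> f64 {
        self.delay() as f64 / f64::from(sample_rate)
    }
}
```

With a codec delay of 48 frames, a hardware delay of 96 frames, 1024 queued frames and 3072 available frames at 48000 Hz, the total delay is 1168 frames (about 24.3 ms) and the ring buffer holds 4096 frames.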

Other useful links:

ishitatsuyuki commented 4 years ago

For audio, CLOCK_MONOTONIC_RAW is definitely preferred since the audio clock shouldn't be skewed. The only concerns are portability (it is Linux-only, available since 2.6.28) and that it can't be used for timers (sleep is fine). That said, I vaguely remember that ALSA doesn't exist on BSD, so we can probably go with CLOCK_MONOTONIC_RAW.

mitchmindtree commented 4 years ago

I have opened a WIP for this that can be tracked at #397.

mitchmindtree commented 4 years ago

Closed via #397.