Closed by mitchmindtree 4 years ago
Timing information is updated via the `bufferSwitch()` callback, which is called by the ASIO driver implementation:

- `SystemTime` describes the system time associated with the first sample of the callback. On Windows (the only OS on which ASIO is supported by CPAL), ASIO apparently retrieves this via the multimedia timer, `timeGetTime()`, which only provides a resolution of 1 ms.
- `SamplePosition` seems to describe the sample position of the first sample of the callback since the stream began.

From the ASIO SDK 2.3 docs:

> In order to provide proper media synchronization information to the host application, a driver should fetch, at the occurrence of the bufferSwitch() or bufferSwitchTimeInfo() callback invocation event (interrupt or timed event), the current system time and sample position of the first sample of the audio buffer, which will be passed to the callback. The host application retrieves this information during the bufferSwitch() callback with ASIOGetSamplePosition(), or in the case of the bufferSwitchTimeInfo() callback this information is part of the parameters to the callback.
The following example is provided for a stream with a buffer size of 1024 samples, a sample rate of 44100 Hz and a `SystemTime` start of 2000 ms:
| Callback No: | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| BufferIndex: | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| SystemTime (ms): | 2000 | 2000 | 2023 | 2046 | 2069 | 2092 | 2116 |
| SamplePosition: | 0 | 1024 | 2048 | 3072 | 4096 | 5120 | 6144 |
The docs also note that initially, the callback will be called multiple times at the same system time in order to prepare the buffers. This can also be seen in the table above. This is another interesting motivation for ensuring that applications consider the callback's provided system time for synchronisation rather than simply trying to count samples.
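To make the arithmetic in the table concrete, the nominal system time for a given `SamplePosition` can be sketched in Rust. This is a hypothetical helper for illustration only: real timestamps come from the driver, and (as just noted) the initial pre-fill callbacks all report the stream's start time rather than following this formula.

```rust
/// Nominal system time (ms) at which the first sample of a buffer is
/// expected, derived from its `SamplePosition`. Hypothetical helper:
/// real values come from the ASIO driver, and the initial pre-fill
/// callbacks report the stream's start time instead.
fn nominal_system_time_ms(start_ms: u64, sample_position: u64, sample_rate: u32) -> u64 {
    // Integer arithmetic mirrors the 1 ms resolution of `timeGetTime()`.
    start_ms + sample_position * 1000 / sample_rate as u64
}

fn main() {
    // 44100 Hz, 1024-sample buffers, 2000 ms start (values from the table).
    for callback_no in 0..7u64 {
        let pos = callback_no * 1024;
        println!(
            "callback {}: sample position {} -> ~{} ms",
            callback_no,
            pos,
            nominal_system_time_ms(2000, pos, 44_100)
        );
    }
}
```

Each 1024-sample buffer at 44100 Hz spans ~23.2 ms, which matches the 23–24 ms steps in the table; the pre-fill shift at the start is exactly why counting samples alone is not enough for synchronisation.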
CoreAudio provides an `AudioTimeStamp` as an argument to the stream data callback:

https://developer.apple.com/documentation/coreaudiotypes/audiotimestamp
First, the `AudioTimeStampFlags` must be checked to determine which of the contained timestamp representations are actually valid. The following members seem most relevant to us:

- `mHostTime`: "The host machine's time base (see CoreAudio/HostTime.h)." I could find no further docs, but found this SO answer: https://stackoverflow.com/questions/675626/coreaudio-audiotimestamp-mhosttime-clock-frequency From my understanding, this is retrieved via `mach_absolute_time()`, which represents a number of ticks since startup. To convert from this tick count to nanoseconds, `mach_timebase_info` must be used. There's an old example here: https://shiftedbits.org/2008/10/01/mach_absolute_time-on-the-iphone/
- `mSampleTime: f64`: The absolute sample frame time.
- `mRateScalar`: "The ratio of actual host ticks per sample frame to the nominal host ticks per sample frame."
- `mWordClockTime: u64`: The docs don't give any explanation, but according to a comment on HN this is a sample counter that "ticks" up each sample.

https://www.kernel.org/doc/html/latest/sound/designs/timestamping.html
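The `mHostTime` tick-to-nanosecond conversion described above boils down to `ticks * numer / denom`, using the fraction reported by `mach_timebase_info`. A Rust sketch with the timebase values hardcoded for illustration (on macOS they would be fetched at runtime via `mach_timebase_info()`):

```rust
/// Mirror of macOS's `mach_timebase_info_data_t`: the numerator and
/// denominator of the tick -> nanosecond ratio. Hardcoded here for
/// illustration; on macOS it is fetched with `mach_timebase_info()`.
struct TimebaseInfo {
    numer: u32,
    denom: u32,
}

/// Convert a host tick count (e.g. `AudioTimeStamp::mHostTime`) to
/// nanoseconds: nanos = ticks * numer / denom. `u128` avoids overflow.
fn host_ticks_to_nanos(ticks: u64, tb: &TimebaseInfo) -> u64 {
    (ticks as u128 * tb.numer as u128 / tb.denom as u128) as u64
}

fn main() {
    // A timebase of 125/3 (seen on some Apple hardware): 24 ticks ~ 1 us.
    let tb = TimebaseInfo { numer: 125, denom: 3 };
    println!("{} ns", host_ticks_to_nanos(24_000, &tb)); // 1_000_000 ns
}
```

On Intel Macs the timebase is typically 1/1 (ticks are already nanoseconds), which is why naive code that skips the conversion can appear to work there and break elsewhere.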
The ALSA API can provide two different system timestamps:
Also provides the following:

- `avail`: how much data can be written in the ring buffer.
- `delay`: the time it will take to hear a new sample after all queued samples have been played out. This could be useful for acquiring the "playback" instant.

These are provided along with a snapshot of system time. Several clock options are available for the snapshot.
We definitely want one of the MONOTONIC options for CPAL, but it's unclear to me whether or not we want NTP corrections. It would be nice to clarify what kind of corrections are applied, e.g. are the corrections a subtle skewing of the rate? Can it jump forwards in time by large steps? Until we can answer these questions, I'm intuitively inclined to use the raw timestamp for potentially more consistent clock behaviour.
An `audio_tstamp` is also provided, containing the timing of the different stages. Useful diagram:

```
--------------------------------------------------------------> time
  ^               ^              ^                ^           ^
  |               |              |                |           |
 analog         link            dma              app      FullBuffer
 time           time           time             time        time
  |               |              |                |           |
  |< codec delay >|<--hw delay-->|<queued samples>|<---avail->|
  |<----------------- delay---------------------->|           |
                                 |<----ring buffer length---->|
```
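As a sketch of how `delay` could yield the "playback" instant mentioned above: the helper below is hypothetical, not part of the ALSA API; it just converts the reported frame count at a given sample rate into a `Duration` that can be added to the snapshot's system time.

```rust
use std::time::Duration;

/// Time until a sample written right now becomes audible, given the
/// `delay` reported by ALSA in frames. Hypothetical helper for
/// illustration; ALSA itself only reports the frame count.
fn playback_latency(delay_frames: u64, sample_rate: u32) -> Duration {
    Duration::from_nanos(delay_frames * 1_000_000_000 / sample_rate as u64)
}

fn main() {
    // e.g. 4410 queued frames at 44100 Hz -> heard 100 ms from now.
    println!("{:?}", playback_latency(4410, 44_100));
}
```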
Other useful links:
For audio, `CLOCK_MONOTONIC_RAW` is definitely preferred, since the audio clock shouldn't be skewed. The only concerns are portability (Linux 2.6+ only) and that it can't be used for timers (sleeping is fine). Although I vaguely remember that ALSA doesn't exist on BSD, so we can probably go with `CLOCK_MONOTONIC_RAW`.
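For reference, reading `CLOCK_MONOTONIC_RAW` from Rust is a small FFI call into libc's `clock_gettime`. Linux-only sketch; the clock id constant is 4 on Linux, and in practice the declarations below would come from the `libc` crate rather than being written by hand:

```rust
// Linux-only sketch: read CLOCK_MONOTONIC_RAW via libc's clock_gettime.
// The FFI declarations here would normally come from the `libc` crate.
#[repr(C)]
struct Timespec {
    tv_sec: i64,
    tv_nsec: i64,
}

extern "C" {
    fn clock_gettime(clk_id: i32, tp: *mut Timespec) -> i32;
}

const CLOCK_MONOTONIC_RAW: i32 = 4; // from <linux/time.h>

/// Nanoseconds on the raw monotonic clock: unaffected by NTP skew,
/// which is the property we want for an audio clock.
fn monotonic_raw_nanos() -> u64 {
    let mut ts = Timespec { tv_sec: 0, tv_nsec: 0 };
    let ret = unsafe { clock_gettime(CLOCK_MONOTONIC_RAW, &mut ts) };
    assert_eq!(ret, 0, "clock_gettime failed");
    ts.tv_sec as u64 * 1_000_000_000 + ts.tv_nsec as u64
}

fn main() {
    let a = monotonic_raw_nanos();
    let b = monotonic_raw_nanos();
    // The raw monotonic clock never goes backwards.
    println!("a = {a} ns, b = {b} ns, monotonic: {}", b >= a);
}
```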
I have opened a WIP PR for this that can be tracked at #397.
Closed via #397.
This is a proposal to begin addressing #279 with the most minimal API necessary.
Background
The seemingly most accurate and thorough research I could come across on this topic is Ross Bencina's excellent paper *PortAudio and Media Synchronisation - It's All in the Timing*. It contains an overview of the media synchronisation problem with example scenarios, visual diagrams, etc. that make it more intuitive.
http://www.portaudio.com/docs/portaudio_sync_acmc2003.pdf
The first few sections of the paper describe some hypothetical scenarios and different techniques for synchronising audio with some other kind of media. A MIDI clock is the primary example used in the paper, but the same techniques apply to presenting frames of graphics and other forms of media sync.
Section 6 describes the minimal set of information necessary in order to make these synchronisation techniques possible, including a monotonic stream time (exposed in PortAudio via the `GetStreamTime(Stream* s)` function). PortAudio decided to provide this monotonic time in seconds using a double-precision floating-point data type.
Section 7 also describes implementation issues (e.g. around implementing `GetStreamTime`).

Proposal
I propose that we add the following:

- A `StreamInstant` struct representing a monotonic time instant retrieved from either 1. the stream's underlying audio data callback or 2. the same time source used to generate timestamps for a stream's underlying audio data callback. No guarantees are made about the duration that the value represents, only that it is monotonic and begins either before or equal to the moment the stream was started. Internally we could represent the instant in a similar manner to `std::time::Duration`, providing methods for easy access to more accessible representations, e.g. `.as_secs_f64()`, etc.
- `InputStreamTimestamp` and `OutputStreamTimestamp` structs, each containing two fields of type `StreamInstant`: `callback`, indicating the instant at which the data callback was called, and `buffer_adc` or `buffer_dac`, representing the instant of capture and playback from the audio device for the input and output streams respectively. An instance of these structs would be provided to the respective user's data callback.
- A `fn now(&self) -> StreamInstant` method for the `Stream` handle type, allowing users to produce an instant in time via the same source used to generate timestamps for the data callback, useful for media sync. It will be important to document exactly what system API is used for each host and to list any notable limitations (e.g. the 1 ms best-case resolution on ASIO).

I've been doing some research into the way that timing information is provided by each of the different hosts supported by CPAL. I'll add a follow-up comment soon with the relevant info for some more context for those interested and for myself to refer back to during implementation.
The transport API discussed within #279 has been intentionally omitted in the hope that it can be implemented on top of the proposed timestamp API. In the case that it cannot, this is likely best left to be addressed in a future PR either way.