cta-wave / mezzanine

This repo contains scripts that will build annotated test content from specific source content, compatible with the WAVE device playback test suite.
BSD 3-Clause "New" or "Revised" License

mezzanine content for audio-only testing #34

Closed jpiesing closed 3 years ago

jpiesing commented 3 years ago

See the script at https://github.com/cta-wave/mezzanine/blob/master/audiomezz.py and the wiki at https://github.com/cta-wave/mezzanine/wiki/Audio-Mezzanine

yanj-github commented 3 years ago

What "seed=?" did you used to generate the following please? Can I have the number please? https://dash.akamaized.net/WAVE/Mezzanine/new/mezzanine%5bfull-scale_white_noise_bandlimited_7kHz%5d_48kHz_16bit_60s.wav

nicholas-fr commented 3 years ago

@yanj-github That older audio file was created with an earlier version of the script, before I implemented the use of a defined PRNG seed.

I have uploaded a new file that was created with seed = "test" (116101115116): `audiomezz.py -s test "mezzanine[full-scale_white_noise_bandlimited_7kHz]_48kHz_16bit_2ch_60s_test.wav"`

https://dash.akamaized.net/WAVE/Mezzanine/new/mezzanine%5bfull-scale_white_noise_bandlimited_7kHz%5d_48kHz_16bit_2ch_60s_test.wav
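For illustration, here is a minimal sketch (not taken from audiomezz.py) of how a string seed could map to the numeric value quoted above, by concatenating the decimal ASCII codes of each character; the actual mapping and PRNG used by the script may differ.

```python
# Sketch only: "test" -> 116101115116 by concatenating decimal ASCII codes.
# This reproduces the number quoted in the comment above; the real
# audiomezz.py logic may differ.
import numpy as np

def seed_to_int(seed_str: str) -> int:
    return int("".join(str(ord(c)) for c in seed_str))

assert seed_to_int("test") == 116101115116

# The integer can then seed a PRNG used to generate the white noise samples
# (band-limiting would be applied afterwards).
rng = np.random.default_rng(seed_to_int("test"))
noise = rng.uniform(-1.0, 1.0, size=48000 * 60)  # 60 s at 48 kHz, pre-filtering
```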

yanj-github commented 3 years ago

The proposed white noise mezzanine for the audio-only test (CTA-5003 8) is suitable for the purpose.

We think this can be used in the video and audio synchronisation check (CTA-5003 9) without needing additional "flash and beep" annotation, on the assumption that a recording device can accurately synchronise the video and audio, or that agreed tolerances can account for recording device limitations.

Regarding the audio-only testing (CTA-5003 8):

- O1: Every sample S[k,s] shall be rendered and the samples shall be rendered in increasing presentation time order.
- O2: The playback duration matches the duration of the CMAF Track, i.e. TR[k, S] = TR[k, 1] + td[k].
- O3: The start-up delay should be sufficiently low, i.e. TR[k, 1] – Ti < TSMax.
- O4: The presented sample matches the one reported by the currentTime value within the tolerance of the sample duration.
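As a minimal sketch of the O2/O3 arithmetic above, assuming the observation framework supplies the measured times (the variable names and the tolerance are illustrative, not from CTA-5003):

```python
# Hypothetical helper functions; names and tolerance values are assumptions.

def check_o2(t_r_first: float, t_r_last: float, td_k: float,
             tolerance: float = 0.02) -> bool:
    # O2: playback duration matches the CMAF Track duration,
    # i.e. TR[k, S] = TR[k, 1] + td[k], within a tolerance.
    return abs(t_r_last - (t_r_first + td_k)) <= tolerance

def check_o3(t_r_first: float, t_i: float, ts_max: float) -> bool:
    # O3: start-up delay is sufficiently low, i.e. TR[k, 1] - Ti < TSMax.
    return (t_r_first - t_i) < ts_max
```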

The proposed white noise mezzanine can be used for O1 and O2, with checks done every N PCM samples. The best value for N is to be discussed and agreed with the DPCTF group later, when it comes to the implementation stage.
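As a rough illustration of that kind of check, here is a hypothetical sketch that compares a captured recording against the reference white noise in blocks of N PCM samples; the value of N, the correlation threshold, and the assumption that the recording is already time-aligned with the reference are all placeholders, not part of the test suite.

```python
# Hypothetical O1/O2-style check: confirm each N-sample block of the
# reference appears at its expected position in the recording.
import numpy as np

def check_blocks(reference: np.ndarray, recording: np.ndarray,
                 n: int = 4800, threshold: float = 0.9) -> bool:
    for start in range(0, len(reference) - n, n):
        ref_block = reference[start:start + n]
        rec_block = recording[start:start + n]
        if len(rec_block) < n:
            return False  # recording shorter than reference: O2 fails
        # Normalised correlation between expected and captured block
        corr = np.dot(ref_block, rec_block) / (
            np.linalg.norm(ref_block) * np.linalg.norm(rec_block) + 1e-12)
        if corr < threshold:
            return False  # block missing or out of order: O1 fails
    return True
```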

In terms of O3 and O4, observations are possible if the playback has video and audio content that is played jointly. If the video and audio are well synchronised, we can get information such as the current time and start-up time from the Test Runner QR code in the video and compare that with the audio signal.

Regarding the video/audio synchronisation check (CTA-5003 9): we think it is possible to use white noise for the video/audio synchronisation observation. It is achievable by calculating the time from a QR code on the screen and checking whether the expected audio signal is present at the same point in the recording.
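For illustration only, a hypothetical sketch of that idea: take the media time decoded from the on-screen QR code, find where the corresponding slice of reference white noise actually occurs in the recorded audio, and report the offset. The function and parameter names are assumptions, not part of the Test Runner.

```python
# Hypothetical A/V sync offset estimate using the reference white noise.
import numpy as np

def av_offset_ms(reference: np.ndarray, recording: np.ndarray,
                 qr_media_time_s: float, sample_rate: int = 48000,
                 window_s: float = 0.5) -> float:
    n = int(window_s * sample_rate)
    expected_start = int(qr_media_time_s * sample_rate)
    ref_slice = reference[expected_start:expected_start + n]
    # Cross-correlate the reference slice against the whole recording to
    # find where that slice was actually rendered.
    corr = np.correlate(recording, ref_slice, mode="valid")
    actual_start = int(np.argmax(corr))
    # Positive result: audio rendered later than the QR-coded video time.
    return (actual_start - expected_start) * 1000.0 / sample_rate
```

The resulting offset could then be compared against an agreed tolerance, subject to the caveat above about the recording device itself synchronising its audio and video capture.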

jpiesing commented 3 years ago

> We think this can be used in the video and audio synchronisation check (CTA-5003 9) without needing additional "flash and beep" annotation, on the assumption that a recording device can accurately synchronise the video and audio, or that agreed tolerances can account for recording device limitations.

I seem to remember @nicholas-fr saying that the white noise was unreasonable to play out of speakers. It would only be sensible if the microphone input of a camera could be plugged into a speaker or headphone output on a TV or media device.

pshorrock commented 3 years ago

@jpiesing are you raising that as a blocker or as a nice-to-have? We believe the content is testable, but would concede from our real-world experience of audio capture to date that a fallback to microphone capture may be required in some instances. If that ended up as a widely held view (and I'm not trying to say it is the final word on this; I'm sure many others more experienced than me can add further comment), would it in your opinion render the content unusable? I'm just trying to understand how much of a blocker this is.

jpiesing commented 3 years ago

> @jpiesing are you raising that as a blocker or as a nice-to-have? We believe the content is testable, but would concede from our real-world experience of audio capture to date that a fallback to microphone capture may be required in some instances. If that ended up as a widely held view (and I'm not trying to say it is the final word on this; I'm sure many others more experienced than me can add further comment), would it in your opinion render the content unusable? I'm just trying to understand how much of a blocker this is.

@pshorrock What are the advantages and disadvantages of each approach? How many mobile phones and tablets have a socket for audio output versus how many are Bluetooth-only?

jpiesing commented 3 years ago

I've opened a DPCTF issue to discuss the assumptions about wired connections between the test device and the camera: https://github.com/cta-wave/device-playback-task-force/issues/89

pshorrock commented 3 years ago

@jpiesing to try to edge us towards an answer to the former (for the latter I have posted a comment in https://github.com/cta-wave/device-playback-task-force/issues/89), here is a table we initially made to assess both approaches. I'm adding it here to help move the conversation forward, but very much welcome comments/thoughts. It should be noted that for the beep annotations we have not tried to capture and separate out beeps/tones that overlay each other, so although in theory you could keep adding to the audio track to tick off all observations, we are not sure how well that would work in the real world (hence some comments being cautious and focusing only on beeps, flashes and start/end jingles). Elsewhere, we have so far relied on our AVDPU device to capture audio (based on the BBC design for sync testing), whereas the white noise approach relies on the camera alone, which is an advantage (fewer testing components) but does depend on it synchronising the A/V capture correctly (or, to put it another way, within an acceptable tolerance):

| Observation | White Noise | Flash & Beep |
| -- | -- | -- |
| Every sample S[k,s] shall be rendered and the samples shall be rendered in increasing presentation time order. | Checks to be done every N PCM samples. | Checks can be done by detecting that flashes and beeps are in sync and that every video frame is rendered in order. |
| The playback duration matches the duration of the CMAF Track, i.e. TR[k, S] = TR[k, 1] + td[k]. | Video and audio need to be correctly in sync; a camera needs to be plugged into a speaker or headphone output on a TV or media device. | A starting jingle needs to be added to the annotation. |
| The start-up delay should be sufficiently low, i.e. TR[k, 1] – Ti < TSMax. | Video and audio need to be correctly in sync; a camera needs to be plugged into a speaker or headphone output on a TV or media device. | Starting and ending jingles need to be added to the annotation. |
| The presented sample matches the one reported by the currentTime value within the tolerance of the sample duration. | Video and audio need to be correctly in sync; a camera needs to be plugged into a speaker or headphone output on a TV or media device. | We don't think it is possible to observe this using the flash & beep approach. |
| The presentation starts with the earliest video sample and the audio sample that corresponds to the same presentation time. | Video and audio need to be correctly in sync; a camera needs to be plugged into a speaker or headphone output on a TV or media device. | We don't think it is possible to observe this using the flash & beep approach. |
| While continuing playback, the media samples of different tracks with the same presentation times are presented jointly. | Video and audio need to be correctly in sync; a camera needs to be plugged into a speaker or headphone output on a TV or media device. | OK |
| Every sample for every media type included in the CMAF Presentation duration shall be rendered and shall be rendered in order. | Video and audio need to be correctly in sync; a camera needs to be plugged into a speaker or headphone output on a TV or media device. | Checks can be done by detecting that flashes and beeps are in sync and that every video frame is rendered in order. |

nicholas-fr commented 3 years ago

Here are a couple of points to consider, in addition to the useful overview above:

jpiesing commented 3 years ago

July 6th meeting: agreed to go ahead as proposed unless 1) we find it won't work or 2) we find something better.