Closed: bobcampbell-resillion closed this issue 1 year ago.
I'm not an audio expert but here are some of the possibilities that occurred to me.
The use of 0.2s is purely arbitrary in these examples. It could be longer or shorter depending on how long a sample would be needed for analysis tools to determine the frequency.
In the automated (or semi-automated) case, I have no idea of what open source tool might be able to analyse an audio file and provide information on the frequency and how that varies over time within the audio file. It may be easier to find a tool that works with step changes in the frequency than with smoothly increasing or decreasing frequencies.
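On the tooling question: even without a dedicated open source analyser, a crude per-window zero-crossing count can recover a step-change frequency profile. Below is a minimal, hypothetical sketch (not part of any existing mezzanine tooling), assuming a mono capture as a Python list of float samples and the arbitrary 0.2 s window mentioned above:

```python
import math

def estimate_freq(samples, sample_rate):
    """Estimate the frequency of a roughly sinusoidal window
    by counting zero crossings (two crossings per cycle)."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if a < 0 <= b or b < 0 <= a
    )
    duration = len(samples) / sample_rate
    return crossings / (2 * duration)

def freq_profile(samples, sample_rate, window_s=0.2):
    """One frequency estimate per window_s slice of the signal."""
    win = int(window_s * sample_rate)
    return [
        estimate_freq(samples[i:i + win], sample_rate)
        for i in range(0, len(samples) - win + 1, win)
    ]

# Synthetic test signal with a step change:
# 0.2 s at 440 Hz followed by 0.2 s at 880 Hz.
rate = 48000
signal = [math.sin(2 * math.pi * 440 * t / rate) for t in range(int(0.2 * rate))]
signal += [math.sin(2 * math.pi * 880 * t / rate) for t in range(int(0.2 * rate))]

print(freq_profile(signal, rate))  # roughly [440, 880], within a few Hz
```

A real tool would use an FFT or spectrogram instead, but this illustrates why step changes are easier to work with than smooth sweeps: each window contains a single stable frequency to estimate.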
For this sort of audio testing, which is about proper presentation of the streams, the key element to test is usually audio sync. This is standard practice: video test patterns should always include an audio sync element. Where human testing is involved, dialogue is also a useful addition, as the human ear is very adept at detecting out-of-sync dialogue, though it doesn’t replace good sync-mark testing.
Next in line would be proper reproduction of the channel configuration. Test patterns we often ship include a visual representation, synchronized with the audio, of a channel-isolated sound (i.e. “left channel, left channel”).
I don’t see much value in testing the audio quality itself (for noise, jitter, whatever) as you would be testing things that are out-of-scope such as the actual decoders or compression algorithms. However, it would always be welcome to have audio tone (typically 1kHz) as part of a test pattern – this is always handy for audio chain calibration, and could even be used in automated testing to make sure there are no changes in playback speed or wild audio degradations.
Richard
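The 1 kHz tone suggested above could back a very simple automated playback-speed check: measure the tone's frequency in a capture and compare it to nominal. A hedged sketch (hypothetical helper, same zero-crossing idea, assuming a mono float capture):

```python
import math

def playback_speed_ratio(samples, sample_rate, nominal_hz=1000.0):
    """Ratio of the measured tone frequency to the nominal reference.
    1.0 means nominal playback speed; 1.02 means 2% fast."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if a < 0 <= b or b < 0 <= a)
    measured_hz = crossings / 2 / (len(samples) / sample_rate)
    return measured_hz / nominal_hz

# Synthetic capture of the 1 kHz reference tone played back ~2% fast,
# so it arrives as a ~1020 Hz tone over 1 second.
rate = 48000
capture = [math.sin(2 * math.pi * 1020 * t / rate) for t in range(rate)]
print(playback_speed_ratio(capture, rate))  # close to 1.02
```

A gross deviation from 1.0 would also flag the "wild audio degradations" case, since heavily corrupted audio would no longer read as a clean tone.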
Thanks both. Assuming the definition of "jointly" discussed in #64 can be resolved, I agree one should add some synchronised flashes and beeps overlaid on the underlying content (if no sync mark exists already), which work OK for both manual and automated observations. But, in the context of a parallel conversation about alternative “open source” mezzanine content, it sounds like Tears of Steel would be better than Big Buck Bunny, due to the live-action segments where lip sync might be more obvious.
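For the automated side of the flash-and-beep approach, beep onsets in the captured audio can be located with a short-window energy threshold and then compared against flash times recovered from the video frames. A rough, hypothetical sketch (assuming a mono float capture; the window size and threshold would need tuning for real content):

```python
import math

def beep_onsets(samples, sample_rate, window_s=0.01, threshold=0.1):
    """Approximate beep onset times in seconds: the points where
    short-window RMS energy first rises above the threshold."""
    win = int(window_s * sample_rate)
    onsets, loud = [], False
    for i in range(0, len(samples) - win + 1, win):
        w = samples[i:i + win]
        rms = (sum(x * x for x in w) / win) ** 0.5
        if rms > threshold and not loud:
            onsets.append(i / sample_rate)
        loud = rms > threshold
    return onsets

# Synthetic capture: 0.5 s silence, a 0.1 s beep, 0.5 s silence.
rate = 48000
capture = ([0.0] * (rate // 2)
           + [math.sin(2 * math.pi * 1000 * t / rate) for t in range(rate // 10)]
           + [0.0] * (rate // 2))
print(beep_onsets(capture, rate))  # -> [0.5]
```

The A/V offset would then be `beep_time - flash_time` for each matched pair; a consistently non-zero offset indicates a sync error.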
I think we've lost the original issue here. Features to include in audio to test audio/video sync are one thing. They are somewhat understood and the current version of the mezzanine content script includes flashes and beeps based on the work from the BBC.
The current "single track media playback" requirements include a variation on this as a "required observation" for all media formats:
Every sample S[k,s] shall be rendered and the samples shall be rendered in increasing presentation time order.
For video, every sample means every frame, and the mezzanine content script adds a distinct QR code, a timecode and a frame number to every video frame.
Is there something similar for audio?
If not, then perhaps this requirement should be moved from under "required observations" / "general" to being under "video".
FWIW I don't think it's practical or useful to verify this requirement in audio:
Every sample S[k,s] shall be rendered and the samples shall be rendered in increasing presentation time order.
...but if someone thinks up a means to do so then great. If it's moved so it doesn't apply to audio, I suggest replacing it with something softer that relates to "the audio plays". Otherwise a WAVE device could fail to play certain audio "properly", as I'm not sure under what other requirement that would be verified.
One sentence needs to be moved so it applies only to video and not to both. Otherwise, input is still wanted for audio.
Following today's DPCTF call, here is a summary of what the mezzanine content currently includes:
DPCTF 2022/01/26 this has been covered by quite some efforts on creating the mezzanine content. Please check here for detailed discussion: https://github.com/cta-wave/mezzanine.
A code is proposed: https://github.com/cta-source/audio-watermark-study
Can we use this as mezzanine content and document it in the specification in the Annex? The Test TF will address whether this approach is agreeable. Once completed, we will address documentation in the specification.
@cta-source please correct or add additional information @jpiesing please discuss this in Test TF
I suggest we close this issue.
Closed per recommendation by @cta-source following discussion on DPCTF call held March 8, 2023.
In most of the spec sections a set of generic observation requirements are stated:
(or something similar adjusted for e.g. random access cases, see also #62 ) and also:
Note also under 9.2.5.1, 9.3.5.1 and 9.4.5.1 there is an implied AV sync requirement:
(definition of "jointly" under discussion in issue #64)
For video, some annotations are possible on each frame that would be both human readable, and facilitate later automation.
For audio, some observations seem challenging to achieve: do any audio experts have suggestions or even examples of suitable streams that would be a good basis for a mezzanine audio track?
Note that there may or may not be background audio from the source mezzanine video; it is assumed this is unlikely to be useful in determining the above requirements...
The spec may also benefit from articulating more explicitly the human-verifiable means of confirming the required observations in the case of audio media...