cta-wave / mezzanine

This repo contains scripts that will build annotated test content from specific source content, compatible with the WAVE device playback test suite.
BSD 3-Clause "New" or "Revised" License

Planning for long duration testing with audio watermarking #46

Closed cta-source closed 1 year ago

cta-source commented 2 years ago

The current audio watermarking proposal has a 60 s "base" pseudo-random sequence. The test content spec (currently being integrated into the DPC spec as an Annex) defines some stereo WAV files, e.g., PN01.wav, with the prescribed noise in the left channel and silence in the right channel. The code structure permits extracting a mediaTime from a 20 ms segment of recorded audio.

For long duration playback testing, if we loop the 60s PN sequence, mediaTime will effectively become modulo(60s), so an actual time of 10m12s will be detected as 12 seconds.
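A minimal sketch of the wrap-around described above, assuming the looped sequence is exactly 60 s long. The function name `detected_media_time` is hypothetical and is not part of any WAVE tool.

```python
PN_PERIOD_S = 60  # duration of the base PN sequence, per the proposal


def detected_media_time(actual_time_s, period_s=PN_PERIOD_S):
    """Return the media time a detector would report when the PN sequence loops."""
    return actual_time_s % period_s


# 10m12s of actual playback is reported as 12 s
print(detected_media_time(10 * 60 + 12))
```

Any two actual times that differ by a whole number of minutes map to the same detected value, which is the ambiguity the rest of this thread discusses.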

I think this is adequate for our purposes, since the OF can be monitoring time throughout. An error of skipping an integer number of minutes could happen, in theory, but the skip would need to be accurate to the 20 ms level to go undetected. I'm not sure this is worth the effort of planning for, e.g., 2-hour-long sequences. Plus there are some implementation issues--we do NOT want to process 2 hours of PN sequence in the same manner as we process 60s.
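One way to picture "monitoring time throughout" is to unwrap the modulo-60 readings by counting wrap-arounds between consecutive detections. The sketch below is only an illustration under assumptions: readings arrive in order and less than one period apart, and `unwrap_media_times` is a hypothetical helper, not the OF's actual logic.

```python
def unwrap_media_times(readings_s, period_s=60.0):
    """Turn modulo-period readings into an increasing timeline.

    Assumes consecutive readings are less than one period apart, so a
    decrease is interpreted as exactly one wrap of the PN sequence.
    """
    unwrapped = []
    wraps = 0
    previous = None
    for reading in readings_s:
        if previous is not None and reading < previous:
            wraps += 1  # the PN sequence restarted between these readings
        unwrapped.append(wraps * period_s + reading)
        previous = reading
    return unwrapped


print(unwrap_media_times([58.0, 59.0, 0.0, 1.0]))
```

A jump of exactly one or more whole periods produces the same readings as no jump at all, so this kind of unwrapping cannot see it. That is the theoretical error case mentioned above.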

So--is this OK for long duration playback, that when we compare the recorded time of X min Y seconds, we only look at the Y?

rbouqueau commented 2 years ago

So--is this OK for long duration playback, that when we compare the recorded time of X min Y seconds, we only look at the Y?

The answer to this question is also needed to generate the test content.

jpiesing commented 2 years ago

So--is this OK for long duration playback, that when we compare the recorded time of X min Y seconds, we only look at the Y?

@cta-source Who is in a position to answer this question?

cta-source commented 2 years ago

I can answer, since it's my proposed approach. @rbouqueau , I'm going to over-complicate this in case I don't have your intended question right.

Critically, for generating the test content--it would be much better if the 'looping' consistently puts exactly the first sample of the next block of PN01 file data immediately after the final sample of the prior block of PN01 file data. That is, if PN01 is the array of samples of PN01,

T=00: PN01.wav[0], PN01.wav[1], ... PN01.wav[47999]
T=60: PN01.wav[0], PN01.wav[1], ... PN01.wav[47999]
T=120: PN01.wav[0], PN01.wav[1], ... PN01.wav[47999]
...
T=(1h59m00s): PN01.wav[0], PN01.wav[1], ... PN01.wav[47999]

...with no extra sample or missing sample between 47999 of one block, and 0 of the next.
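The looping requirement can be sketched as a back-to-back concatenation plus a check that every sample of the result matches the source block at its index modulo the block length. This is a sketch under assumptions: the samples are held in a plain Python list, a short stand-in array replaces the real PN01 data, and `loop_samples` and `seams_are_exact` are hypothetical names.

```python
def loop_samples(block, repeats):
    """Concatenate `block` back to back `repeats` times, with no gap or overlap."""
    looped = []
    for _ in range(repeats):
        looped.extend(block)
    return looped


def seams_are_exact(looped, block):
    """Check that `looped` is an exact whole-number repetition of `block`."""
    n = len(block)
    if n == 0 or len(looped) % n != 0:
        return False  # an added or dropped sample changes the total length
    return all(looped[i] == block[i % n] for i in range(len(looped)))


# Stand-in for the PN01 sample array (the real file is much longer)
pn01 = [3, -1, 4, -1, 5, -9]
looped = loop_samples(pn01, repeats=3)
print(seams_are_exact(looped, pn01))
```

The same comparison could be run against samples decoded from a generated test file to confirm that no sample was inserted or dropped at a loop boundary.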

If this is not feasible, please let me know. If you can get a test file ready, I can check it.

rbouqueau commented 2 years ago

Understood. However who can confirm that only checking Y in {X,Y} is ok for validation?

jpiesing commented 2 years ago

Understood. However who can confirm that only checking Y in {X,Y} is ok for validation?

I don't fully understand the PNR approach but it seems to me that it would catch the following;

If checking Y in {X,Y} would catch the problems we can identify and there's no better solution then go for it.

cta-source commented 2 years ago

Sorry, I wasn't clear. I originally proposed checking only Y in {X, Y} in February. Since then, after more discussion, I don't think that's right. See my 7/22 comment in this issue. To summarize,

So: Checking only Y is not OK for validation. Checking X,Y for length of playback makes sense if that is the observation requirement. Using white noise segment validation makes sense for stricter observation requirements.

If it would help to jump on a call, I can do that.

jpiesing commented 2 years ago

I'm still lost about what we're discussing.

I think the question we're trying to answer is "what audio to use in the long duration playback stream"? Am I correct?

If so, then I see 4 choices.

Is there a 5th choice?

I was expecting to go with option 1 ...

cta-source commented 2 years ago

@jpiesing and @rbouqueau ; I agree with the 4 choices except that "nothing" is a bit of a null choice. I don't see a 5th choice. Commenting on the remaining 3:

If we pick one "winner" of these three options, we are betting a bit on what we can or cannot do.

(Re "enough" environmental noise: In one mic-to-speaker white noise test, I put the mic next to a TV turned up to a reasonable listening volume, and put the speaker a couple of meters away. No problem.)

If the above three are the options, I recommend we attempt "mixed". Based on my own testing, the white noise should be resolved properly. We'll know when we try the Croatia annotated audio, but my tests were promising.

Could I suggest an intermediate step? Generate two cases, "Looping" and "Mixed" (#1 and #3 above), but for shorter times, like 10 minutes. I can test them and find out things like, does a beep screw up the sync pattern on the receiver, does the noise come through on mixed, is the looping accurate (no dropped/added periods) at the sample level?

@rbouqueau, for mixing, the white noise file should be mixed 13 dB below the "main" audio file's peak signal (and the peak may be the beep?). That is, if the main audio peaks at one level, the white noise should be attenuated by 13 dB before mixing with the main.
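As a rough illustration of the level arithmetic, the sketch below scales the noise so that its peak sits 13 dB below the main audio's peak before summing. It reads the suggestion one particular way and makes several assumptions: samples are floats, both inputs have the same length, the noise is not all zeros, and clipping is not handled. `db_to_gain` and `mix_with_noise` are hypothetical names.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def mix_with_noise(main, noise, noise_below_peak_db=13.0):
    """Add `noise` to `main` with the noise peak set below the main peak."""
    main_peak = max(abs(s) for s in main)
    noise_peak = max(abs(s) for s in noise)
    # Target noise peak: main peak reduced by the requested number of dB
    target_noise_peak = main_peak * db_to_gain(-noise_below_peak_db)
    scale = target_noise_peak / noise_peak
    return [m + scale * n for m, n in zip(main, noise)]


# An attenuation of 13 dB is an amplitude factor of roughly 0.224
print(round(db_to_gain(-13.0), 3))
```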

rbouqueau commented 1 year ago

I see that some audio is included in https://dash.akamaized.net/WAVE/Mezzanine/releases/3/tos_LD1_1920x1080@30_7200.mp4 . Does this mean that this issue is implemented? CC @nicholas-fr

nicholas-fr commented 1 year ago

Audio in LD content was replaced with looped PN01 in mezzanine release v4, resolving this issue.

Further work to determine if we can use combined PN + source audio is underway, and a separate issue (#55) was raised to track that.