new video observation: frame is shown for expected time based on frame rate

jpiesing commented 11 months ago

The existing observations do not explicitly detect video frames being shown for longer than the time expected from the frame rate. The problem does show up implicitly as a failure of the duration observation, but it is not easy to work out what is going on from that failure.

Proposal.

Add a new observation to 8.2.5.2 as follows: 4) For fixed frame rate content, each frame is shown for the appropriate duration for that frame rate.

Add a new observation to 8.2.5.3 as follows: 3) Each 20ms audio sample is presented for 20ms and without a gap between each sample and the previous sample.

mbergman42 commented 10 months ago

Yan, does the new audio automation software check that last requirement, #3 in the list above?

3) Each 20ms audio sample is presented for 20ms and without a gap between each sample and the previous sample.

haudiobe commented 6 months ago

2024/02/28:

There was a requirement for locking the presentation time against the initial frame. This seems to be no longer present. We should reinstate this requirement. Check in the initial release.
Testing may be done by comparing the timing in the QR code with the actual display time.

@haudiobe to add some details in the observations

yanj-github commented 6 months ago

Yan, does the new audio automation software check that last requirement, #3 in the list above?

Each 20ms audio sample is presented for 20ms and without a gap between each sample and the previous sample.

I think this new observation is applicable to video only. Audio cannot be played for longer than its expected duration. We compare against a sequence of white noise, so if the audio drops out and there is silence, it would simply fail the existing observation.

haudiobe commented 6 months ago

A proposed update is here: https://standards.cta.tech/wg/dpctf/document/33886?downloadRevision=36233

jpiesing commented 6 months ago

Here is an example.

A frame in content encoded at 25Hz should be visible for 40ms. If the presentation of the content has been recorded at 120Hz, each frame in the recording corresponds to 8.3ms. 40/8.3 is 4.8, so each frame in the content should be visible in 5 or, more likely, 6 frames of the recording.

I lean towards thinking that the tolerance should be expressed in frames of the recording rather than frames of the encoded content. 6 frames of the recording with a tolerance of perhaps 1 or 2 frames of the recording would seem reasonable to me.

The proposal in the word file is a tolerance of 2 frames of the encoded content which is much, much larger than 2 frames of the recording. I would prefer plus or minus 1 frame of the encoded content or 2 frames of the recording.
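To make the difference between the two tolerance units concrete, here is a minimal sketch of the arithmetic in the example above. The function and constant names are illustrative only and do not come from the specification or the OF.

```python
def frame_duration_ms(fps: float) -> float:
    """Duration of one frame in milliseconds at the given rate."""
    return 1000.0 / fps


def tolerance_ms(frames: float, fps: float) -> float:
    """Convert a tolerance counted in frames at `fps` into milliseconds."""
    return frames * frame_duration_ms(fps)


CONTENT_FPS = 25.0     # frame rate of the encoded content in the example
RECORDING_FPS = 120.0  # frame rate of the camera recording in the example

# Average number of recording frames covered by one content frame.
recording_frames_per_content_frame = RECORDING_FPS / CONTENT_FPS

print(f"content frame:   {frame_duration_ms(CONTENT_FPS):.1f} ms")
print(f"recording frame: {frame_duration_ms(RECORDING_FPS):.1f} ms")
print(f"recording frames per content frame: {recording_frames_per_content_frame:.1f}")
print(f"2 content frames:   {tolerance_ms(2, CONTENT_FPS):.1f} ms")
print(f"1 content frame:    {tolerance_ms(1, CONTENT_FPS):.1f} ms")
print(f"2 recording frames: {tolerance_ms(2, RECORDING_FPS):.1f} ms")
```

With these numbers, a tolerance of 2 content frames is 80ms (about 9.6 recording frames), whereas 2 recording frames is roughly 16.7ms.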

yanj-github commented 6 months ago

I agree with Jon. For video I suggest: a) TR[k,1] + T[k,s] - T[k,1] - TR[k,s] = +- 2/recording framerate. b) TR[k,s] - TR[k,s-1] - 1/framerate = +- 2/recording framerate.
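As a rough illustration of checks a) and b), the sketch below applies both to lists of observed times TR[k,s] and nominal times T[k,s]. It assumes times in seconds and fixed frame rate content; `check_sample_timing` and `TOLERANCE_FRAMES` are hypothetical names, not part of the OF.

```python
from typing import List, Sequence, Tuple

TOLERANCE_FRAMES = 2  # the "2" in "2/recording framerate"


def check_sample_timing(
    tr: Sequence[float],
    t: Sequence[float],
    content_fps: float,
    recording_fps: float,
) -> List[Tuple[int, str]]:
    """Return (s, "a" or "b") for every sample that fails a check.

    tr[i] and t[i] are the observed and nominal presentation times, in
    seconds, of sample s = i + 1.
    """
    tolerance = TOLERANCE_FRAMES / recording_fps
    failures = []
    for i in range(1, len(tr)):
        s = i + 1
        # a) position relative to the start of the presentation
        if abs(tr[0] + t[i] - t[0] - tr[i]) > tolerance:
            failures.append((s, "a"))
        # b) spacing relative to the previous sample
        if abs(tr[i] - tr[i - 1] - 1.0 / content_fps) > tolerance:
            failures.append((s, "b"))
    return failures


nominal = [0.00, 0.04, 0.08, 0.12]
# Sample 2 is held for 80ms instead of 40ms, so samples 3 and 4 arrive late.
observed = [1.00, 1.04, 1.12, 1.16]
print(check_sample_timing(observed, nominal, content_fps=25.0, recording_fps=120.0))
```

In this example a single over-long frame fails check b) once, at the sample where the delay appears, but fails check a) for every later sample, because a) is anchored to TR[k,1].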

However, I have the following concerns. For video, are we happy to skip these checks for the first and last frames (they are normally displayed for longer than a frame duration)? Also, the start of the presentation would be calculated from frame number 2 (TR[k,1] - one frame duration), or from the first detected frame (TR[k,n] - one frame duration * (n-1)) if the starting frames are missing. For observation a), when the start of the presentation is not correctly rendered, all samples would fail.

jpiesing commented 5 months ago

I agree with Jon. For video I suggest: a) TR[k,1] + T[k,s] - T[k,1] - TR[k,s] = +- 2/recording framerate. b) TR[k,s] - TR[k,s-1] - 1/framerate = +- 2/recording framerate.

However, I have following concerns: For video, are we happy to skip this checks for 1st frame and last frame (they normally displayed for longer than a frame duration)?

Yes

And also the start of the presentation will be calculated based on frame number 2 (TR[k,1] - one frame duration) or calculated based on 1st detected frame (TR[k,n] - one frame duration * (n-1)) if starting frames missing.

Why is this calculation needed? The OF already knows the first and last frames of the recording where a given frame of the encoded content was detected. Just subtract the last frame number from the first frame number and allow for the tolerance.

For observation a) when start of the presentation not correctly rendered it would fail all samples.

Don't understand.
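The counting approach described in this comment could look something like the sketch below. It assumes the OF can supply, for each frame of the recording, the number of the content frame detected in it (or None when nothing was decoded), and it counts the span inclusively as last - first + 1. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def recording_spans(detections: List[Optional[int]]) -> Dict[int, Tuple[int, int]]:
    """Map each content frame to the (first, last) recording frame showing it."""
    spans: Dict[int, Tuple[int, int]] = {}
    for rec_idx, content_frame in enumerate(detections):
        if content_frame is None:
            continue
        first, _ = spans.get(content_frame, (rec_idx, rec_idx))
        spans[content_frame] = (first, rec_idx)
    return spans


def frames_with_bad_duration(
    detections: List[Optional[int]],
    content_fps: float,
    recording_fps: float,
    tolerance_rec_frames: float = 2.0,
) -> List[int]:
    """Return content frames shown for too many or too few recording frames.

    The first and last detected content frames are skipped, as agreed above.
    """
    spans = recording_spans(detections)
    expected = recording_fps / content_fps
    bad = []
    for frame in sorted(spans)[1:-1]:
        first, last = spans[frame]
        shown = last - first + 1
        if abs(shown - expected) > tolerance_rec_frames:
            bad.append(frame)
    return bad


# Content frame 3 stays on screen for 10 recording frames instead of about 5.
detections = [1] * 9 + [2] * 5 + [3] * 10 + [4] * 5 + [5] * 7
print(frames_with_bad_duration(detections, content_fps=25.0, recording_fps=120.0))
```

This version needs no estimate of the start of the presentation, because each content frame is judged only by its own span in the recording.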

ZmGorynych commented 5 months ago

Remember that (a) frame duration can be modified, e.g. due to pulldown, and (b) there is variable frame rate content, such as gaming.

Both (a) and (b) are described in the bitstream, but not on systems level.


yanj-github commented 5 months ago

And also the start of the presentation will be calculated based on frame number 2 (TR[k,1] - one frame duration) or calculated based on 1st detected frame (TR[k,n] - one frame duration * (n-1)) if starting frames missing.

Why is this calculation needed? The OF already knows the first and last frames of the recording where a given frame of the encoded content was detected. Just subtract the last frame number from the first frame number and allow for the tolerance.

For observation a) when start of the presentation not correctly rendered it would fail all samples.

Don't understand.

Thanks @jpiesing, it was a question of how we define and measure "the start of the presentation" TR[k,1]. For example, when a test starts with frame number 1, the OF will see the first detection of frame number 1 and the last detection of frame number 1. Some devices hold the first frame for longer than the frame duration, so: TR[k,1] = (last detection of TR[k,1] - one frame duration). When the starting frames are missing: TR[k,1] = (first detection of TR[k,n] - one frame duration * (n-1)).
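The two ways of deriving TR[k,1] described above can be sketched as follows. This is only an illustration of the arithmetic, assuming times in seconds and fixed frame rate content; the function names are made up for the example.

```python
def start_from_frame_one(last_detection_of_frame_1: float, content_fps: float) -> float:
    """TR[k,1] when frame 1 is detected but may be held for too long.

    Works backwards from the last detection of frame 1 by one frame duration.
    """
    return last_detection_of_frame_1 - 1.0 / content_fps


def start_from_frame_n(first_detection_of_frame_n: float, n: int, content_fps: float) -> float:
    """TR[k,1] when the starting frames are missing and frame n is seen first.

    Works backwards from the first detection of frame n by (n - 1) frame durations.
    """
    return first_detection_of_frame_n - (n - 1) / content_fps


# 25Hz content: frame 1 last seen at 2.04s, or frame 4 first seen at 2.12s.
print(start_from_frame_one(2.04, content_fps=25.0))
print(start_from_frame_n(2.12, n=4, content_fps=25.0))
```

Both calls in the example land on a start time of about 2.0s.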

jpiesing commented 5 months ago

I know all this & I'm still confused. We can choose to not apply this test to the first frame that is detected regardless of whether the first frame detected is frame 1, 2, or 275.

yanj-github commented 5 months ago

We can skip this observation for the first frame; however, for every other frame "the start of the presentation" TR[k,1] is used, which is the detection time of frame number 1. Each sample sample[k,s] with s=1, …, S should be rendered at its nominal presentation time T[k,s] relative to the start of the presentation.
a) TR[k,1] + T[k,s] - T[k,1] - TR[k,s] = +- 2/recording framerate. - what should the start of the presentation TR[k,1] be?
b) TR[k,s] - TR[k,s-1] - 1/framerate = +- 2/recording framerate. - this is fine.

From the OF we can calculate TR[k,1] based on TR[k,2].

jpiesing commented 4 months ago

Reviewing this discussion, we might end up with a solution to add the following to 8.2.5.2 (only).

5) Excluding the starting and ending frames, every video frame S[k,s] shall start being displayed at the correct time plus or minus frame_start_tolerance and shall not be displayed after the correct time plus or minus frame_end_tolerance. NOTE: Observations may assume there is no variable frame rate test content.

Alternatively here's the version from Thomas above.

Each sample sample[k,s] with s=1, …, S should be presented at its nominal presentation time T[k,s] relative to the start of the presentation, i.e. TR[k,s] = TR[k,1] + T[k,s] - T[k,1]. A refinement to the accuracy is provided for each media type.

I might suggest two changes to this version:

changing "s=1, …, S" to "s=2, …, S", i.e. changing "1" to "2", to take account of the start frame being rendered before playback.
replacing the last sentence with "within the tolerance of +/- frame_presented_tolerance" and adding frame_presented_tolerance to 8.2.3.
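Written out with both suggested changes applied, the condition might read as below. This is only a sketch: the symbol f is an assumption standing for whichever frame rate the tolerance is counted against, since the thread discusses both the recording frame rate and the content frame rate.

```latex
\left| TR[k,s] - \bigl( TR[k,1] + T[k,s] - T[k,1] \bigr) \right|
  \le \frac{\text{frame\_presented\_tolerance}}{f},
  \qquad s = 2, \ldots, S
```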

jpiesing commented 1 month ago

Looking at V2.1.0, 8.2.5.1 and 8.2.5.2 (and similarly 8.3 and 8.4) all have this.

I believe the comment above about changing s=1 to s=2 still applies as the first frame may be presented in advance of its nominal presentation time.

We should consider whether the "2" in "2/framerate" should be parameterised as proposed above, i.e. introducing "frame_presented_tolerance" as a parameter whose default value is 2.

haudiobe commented 1 month ago

TF 2024/08/14

Instead of a hard-coded 2, we have a parameter. 2 will be the default.
Agreed.

cta-wave / device-playback-task-force

new video observation: frame is shown for expected time based on frame rate #119