ISOBMFF Sample durations and Track fragment decode time (tfdt)

dwsinger commented 4 years ago

It's unclear what the defined behavior is (if any) if the tfdt value is less than the computed sum of previous durations. What is the player supposed to do? Perhaps we could truncate a previous duration, but what if whole samples are elided? Should we allow an 'indefinite' value (perhaps 0) for the last sample in a track fragment, and it's resolved by inspecting the tfdt of the next fragment (when it's available)? Is this any better than assigning an arbitrary short value and extending (as currently permitted) or some sort of truncation of an arbitrary long value (as suggested above)?

There is extensive suggested editing in the Defect Report. Are we ready to go?

waqarz commented 4 years ago

w18812 (Revised Defect Report for ISO/IEC 14496-12) outlines a solution. The discussion of the defect report in m47254 at the 126th MPEG meeting resolved to: We could add a formal definition of sample duration, to start on 3.6.

There were a few other concerns documented in m47254, namely:

Decoders perceiving two samples at the same decoding time
Calculations of bit-rate or frame-rate resulting in divide-by-zero
Software crashes, probably due to the above

These are backward compatibility considerations for a legacy client. If a legacy decoder exhibits any one of the above 3 behaviours, it will not be able to present the targeted sparse/low latency track anyway, so the fact that it will stumble when decoding such tracks is already known. Zero decoding time signalling would be used in ecosystems to support sparse tracks, with an assurance that the decoders are capable of handling such tracks.
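
To make the divide-by-zero concern concrete, here is a minimal sketch of a frame-rate estimate that tolerates zero-duration samples. The helper name and its "return None" policy are assumptions for illustration only, not taken from any real player or from the spec.

```python
from typing import Optional, Sequence


def average_frame_rate(durations: Sequence[int], timescale: int) -> Optional[float]:
    """Average frame rate in samples per second, or None when undefined.

    `durations` are sample durations in media timescale units. The naive
    formula len(durations) * timescale / sum(durations) divides by zero
    when every duration is 0, e.g. a lone sparse sample signalled with
    duration 0.
    """
    total = sum(durations)
    if total == 0 or timescale <= 0:
        # Hypothetical policy: report "unknown" instead of crashing.
        return None
    return len(durations) * timescale / total
```

A legacy client that omits the `total == 0` check is the kind of client the concern above is about.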

jeanlf commented 4 years ago

I'm not sure I follow the question here.

If the tfdt is less than the sum, this is an error in the file, as per spec.
If the tfdt is more than the sum and the last received sample has a non-0 duration, this is an error in the file.
Otherwise the file is OK.
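
The three cases above can be sketched as a small check. This encodes only the reading given in this comment, not normative text; the function name and the returned strings are made up for illustration.

```python
def classify_tfdt(tfdt: int, sum_of_durations: int, last_duration: int) -> str:
    """Classify a tfdt value against the sum of all earlier sample durations.

    All values are in the media timescale. `last_duration` is the duration
    of the last sample received before this track fragment.
    """
    if tfdt < sum_of_durations:
        return "error: tfdt is less than the sum of previous durations"
    if tfdt > sum_of_durations and last_duration != 0:
        return "error: gap after a sample with non-zero duration"
    # Either tfdt equals the sum, or the gap follows a zero-duration sample.
    return "ok"
```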

Do we need to clarify how readers should process non-compliant files? Or do we want to relax tfdt?

dwsinger commented 3 years ago

The editorial update in the 7th edition applies Waqar's changes; close?

jeanlf commented 3 years ago

Yes, I think we can close. Maybe for the sake of clarity in 8.8.12.1 we could add after "If no samples were present in the preceding movie and movie fragments...":

"Otherwise, the time expressed in the TrackFragmentBaseMediaDecodeTimeBox shall not be less than the sum of the sample durations of the samples in the preceding movie and movie fragments."

cconcolato commented 2 years ago

I think the spec is still incorrect or unclear.

1) Reading the 7th edition, 8.8.12.1 says:

The TrackFragmentBaseMediaDecodeTimeBox provides the absolute decoding timestamp, measured on the decoding timeline, of the first sample in decoding order in the track fragment. [...] If the time expressed in the TrackFragmentBaseMediaDecodeTimeBox exceeds the sum of the sample durations of the samples in the preceding movie and movie fragments, then the duration of the last sample preceding this track fragment is extended such that the sum now equals the time given in this box. [...] If no samples were present in the preceding movie and movie fragments for this track, the time expressed in the TrackFragmentBaseMediaDecodeTimeBox defines the decoding timestamp of the first sample in this track.
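
As a rough illustration of the extension rule quoted above, the following sketch assumes a processor that still holds every preceding sample duration in a list. The function name is hypothetical, and the case where the box time is smaller than the sum is deliberately left untouched here.

```python
from typing import List


def extend_last_duration(durations: List[int], base_media_decode_time: int) -> List[int]:
    """Return a copy of `durations` with the last entry extended, if needed.

    `durations` holds the durations of all samples of the track that precede
    the track fragment, in the media timescale. If the time in the box
    exceeds their sum, the last duration grows so that the sum equals it.
    """
    total = sum(durations)
    if not durations or base_media_decode_time <= total:
        # No preceding samples (the box then defines the first decoding
        # timestamp), or no gap to fill: nothing to extend.
        return list(durations)
    extended = list(durations)
    extended[-1] += base_media_decode_time - total
    return extended
```

For example, durations `[1000, 1000, 0]` followed by a fragment whose box gives 5000 become `[1000, 1000, 3000]`.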

So it is clearly conformant to have baseMediaDecodeTime be greater than the sum of the previous sample durations. But 8.8.12.3 still says:

baseMediaDecodeTime is an integer equal to the sum of the decode durations of all earlier samples in the media, expressed in the media's timescale.

(I hate that the semantics of boxes are spread out between 2 subclauses: definition and semantics)

I think it should say:

baseMediaDecodeTime is the absolute decoding timestamp, measured on the decoding timeline, of the first sample in decoding order in the track fragment, expressed in the media's timescale. It shall not be smaller than the sum of the decode durations of all earlier samples in the media.

2) Also, the way the current text in 8.8.12.1 is written is really meant for file processors that have access to previous and next fragments at the same time. For example, extending the duration of the previous sample (that had a small or 0 duration) is only possible when you have the next fragment early enough. In a streaming scenario, you may already have entered a rebuffering phase based on the initial duration by the time you receive the next fragment.

We could (I'm not sure) say something like:

When the time expressed in the TrackFragmentBaseMediaDecodeTimeBox exceeds the sum of the sample durations of the samples in the preceding movie and movie fragments, the behavior of a file processor depends on when fragments are processed:

if the next fragment is available before the last sample of the previous fragment is processed (for example when un-fragmenting a file), then the duration of the last sample preceding this track fragment is (should/shall be?) extended such that the sum now equals the time given in this box.

otherwise (the next fragment becomes available after the last sample of the previous fragment is processed, e.g. already sent to the decoder), the processor behavior is undefined in this specification.
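
A sketch of how a processor might distinguish the two cases follows. The `LastSample` type, the `dispatched` flag, and the use of `None` for "undefined" are all assumptions for illustration; nothing here is specified text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LastSample:
    dts: int          # decoding timestamp, media timescale
    duration: int     # signalled duration, media timescale
    dispatched: bool  # True once the sample was handed to the decoder


def resolve_gap(last: LastSample, next_tfdt: int) -> Optional[int]:
    """Return the duration to use for `last`, or None if behavior is undefined."""
    end = last.dts + last.duration
    if next_tfdt < end:
        raise ValueError("tfdt is less than the sum of previous durations")
    if next_tfdt == end:
        return last.duration  # no gap, nothing to extend
    if last.dispatched:
        # The sample is out of reach of the file parser; the proposed text
        # leaves this case undefined.
        return None
    return next_tfdt - last.dts  # extended duration
```

When un-fragmenting a file, `dispatched` would always be False, so the extension always applies.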

We also need to say that derived specifications (e.g. CMAF, MSE, ...) may restrict this case or provide guidance.

3) Also the sentence is interesting:

Players may choose to skip over an initial empty media range in tracks where the first decoding timestamp is defined by a TrackFragmentBaseMediaDecodeTimeBox with non-zero time.

IIUC it's impossible to preserve the decode time when unfragmenting this file. In a non-fragmented file the first DTS is 0. We should at least indicate it in a note.

jeanlf commented 2 years ago

"baseMediaDecodeTime is the absolute decoding timestamp, measured on the decoding timeline, of the first sample in decoding order in the track fragment, expressed in the media's timescale. It shall not be smaller than the sum of the decode durations of all earlier samples in the media."

OK for this

"Also the way the current text in 8.8.12.1 is written is really meant for file processors that have access to previous and next fragments at the same time."

No, only to current and previous. I however agree that the text says:

then the duration of the last sample preceding this track fragment is extended such that the sum now equals the time given in this box.

which is not always possible (as you point out, the sample might already be in the decoder and its duration can no longer be updated ...). It's not a problem of the next fragment being available, but a problem of the last sample of the last fragment being already processed (out of reach for the file parser).

"IIUC it's impossible to preserve the decode time when unfragmenting this file. In a non-fragmented file the first DTS is 0. We should at least indicate it in a note."

You can always use edits to reconstruct the original timeline.
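
One possible reading of this, sketched below: after un-fragmenting, the first DTS becomes 0, and an empty edit (media_time of -1 in the edit list) whose length equals the original first tfdt restores the position on the presentation timeline. The function name and the tuple layout are made up for illustration, and composition offsets are ignored.

```python
from typing import List, Tuple

# (segment_duration in movie timescale, media_time in media timescale)
EditEntry = Tuple[int, int]


def edits_for_initial_offset(first_tfdt: int, media_duration: int,
                             media_timescale: int, movie_timescale: int) -> List[EditEntry]:
    """Build edit list entries that re-create an initial empty media range."""

    def to_movie(t: int) -> int:
        # Integer conversion; rounds down, so some precision may be lost.
        return t * movie_timescale // media_timescale

    edits: List[EditEntry] = []
    if first_tfdt > 0:
        edits.append((to_movie(first_tfdt), -1))  # empty edit
    edits.append((to_movie(media_duration), 0))   # then play media from time 0
    return edits
```

This restores the presentation timeline only; the decode times stored in the un-fragmented file still start at 0.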

dwsinger commented 2 years ago

The requirement to not be less than the previous sum is in the amendment going to CD. Are there other aspects, or can this be closed?

cconcolato commented 2 years ago

So the CD covers 1)? I'll check. We should still address 2) and 3) I think.

cconcolato commented 3 months ago

In the Revised Text of the 2nd DIS of the 8th edition (MDS23485_WG03_N01118), the semantics of baseMediaDecodeTime are still:

baseMediaDecodeTime is an integer equal to the sum of the decode durations of all earlier samples in the media, expressed in the media's timescale. It does not include the samples added in the enclosing track fragment.

So I still think we should update the semantics to be:

baseMediaDecodeTime is the absolute decoding timestamp, measured on the decoding timeline, of the first sample in decoding order in the track fragment, expressed in the media's timescale. It shall not be smaller than the sum of the decode durations of all earlier samples in the media.

And the comments 2 and 3 in https://github.com/MPEGGroup/FileFormat/issues/3#issuecomment-948067856 have not been addressed either.

mhannuksela commented 3 months ago

I agree on the proposed semantics change in principle, although the second sentence needs some wordsmithing IMO. Here is a try:

baseMediaDecodeTime is the absolute decoding timestamp, measured on the decoding timeline, of the first sample in decoding order in the track fragment, expressed in the media's timescale. The value of baseMediaDecodeTime shall be greater than or equal to the sum of the sample durations of all the samples of this track that precede this track fragment in decoding order.
