Matroska track duration not using timestamp of first frame

JeromeMartinez commented 2 years ago

After https://github.com/MediaArea/MediaInfoLib/pull/1272 the Delay field is fine, but I noticed that, more generally, the duration does not use this field, so it includes the "empty" duration (which is expected to be excluded).

JeromeMartinez commented 2 years ago

That is confusing. For example, for AAC and E-AC-3 a delay of 1024 samples and 256 samples respectively is specified (it is always there), but that does not mean there is nothing there: it is silence or garbage.

At least for the moment, MediaInfo relies on the Matroska timestamp of the audio frame which is the timestamp of the first sample, and shows that.

In MP4, mdhd should have the pre-editlist duration, while mvhd and tkhd present the post-editlist duration. Is there anything like that in MKV?

@robUx4 is there a way in Matroska to say that the first (or last) x samples of a compressed audio frame should be ignored, in order to get the exact expected duration, without the synthetic silence at the beginning and the end?

robUx4 commented 2 years ago

There is the CodecDelay element, which can represent the amount of samples to discard from the codec. So if you want the exact duration you need to subtract this value, and hopefully get the duration of the last Block. Otherwise you need to parse the last Block, in which case you already know the codec format and probably how much you have to remove from the beginning anyway.

JeromeMartinez commented 2 years ago

@robUx4 maybe we are not talking about the same thing: CodecDelay seems to apply to each frame and to be internal to the format. What we are looking for is a way of saying that the first and last frames hold more content than the input. For example, take an input of 1025 audio samples compressed with AAC, which mandates 1024 samples per frame: we get 2 AAC frames, the last one ending with 1023 samples of silence. Currently the mkvmerge duration is equivalent to 2048 samples, which is wrong (this matters especially for a gapless jump to the next file). Additionally, we may want to cut the first 1023 audio samples without re-encoding (for 2 frames it is not much, but in practice there are more frames, and it is for a lossless cut), so the real content is 2 audio samples. I don't see how to say that with Matroska, so players think that the duration is 2048 samples.

With MP4 it is handled by a media duration of 2048 samples plus an edit list with a media time of 1023 samples and a duration of 2 samples. A player is expected to decode 2048 audio samples (the granularity of the audio format), discard the first and last 1023 samples, and play 2 samples. I don't see how to tell the player the same thing with Matroska.
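To make the arithmetic of this example explicit, here is a minimal Python sketch. The function names are hypothetical and are not taken from MediaInfo, mkvmerge, or any MP4 library; the sketch only assumes a fixed frame size of 1024 samples and a single edit list entry, as in the example above.

```python
AAC_FRAME_SAMPLES = 1024  # samples per AAC frame, as in the example above


def coded_samples(input_samples, frame_samples=AAC_FRAME_SAMPLES):
    """Samples a decoder outputs: the input rounded up to whole frames."""
    frames = (input_samples + frame_samples - 1) // frame_samples
    return frames * frame_samples


def edit_list_ranges(media_time, segment_duration, media_duration):
    """Split decoded samples into (discard_start, play, discard_end).

    Hypothetical helper for a single edit list entry, all values in samples.
    """
    discard_end = media_duration - media_time - segment_duration
    if media_time < 0 or segment_duration < 0 or discard_end < 0:
        raise ValueError("edit does not fit inside the media duration")
    return media_time, segment_duration, discard_end


media_duration = coded_samples(1025)               # 2 frames
print(media_duration)                              # 2048
print(edit_list_ranges(1023, 2, media_duration))   # (1023, 2, 1023)
```

The point of the sketch is that the container-level duration (2048 samples) and the playable duration (2 samples) are different numbers, and only the second one is what a duration field should report.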

robUx4 commented 2 years ago

Given that the samples are at the end in this example, the BlockDuration may tell how much of the Block data should actually be played. But it's not sample accurate (yet), and it only applies at the end. If samples need to be cut at the beginning, there is DiscardPadding. Depending on its value, it may crop data at the beginning or at the end of the BlockGroup. Technically there can only be one, so a Block can't be cut at both the beginning and the end with DiscardPadding alone, but a BlockDuration could help. They have the same nanosecond "precision". With some rounding error, it might be enough to tell which sample it corresponds to in most cases.
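As a rough illustration of the rounding involved, the following Python sketch converts between nanosecond values and sample counts. The helper names are hypothetical, and the sign convention (negative DiscardPadding trims the beginning, positive trims the end) is an assumption based on my reading of the Matroska specification; please check it against the spec before relying on it.

```python
NS_PER_SECOND = 1_000_000_000


def samples_to_ns(samples, sample_rate):
    """Nanosecond value a muxer could store for a whole number of samples."""
    return (samples * NS_PER_SECOND + sample_rate // 2) // sample_rate


def ns_to_samples(duration_ns, sample_rate):
    """Round a nanosecond duration (sign ignored) to the nearest sample."""
    return (abs(duration_ns) * sample_rate + NS_PER_SECOND // 2) // NS_PER_SECOND


def discard_padding_to_trim(discard_padding_ns, sample_rate):
    """Return (trim_start, trim_end) in samples.

    Assumed convention: negative trims the beginning, positive the end.
    """
    samples = ns_to_samples(discard_padding_ns, sample_rate)
    return (samples, 0) if discard_padding_ns < 0 else (0, samples)


padding_ns = samples_to_ns(1023, 48000)
print(padding_ns)                                    # 21312500
print(discard_padding_to_trim(padding_ns, 48000))    # (0, 1023)
print(discard_padding_to_trim(-padding_ns, 48000))   # (1023, 0)
```

At common audio sample rates one sample lasts far longer than one nanosecond, so rounding to the nearest sample recovers the intended count as long as the muxer stored a correctly rounded value.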

JeromeMartinez commented 2 years ago

@robUx4 thank you, DiscardPadding and BlockDuration are exactly what I was looking for. Now I need to find sample files and implement the support in MediaInfo.

MediaArea / MediaInfoLib

Matroska track duration not using timestamp of first frame #1491