MPEGGroup / FileFormat

MPEG file format discussions

ISOBMFF Segment Index #4

Open dwsinger opened 4 years ago

dwsinger commented 4 years ago

In the file containing the SegmentIndexBox, the anchor point for a SegmentIndexBox is the first byte after that box.

subsegment_duration: when the reference is to SegmentIndexBox, this field carries the sum of the subsegment_duration fields in that box; when the reference is to a subsegment, this field carries the difference between the earliest presentation time of any access unit of the reference stream in the next subsegment (or the first subsegment of the next segment, if this is the last subsegment of the segment, or the end presentation time of the reference stream if this is the last subsegment of the stream) and the earliest presentation time of any access unit of the reference stream in the referenced subsegment

The anchor point for a SegmentIndexBox is the first byte after that box. Does this imply that the sidx comes before the media it indexes? In that case, the sidx may also not be usable if the duration of a sample is unknown at the time of packaging.
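To make the anchor rule concrete, here is a minimal Python sketch of how a reader might turn a sidx into absolute byte ranges. The names `Reference` and `byte_ranges` are hypothetical, and the sketch assumes the references are laid out contiguously starting at anchor + first_offset. Because first_offset is unsigned, every computed range lands at or after the end of the box.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Reference:
    reference_type: int       # 0 = media subsegment, 1 = another SegmentIndexBox
    referenced_size: int      # bytes (31-bit field)
    subsegment_duration: int  # timescale units (32-bit field)


def byte_ranges(sidx_offset: int, sidx_size: int, first_offset: int,
                refs: List[Reference]) -> List[Tuple[int, int]]:
    """Return (first_byte, last_byte) for each referenced item."""
    if first_offset < 0:
        raise ValueError("first_offset is unsigned in the current box definition")
    anchor = sidx_offset + sidx_size  # first byte after the sidx box
    start = anchor + first_offset
    ranges = []
    for ref in refs:
        ranges.append((start, start + ref.referenced_size - 1))
        start += ref.referenced_size  # references are contiguous
    return ranges
```

With this model, a sidx placed after its media would need a negative first_offset, which is exactly what the current definition cannot express.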

dwsinger commented 4 years ago

I believe that this is true. We have (at least) three possible courses of action:

  1. say that segment indexes, when in the same file, have to precede the data that they index. (Note that it's possible to reserve space for a segment index, write the segment, and come back and fill it in, in some cases.)
  2. change the documentation of the first_offset field from unsigned to signed. We'd have to trust that no file in existence uses the top bit. Maybe only do this for the 64-bit case?
  3. introduce a pair of new versions that have negative offsets (32-bit and 64-bit).

Unless we're pressed, I would defer this or go with the first.

RufaelDev commented 4 years ago

This problem occurs when streams are dynamic, such as in live streaming. In that case, having the index at the beginning is cumbersome: you need to allocate space for it, but you typically don't know how many chunks or segments need to be indexed, so it is hard to do reliably. So option 1), while possible, has that drawback. Option 2) would not be my preference, as it would break backward compatibility. Option 3) could be considered if option 1) turns out to be insufficient for the dynamic use case. Another option is to use a separate file for the segment index, which the specification also allows; however, adding the proposed text of option 1) would make that less clear.

dwsinger commented 4 years ago

Even if you're not sure how big the segment index will be, you probably know a max, and one can always fill in with a free space box...
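The reserve-and-backfill idea can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the helper names (`free_box`, `sidx_v0`, `backfill`) are hypothetical, the sidx is version 0, earliest_presentation_time is 0, and every subsegment is assumed to start with a SAP of type 1. Note that leftover space is padded with a free box placed after the sidx, so first_offset has to skip over that padding.

```python
import io
import struct

SIDX_V0_FIXED = 32  # 12-byte FullBox header + 20 bytes of fixed fields
ENTRY_SIZE = 12     # one reference entry


def free_box(size: int) -> bytes:
    """Build a 'free' box of exactly `size` bytes."""
    if size < 8:
        raise ValueError("a box needs at least an 8-byte header")
    return struct.pack(">I4s", size, b"free") + bytes(size - 8)


def sidx_v0(reference_id: int, timescale: int, first_offset: int, refs) -> bytes:
    """Serialize a version 0 sidx; refs is a list of (size, duration) media entries."""
    body = struct.pack(">IIIIHH", reference_id, timescale,
                       0,             # earliest_presentation_time (assumed 0)
                       first_offset,
                       0,             # reserved
                       len(refs))
    for size, duration in refs:
        # reference_type=0 in the top bit of the first word;
        # starts_with_SAP=1 and SAP_type=1 are assumptions of this sketch.
        body += struct.pack(">III", size & 0x7FFFFFFF, duration,
                            (1 << 31) | (1 << 28))
    return struct.pack(">I4sI", 12 + len(body), b"sidx", 0) + body


def backfill(f, reserved_offset: int, reserved_size: int, refs,
             reference_id: int = 1, timescale: int = 48000) -> None:
    """Overwrite a reserved free box with a sidx plus a smaller free box."""
    padding = reserved_size - (SIDX_V0_FIXED + ENTRY_SIZE * len(refs))
    if padding < 0 or 0 < padding < 8:
        raise ValueError("reserved space cannot hold sidx plus a valid free box")
    # The anchor is the first byte after the sidx, so skip the padding.
    box = sidx_v0(reference_id, timescale, padding, refs)
    f.seek(reserved_offset)
    f.write(box + (free_box(padding) if padding else b""))


# Usage: reserve 100 bytes, append two fake segments, then backfill the index.
f = io.BytesIO()
f.write(free_box(100))
f.write(bytes(1000) + bytes(500))
backfill(f, 0, 100, [(1000, 48000), (500, 48000)])
```

The file length and the byte positions of the media do not change; only the reserved region is rewritten.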

RufaelDev commented 4 years ago

Yes we thought of that with the space allocation, but it is tricky rewriting at the beginning and in some cases it is hard or impossible to know a max, especially for 24x7 live streams that can run for a very long time, so option 1) would not be practical for this use case. MPEG DASH only uses the segment index for VoD so perhaps that is the right assessment, but we would be interested in using sidx for live/dynamic content also if it can be supported.

cconcolato commented 4 years ago

sidx was designed with VoD in mind, not for live. In live streaming, the manifest provides the indexing. Can you describe the use case a bit more? Is there already a document?

cconcolato commented 4 years ago

ping @RufaelDev. Can you describe the use case?

RufaelDev commented 4 years ago

Sorry yes we have the following use case.

A live encoder pushes a CMAF Track (CMAF Header + segments) to a packager/origin or storage node e.g. cmaf or cmaf + dash ingest => e.g. interface 1 https://dashif-documents.azurewebsites.net/Ingest/master/DASH-IF-Ingest.html

The receiver stores the CMAF header and appends the segments. And continuously publishes manifests.

At the end of a session (or during the session), indexing structures are created and the content can be accessed rapidly via byte ranges etc and re-used for VoD.

For example if the ingest is based on smooth streaming (fmp4 based) mfra at the end can be used. The mfra can be used to index the segments in the track from the end (which is practical).

Sidx sits at the beginning of the file and is therefore less practical. Creating a segment index once, at the end of the session, may be possible in some cases, but it would still carry a lot of overhead: the sidx would need to go at the beginning (potentially shifting gigabytes of content) or be stored separately, which is also not ideal.

The more interesting case is one where a segment index is continuously updated (a live segment index). Right now we use separate technology for this indexing; it would be great if ISOBMFF/CMAF could support this, but it needs a careful design.

Hence I did not reply to this thread, as I thought: OK, segment index would be used for VoD. In any case, if this use case is of interest, a different segment index v2 may be developed for this.

From that perspective, in the last meeting there was a lot of discussion on VBR. The argument was that VoD DASH and HLS support VBR because you can see the size of each segment, but segment timeline and number (used in DASH) do not. VBR may be relevant for large-scale delivery as it can save bits, which is another motivation to consider an updated segment index box that could be suitable for live content.

So overall, I can accept that segment index is used for VoD and that a decision is made to update this on that basis, but there is a live use case that may be of interest to large-scale delivery and CMAF/ISOBMFF and that could be investigated more. Feel free to contact me or ask more questions.

cconcolato commented 3 years ago

If I understand the use case correctly, it's about repurposing live content into a single VoD file by generating indexing information after the live event is finished, without having to move bytes around.

It's an interesting use case and I would agree with @dwsinger: reserve enough space at the beginning of the file (with a free box) and replace that space with a sidx box when you have all the data. If for some reason that's not practical, I would consider using separate files and/or external indexing information.

Pending more information describing why this can't work, my recommendation would be to close this issue.

porcelijn commented 3 years ago

Hi all, I'd like to add my 2 cents. :smile:

I would like to be able to incrementally create an indexed vod archive of a live-linear stream in a (mostly) append-only fashion.

Reserving "enough space at the beginning" for live-linear is a bit of a non-starter. What may work is to reserve "some" space, fill up the sidx until it's (almost) full, and then reserve another byte range to back-fill with segment indices. The first index could point to the second index in daisy-chain fashion. That is, use the final entry in the first sidx with reference_type=1 to point to the space reserved for the second sidx.

I've spent some time looking for applications and test vectors using nested segment indices (as described by Annex J.2.2 to J.2.4), but so far found none in the wild. There is limited support in MP4Box and I'm working on a PoC. But it doesn't look like hierarchical or daisy-chain sidx has gained much traction so far, probably because the J.2.1 simple one-level indexing variant makes much more sense in the VoD use case.

I did run into some serious limitations in the way sidx with reference_type=1 is defined. The most obvious problem is that the 32-bit and 31-bit limits for subsegment_duration and referenced_size, respectively, are simply too small for entries other than media segments. 2^31 bytes = 2 GiB may be plenty for a moof+mdat, but if we're referencing a sidx plus media chunks for an hour of HD video, this is no longer realistic. Similar problems arise if we're trying to store the sum of fragment durations for 8 minutes at a 10 MHz timescale.
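A quick back-of-the-envelope check of these limits, as a Python sketch. The function names are hypothetical, and the 5 Mbit/s bitrate used in the comment is only an assumed figure for "HD video", not something stated in this thread.

```python
MAX_REFERENCED_SIZE = 2**31 - 1      # referenced_size is a 31-bit field (bytes)
MAX_SUBSEGMENT_DURATION = 2**32 - 1  # subsegment_duration is a 32-bit field


def max_seconds_for_timescale(timescale: int) -> float:
    """Longest duration one entry can express at the given timescale."""
    return MAX_SUBSEGMENT_DURATION / timescale


def max_seconds_for_bitrate(bitrate_bps: int) -> float:
    """Longest stretch of media one entry can cover at a constant bitrate."""
    return MAX_REFERENCED_SIZE * 8 / bitrate_bps


# At a 10 MHz timescale the duration field overflows after roughly 429.5 s,
# i.e. a little over 7 minutes.
print(max_seconds_for_timescale(10_000_000))

# At an assumed 5 Mbit/s, 2 GiB covers roughly 3436 s, i.e. just under an hour.
print(max_seconds_for_bitrate(5_000_000))
```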

A more subtle problem in the live daisy-chaining situation is that each link in the chain is supposed to reflect the cumulative duration of the entire chain it links to. In practice, this means that appending a single segment causes duration adjustments in all prior sidxs.

I think the first issue is fundamental. The Annex J.2.2 hierarchical example is nice as a toy example, but even a realistic VoD asset probably cannot use that layout. Increasing the number of bits in referenced_size or subsegment_duration might be an option.

The second problem could be mitigated by allowing for an "unspecified" magic value in subsegment_duration (say 0xFFFFFFFF). For a daisy-chained scheme, that would leave the media chunks valid, since only the duration of the last entry would be unspecified. Also, by following the link in this last entry (possibly recursively) we would still be able to (relatively efficiently) construct the entire index and therefore the cumulative duration. The "unspecified" subsegment_duration would also sidestep the cumulative duration overflow problem.
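A rough Python sketch of how a reader could recover the cumulative duration by walking a daisy chain in which the link entries carry the proposed "unspecified" marker. Everything here is illustrative: the `Entry` and `Sidx` classes are a hypothetical in-memory model, and 0xFFFFFFFF is the value proposed above, not something the current spec defines.

```python
from dataclasses import dataclass
from typing import List, Optional

UNSPECIFIED = 0xFFFFFFFF  # proposed marker, not part of the current spec


@dataclass
class Entry:
    reference_type: int       # 0 = media, 1 = link to the next sidx
    referenced_size: int
    subsegment_duration: int  # may be UNSPECIFIED on a link entry


@dataclass
class Sidx:
    entries: List[Entry]
    next: Optional["Sidx"] = None  # target of the final reference_type=1 entry


def total_duration(sidx: Optional[Sidx]) -> int:
    """Sum media durations along the chain, ignoring link-entry durations."""
    total = 0
    while sidx is not None:
        for entry in sidx.entries:
            if entry.reference_type == 0:
                total += entry.subsegment_duration
            # Link entries are skipped: their duration is recomputed from
            # the sidx they point to, so UNSPECIFIED is harmless here.
        sidx = sidx.next
    return total


# Usage: two chained indexes; appending to `second` never touches `first`.
second = Sidx([Entry(0, 100, 1024), Entry(0, 100, 1024)])
first = Sidx([Entry(0, 100, 1024), Entry(1, 56, UNSPECIFIED)], next=second)
second.entries.append(Entry(0, 100, 1024))
```

Since the sum is accumulated by the reader rather than stored in a 32-bit field, the cumulative duration can exceed what subsegment_duration could hold.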

What are your thoughts on this? (I understand that we're at a point past any major changes, but hope you have anything to add.)

porcelijn commented 3 years ago

Here's an example of a daisy-chained sidx CMAF audio file.

The structure is

In the first two sidxs, there are 81 media entries (reference_type=0) and then the final entry has:

This fits mostly within the ISOBMFF spec. except for the part where I'm using the magic value 0xFFFFFFFF to express that subsegment_duration is unspecified. In this case, unspecified means "unknown", but in the more general case it could also be used for a daisy chain that covers a duration that does not fit in 32 bits.

Using referenced_size to cover only the sidx and not its media complies with Annex J.2.3. Using an oversized sidx is more convenient than padding with a free box, because the free box messes up the sidx's anchor point, which would then need to be cancelled out by first_offset.

So, in conclusion, I propose a small amendment: defining a magic value for subsegment_duration to express undefined. What do you think?

cconcolato commented 5 months ago

Thank you for the thorough analysis and example.

I would like to be able to incrementally create an indexed vod archive of a live-linear stream in a (mostly) append-only fashion.

I'm trying to understand the requirements a bit more.

  1. It's not clear to me if you want the incremental file to have stable byte ranges to already appended segments. That seems like a good property for caching, but in your example you say:

    each time a new CMAF fragment is appended to the end of the file, I can locate the "active" sidx and adjust it in place to insert an extra media entry

Unless you reserve space for that extra entry (e.g. with a free box or with the sidx size being larger than its actual payload), the byte ranges for the segments after that sidx would change.

  2. How will the incremental file be consumed? Is it consumed through a manifest? VoD players typically make a single HTTP request for the header with the sidx, sometimes even documented in the manifest as a single index range. For such an incremental file, they would have to walk the file, fetching each sidx one by one to get a complete picture. In one of the comments above, a proposed approach was to reserve as much space as possible in the file and, when that file risks overflow, close it, create a new file, and have the manifest point to 2 files. Is that a way to solve your use case?

You raise multiple relevant points:

  1. subsegment_duration and referenced_size can be too small for storage use cases.
  2. daisy-chaining requires knowing the duration of the next segment and you propose to relax that with specifying an unknown duration.

The Technologies under Consideration document (WG03N1110_23477) already includes 2 proposed changes to the sidx box:

  1. defining a new version where reference_count is 32 bits
  2. defining a new version where the first_offset is signed or where the anchor for the first offset is definable, both allowing the sidx to be at the end of the file.

I wonder if these changes are sufficient to address your use case.

porcelijn commented 4 months ago

Hi Cyril,

First, I need to mention that as of this month I'm no longer working for Unified Streaming where I was responsible for the live CMAF archive ingest, storage and egress project. Moving forward, I suggest contacting Mohammed or Arjen (I can share contact details privately if necessary).

In response to your questions,

  1. Indeed, the current implementation where an entry is added to an (oversized) sidx in-place breaks caching for the byte range of the "active" sidx. So far, we have not found this to be a problem in practice. The sidx size being larger than its actual payload ensures the byte ranges of the media segments remain valid as new segments are added at the end.

  2. Yes, it is a genuine limitation of the DASH On Demand profile that RepresentationIndex or @indexRange require a single, contiguous byte range. Starting a new file (when the sidx is depleted or a gap in the live linear ingest is detected) is one way to tackle this issue, but we chose daisy chaining. I suspect a typical VOD player will not support reference_type=1 entries (nor dynamically refetching the sidx), so support for direct playout will inherently be limited. My former employer bridges the gap with an origin that understands how to ingest daisy-chained sidx and does egress using DASH live profile, HLS or Smooth.
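The in-place update of an oversized sidx described in the first answer might look roughly like the Python sketch below. It is a simplification under stated assumptions: the function name `append_entry` is hypothetical, the sidx is version 0, the box size field is deliberately larger than the payload, new entries are plain media references with an assumed SAP type 1, and the trailing reference_type=1 link entry of a daisy chain is not modelled.

```python
import struct

SIDX_V0_FIXED = 32  # 12-byte FullBox header + 20 bytes of fixed fields
ENTRY_SIZE = 12
COUNT_POS = 30      # reference_count position relative to the box start (v0)


def append_entry(buf: bytearray, sidx_offset: int,
                 referenced_size: int, duration: int) -> None:
    """Add one media entry to an oversized version 0 sidx without moving bytes."""
    box_size, box_type = struct.unpack_from(">I4s", buf, sidx_offset)
    if box_type != b"sidx" or buf[sidx_offset + 8] != 0:
        raise ValueError("expected a version 0 sidx at this offset")
    (count,) = struct.unpack_from(">H", buf, sidx_offset + COUNT_POS)
    entry_pos = sidx_offset + SIDX_V0_FIXED + count * ENTRY_SIZE
    if entry_pos + ENTRY_SIZE > sidx_offset + box_size:
        raise OverflowError("sidx is full; a new index must be chained")
    # reference_type=0; starts_with_SAP=1 and SAP_type=1 are assumptions.
    struct.pack_into(">III", buf, entry_pos,
                     referenced_size & 0x7FFFFFFF, duration,
                     (1 << 31) | (1 << 28))
    struct.pack_into(">H", buf, sidx_offset + COUNT_POS, count + 1)


# Usage: an empty sidx with room for exactly two entries.
buf = bytearray(struct.pack(">I4sI", SIDX_V0_FIXED + 2 * ENTRY_SIZE, b"sidx", 0)
                + struct.pack(">IIIIHH", 1, 48000, 0, 0, 0, 0)
                + bytes(2 * ENTRY_SIZE))
append_entry(buf, 0, 1000, 48000)
```

Because the box size never changes, the anchor point and the byte ranges of already indexed media stay valid; only the bytes of the sidx itself change between updates.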

On the Technologies under Consideration: while both increasing the (max) number of entries and a signed first_offset are great features, they do not fix some design problems intrinsic to the way reference_type=1 daisy chaining and hierarchical indexing were originally conceptualized. To be useful for (pre-packaged) VOD, realistically you would want 64-bit subsegment_duration and referenced_size fields in the case of reference_type=1. This would be a challenge both technically, for adjusting the existing spec, and operationally, because all sidxs would need adjustment every time a media segment is added. The other option, leaving subsegment_duration unspecified (with a special marker), makes more sense, at least for daisy chaining, where the value is just a duplicate aggregated from the next sidx anyway.

Coming back to the question of whether fixing daisy chaining is still necessary with the proposed changes: with those, a contiguous (gap-free) live linear ingest would put init + media segments first and the SegmentIndex at the end (like mfra). Some means of determining the location of this sidx (think mfro, or an out-of-band (manifest) byte range) would still be desirable.

Also, this does not solve the practical issue of gaps in the live linear ingest. For reasonably small gaps this might be modeled as a regular sidx entry pointing to a special "empty trun" media segment. Daisy chaining in this case is of course more flexible as earliest_presentation_time is an arbitrary 64 bit value.

In summary, (personally) I'm fine with the amendments on the table, though adding something like "When reference_type=1, the subsegment_duration can be set to 0 (or 0xFFFFFFFF) to indicate undefined" would solve most of the issues inherent in the design of daisy-chained segment indices.

So long, and cheers, Tijn Porcelijn