Open dwsinger opened 4 years ago
I believe that this is true. We have (at least) three possible courses of action:
Unless we're pressed, I would defer this or go with the first.
This problem occurs when streams are dynamic, such as in live streaming. In that case, having the index at the beginning is cumbersome: you need to allocate space for it, but you typically don't know how many chunks or segments need to be indexed, so it is hard to do reliably. So option 1), while possible, has that drawback. Option 2) would not have my preference, as it would break backward compatibility. Option 3) could be considered if option 1) turns out to be insufficient for the dynamic use case. Another option would be to use a separate file for the segment index in this case; this is also allowed by the specification, but adding the proposed text of option 1) would make that less clear.
Even if you're not sure how big the segment index will be, you probably know a max, and one can always fill in with a free space box...
Yes, we thought of allocating space, but rewriting at the beginning is tricky, and in some cases it is hard or impossible to know a maximum, especially for 24x7 live streams that can run for a very long time. So option 1) would not be practical for this use case. MPEG DASH only uses the segment index for VoD, so perhaps that is the right assessment, but we would be interested in using sidx for live/dynamic content as well if it can be supported.
sidx was designed with VoD in mind, not for live. In live streaming, the manifest provides the indexing. Can you describe a bit more the use case? Is there already a document?
ping @RufaelDev. Can you describe the use case?
Sorry yes we have the following use case.
A live encoder pushes a CMAF Track (CMAF Header + segments) to a packager/origin or storage node e.g. cmaf or cmaf + dash ingest => e.g. interface 1 https://dashif-documents.azurewebsites.net/Ingest/master/DASH-IF-Ingest.html
The receiver stores the CMAF header, appends the segments, and continuously publishes manifests.
At the end of a session (or during the session), indexing structures are created and the content can be accessed rapidly via byte ranges etc and re-used for VoD.
For example if the ingest is based on smooth streaming (fmp4 based) mfra at the end can be used. The mfra can be used to index the segments in the track from the end (which is practical).
Sidx is at the beginning of the file and is therefore less practical. Writing a segment index once at the end of the session may be possible in some cases, but it would still have a lot of overhead, as the sidx would need to be at the beginning (potentially shifting gigabytes of content) or stored separately, which is also not ideal.
The more interesting case is the one where a segment index is continuously updated (live segment index). Right now we use separate technology for this indexing in this use case; it would be great if ISOBMFF/CMAF could support this, but it needs a careful design.
Hence I did not reply to this thread, as I thought: OK, segment index would be used for VoD. In any case, if this use case is of interest, a different segment index v2 may be developed for this.
In that perspective, in the last meeting there was a lot of discussion on VBR. The argument was that VoD DASH and HLS support VBR, as you can see the size of each segment, but segment timeline and number (used in DASH) do not. VBR may be relevant for large-scale delivery as it can save bits, which is another motivation to consider an updated segment index box that could be suitable for live content.
So overall I can accept that segment index is used for VoD and that a decision is made to update this for that purpose, but there is a live use case that may be of interest to large-scale delivery and CMAF/ISOBMFF and that could be investigated more. Feel free to contact me or ask more questions.
If I understand correctly the use case, it's about repurposing live content into a single VoD file by generating indexing information after the live event is finished, without having to move bytes around.
It's an interesting use case and I would agree with @dwsinger: reserve enough space at the beginning of the file (with a `free` box) and replace that space with a `sidx` box when you have all the data. If for some reason that's not practical, I would consider using separate files and/or external indexing information.
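The reserve-then-replace approach can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes a version-0 `sidx` with media references only, the helper names (`box`, `sidx_v0`, `fill_reserved`) are made up for this example, and the SAP flags value is an arbitrary choice. Because the anchor point is the first byte after the `sidx`, `first_offset` is set to the size of the leftover `free` box so that the media byte ranges stay valid.

```python
import struct

def box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize a plain ISOBMFF box: 32-bit size, 4-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def sidx_v0(reference_id: int, timescale: int, first_offset: int, entries) -> bytes:
    """Build a version-0 sidx. entries: (referenced_size, subsegment_duration) media refs."""
    # version/flags, reference_ID, timescale, earliest_presentation_time,
    # first_offset, reserved, reference_count
    body = struct.pack(">IIIIIHH", 0, reference_id, timescale, 0, first_offset, 0, len(entries))
    for size, duration in entries:
        if size >= 2**31 or duration >= 2**32:
            raise ValueError("entry does not fit the 31/32-bit fields")
        # reference_type=0 lives in the top bit of the size word.
        # 0x90000000 = starts_with_SAP=1, SAP_type=1 (assumed for illustration).
        body += struct.pack(">III", size, duration, 0x90000000)
    return box(b"sidx", body)

def fill_reserved(reserved_size: int, reference_id: int, timescale: int, entries) -> bytes:
    """Return exactly reserved_size bytes: a sidx followed by a smaller free box."""
    sidx_size = 8 + 24 + 12 * len(entries)
    pad = reserved_size - sidx_size
    if pad < 0 or 0 < pad < 8:
        raise ValueError("reserved space cannot hold the sidx plus a free box")
    sidx = sidx_v0(reference_id, timescale, pad, entries)
    return sidx + (box(b"free", bytes(pad - 8)) if pad else b"")

# While the session runs: a 1024-byte placeholder at the start of the file.
placeholder = box(b"free", bytes(1016))
# At the end of the session: overwrite the placeholder in place.
filled = fill_reserved(len(placeholder), 1, 48000, [(1000, 96000), (1200, 96000)])
```

The replacement has the same length as the placeholder, so nothing after it moves. The approach fails once the number of entries outgrows the reservation.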
Pending more information describing why this can't work, my recommendation would be to close this issue.
Hi all, I'd like to add my 2 cents. :smile:
I would like to be able to incrementally create an indexed vod archive of a live-linear stream in a (mostly) append-only fashion.
Reserving "enough space at the beginning" for live-linear is a bit of a non-starter. What may work is to reserve "some" space and fill up the `sidx` till it's (almost) full, and then reserve another byte range to back-fill with segment indices. The first index could point to the second index in daisy-chain fashion. That is, use the final entry in the first `sidx` with `reference_type=1` to point to the space reserved for the second `sidx`.
I've spent some time looking for applications and test vectors using nested segment indices (as described by Annex J.2.2 to J.2.4), but so far found none in the wild. There is limited support in MP4Box and I'm working on a PoC. But it doesn't look like hierarchical or daisy-chain `sidx` has got a lot of traction so far, probably because the J.2.1 "Simple one-level indexing" variant makes much more sense in the VoD use case.
I did run into some serious limitations in the way `sidx` with `reference_type=1` is defined. The most obvious problem is that the 32- and 31-bit limits for (respectively) `subsegment_duration` and `referenced_size` are simply too small for entries other than media segments. 2^31 = 2 GB may be plenty for a `moof`+`mdat`, but if we're referencing a `sidx` plus media chunks for an hour of HD video this is no longer realistic. Similar problems arise if we're trying to store the sum of fragment durations for 8 minutes at a 10 MHz timescale.
A more subtle problem in the live daisy-chaining situation is that each link in the chain is supposed to reflect the cumulative duration of the entire chain it links to. In practice, this means that appending a single segment causes duration adjustments in all prior `sidx` boxes.
I think the first issue is fundamental. The Annex J.2.2 hierarchical example is nice as a toy example, but even a realistic VoD asset probably cannot use that layout. Increasing the number of bits in `referenced_size` or `subsegment_duration` might be an option.
The second problem could be mitigated by allowing for an "unspecified" magic value in `subsegment_duration` (say `0xFFFFFFFF`). For a daisy-chained scheme, that would leave the media chunks valid, since only the duration of the last entry would be unspecified. Also, by following the link in this last entry (possibly recursively) we would still be able to (relatively efficiently) construct the entire index and therefore the cumulative duration. The "unspecified" `subsegment_duration` would also sidestep the cumulative duration overflow problem.
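Recovering the cumulative duration by following the links could look roughly like the sketch below. It works on an already-parsed, in-memory model rather than real file bytes: `boxes` maps a file offset to `(box_size, first_offset, entries)`, each entry is `(reference_type, referenced_size, subsegment_duration)`, and all boxes are assumed to share one timescale. The `0xFFFFFFFF` marker is the value proposed here, not something the current spec defines.

```python
UNSPECIFIED = 0xFFFFFFFF  # proposed marker for "duration unspecified" (not in the spec)

def total_duration(boxes: dict, offset: int) -> int:
    """Sum durations of the sidx at `offset`, following links with unspecified duration."""
    box_size, first_offset, entries = boxes[offset]
    # The anchor point is the first byte after the sidx box itself.
    pos = offset + box_size + first_offset
    total = 0
    for reference_type, referenced_size, duration in entries:
        if reference_type == 1 and duration == UNSPECIFIED:
            # The entry points at another sidx: resolve its duration by walking it.
            total += total_duration(boxes, pos)
        else:
            total += duration
        pos += referenced_size
    return total

# Toy layout: a 56-byte sidx at offset 0 indexing one 1000-byte fragment, then
# linking to a second 56-byte sidx at offset 1056 that indexes two fragments.
boxes = {
    0: (56, 0, [(0, 1000, 96000), (1, 56, UNSPECIFIED)]),
    1056: (56, 0, [(0, 1200, 96000), (0, 900, 48000)]),
}
chain_ticks = total_duration(boxes, 0)
```

Only the boxes on the chain are visited, so the cost grows with the number of `sidx` boxes, not with the number of media entries that would otherwise need rewriting.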
What are your thoughts on this? (I understand that we're at a point past any major changes, but hope you have anything to add.)
Here's an example of a daisy-chained sidx CMAF audio file.
The structure is (the box listing did not survive; only two entries ending in `sidx` remain):
In the first two `sidx` boxes, there are 81 media entries (`reference_type=0`) and then the final entry has:
- `reference_type=1`
- `referenced_size` covers only the next `sidx` byte range
- `subsegment_duration=0xFFFFFFFF`
- `flags` are essentially useless here, but since I'm referring to sync samples I'm using the same as for media
I can build this structure progressively. That is, each time a new CMAF fragment is appended to the end of the file, I can locate the "active" `sidx` and adjust it in place to insert an extra media entry. If the "active" `sidx` is about to "overflow", I insert an index entry that points to a byte range at the end of the file where I'll put the next active `sidx`.
This fits mostly within the ISOBMFF spec, except for the part where I'm using the magic value `0xFFFFFFFF` to express that `subsegment_duration` is unspecified. In this case, unspecified means "unknown", but in the more general case it could also be used for a daisy chain that covers a duration that does not fit in 32 bits.
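The progressive build described above can be modelled in memory as a sketch. The class and field names are invented for this illustration, the capacity of 82 entries is chosen only to match the 81 media entries plus one link in the example, and no file I/O is performed.

```python
from dataclasses import dataclass, field

UNSPECIFIED = 0xFFFFFFFF  # proposed marker, not defined by the current spec

@dataclass
class Entry:
    reference_type: int       # 0 = media, 1 = another sidx
    referenced_size: int
    subsegment_duration: int

@dataclass
class Sidx:
    capacity: int             # number of entry slots reserved in the oversized box
    entries: list = field(default_factory=list)

def sidx_bytes(capacity: int) -> int:
    """Size of a version-0 sidx with `capacity` slots: 32 header bytes + 12 per entry."""
    return 32 + 12 * capacity

class Chain:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.boxes = [Sidx(capacity)]

    def append_fragment(self, size: int, duration: int) -> None:
        active = self.boxes[-1]
        if len(active.entries) == self.capacity - 1:
            # Only one slot left: spend it on a link to the next sidx, which
            # will be written at the current end of the file.
            active.entries.append(Entry(1, sidx_bytes(self.capacity), UNSPECIFIED))
            active = Sidx(self.capacity)
            self.boxes.append(active)
        active.entries.append(Entry(0, size, duration))

chain = Chain(capacity=82)
for _ in range(200):
    chain.append_fragment(size=4096, duration=96000)
```

With 200 fragments this yields three boxes: two full ones whose last entry is a link, and a third "active" one that is still filling up. Earlier boxes are never touched again once the link entry is written.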
Using `referenced_size` to cover only the `sidx` and not its media complies with Annex J.2.3.
Using an oversized `sidx` is more convenient than padding with a `free` box, because the `free` box messes up the `sidx`'s anchor point, which would then need to be cancelled out by `first_offset`.
So, in conclusion, I propose a small amendment: defining a magic value for `subsegment_duration` to express undefined. What do you think?
Thank you for the thorough analysis and example.
I would like to be able to incrementally create an indexed vod archive of a live-linear stream in a (mostly) append-only fashion.
I'm trying to understand the requirements a bit more.
each time a new CMAF fragment is appended to the end of the file, I can locate the "active" sidx and adjust it in place to insert an extra media entry
Unless you reserve space for that extra entry (e.g. with a `free` box or with the `sidx` size being larger than its actual payload), the byte ranges for the segments after that `sidx` would change.
[…] `sidx`, sometimes even documented in the manifest as a single index range. For such an incremental file, they would have to walk the file and fetch each `sidx` one by one to get a complete picture.
In one of the comments above, a proposed approach was to reserve as much space as possible in the file and, when that file risks overflowing, close it, create a new file and have the manifest point to 2 files. Is that a way to solve your use case?
You raise multiple relevant points:
- `subsegment_duration` and `referenced_size` can be too small for storage use cases.
The Technologies under Consideration document (WG03N1110_23477) already includes 2 proposed changes to the `sidx` box:
- `reference_count` is 32 bits
- `first_offset` is signed, allowing `sidx` to be at the end of the file.
I wonder if these changes are sufficient to address your use case.
Hi Cyril,
First, I need to mention that as of this month I'm no longer working for Unified Streaming where I was responsible for the live CMAF archive ingest, storage and egress project. Moving forward, I suggest contacting Mohammed or Arjen (I can share contact details privately if necessary).
In response to your questions,
Indeed, the current implementation, where an entry is added to an (oversized) `sidx` in place, breaks caching for the byte range of the "active" `sidx`. So far, we have not found this to be a problem in practice. The `sidx` size being larger than its actual payload ensures the byte ranges of the media segments remain valid as new segments are added at the end.
Yes, it is a genuine limitation of the DASH On Demand profile that `RepresentationIndex` or `@indexRange` requires a single, contiguous byte range. Starting a new file (when the `sidx` is depleted or a gap in the live linear ingest is detected) is one way to tackle this issue, but we chose daisy chaining.
I suspect a typical VOD player will not support `reference_type=1` entries (nor dynamically refetching the `sidx`), so support for direct playout will inherently be limited. My former employer bridges the gap with an origin that understands how to ingest daisy-chained `sidx` and does egress using DASH live profile, HLS or Smooth.
On the Technologies under Consideration: while both increasing the (max) number of entries and a signed `first_offset` are great features, they do not fix some design problems intrinsic to the way `reference_type=1` daisy chaining and hierarchical indexing were originally conceptualized. To be useful for (pre-packaged) VOD, realistically you would want 64-bit `subsegment_duration` and `referenced_size` fields in case of `reference_type=1`. This would be a technical challenge, both for adjusting the existing spec and operationally, because all `sidx` boxes would need adjustment every time a media segment is added. The other option, leaving `subsegment_duration` unspecified (with a special marker), makes more sense, at least for daisy chaining, where the value is just a duplicate aggregated from the next `sidx` anyway.
Coming back to the question of whether fixing daisy chaining is still necessary with the proposed changes: with those, a contiguous (gap-free) live linear ingest would put init + media segments first and the SegmentIndex at the end (like `mfra`). Some means of determining the location of this `sidx` (think `mfro`, or an out-of-band (manifest) byte range) would still be desirable.
Also, this does not solve the practical issue of gaps in the live linear ingest. For reasonably small gaps this might be modeled as a regular `sidx` entry pointing to a special "empty `trun`" media segment. Daisy chaining in this case is of course more flexible, as `earliest_presentation_time` is an arbitrary 64-bit value.
In summary, (personally) I'm fine with the amendments on the table, though adding something like "When `reference_type=1`, the `subsegment_duration` can be set to 0 (or `0xFFFFFFFF`) to indicate undefined" would solve most of the issues inherent in the design of daisy-chained segment indices.
So long, and cheers, Tijn Porcelijn
If the anchor point for a `SegmentIndexBox` is the first byte after that box, does this imply that `sidx` comes before the media it indexes? In this case, `sidx` may also not be usable if the duration of a sample is unknown at the time of packaging it.