Dash-Industry-Forum / DASH-IF-IOP

DASH-IF Interoperability Points issue tracker and document source code

Thumbnails with still images #119

Closed haudiobe closed 6 years ago

haudiobe commented 7 years ago

Submitter: Will Law

The DASH-IF position to date has been to implement thumbnails for UI scrubbing via trick-mode adaptation sets. This is actually quite complex for a player to manage. It has to extract still images from a video, or else superimpose a second video surface over the first. This requires two player instances to be active, which doubles the

A simpler implementation would be to describe a series of thumbnail images. These can be handled easily by the player. They can also be described via the existing template structure, after first being signaled via a new essential descriptor. This approach has been suggested by Joey Parrish from Google and modified by Will Law. See this thread: https://github.com/google/shaka-player/issues/559#issuecomment-260706989

Here is a sample implementation:

http://dash.edgesuite.net/akamai/bbb_30fps/bbb_with_thumbnails.mpd

This sample content has frame and timecode burnt in which is handy for checking positional consistency.

LloydW93 commented 7 years ago

We've done a similar thing using tiles, for example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:dvb="urn:dvb:dash-extensions:2014-1" type="static" profiles="urn:dvb:dash:profile:dvb-dash:2014,urn:dvb:dash:profile:dvb-dash:isoff-ext-live:2014">
    <Period duration="PT58M11S" start="PT0S">
        <AdaptationSet id="1" mimeType="image/jpeg" contentType="image">
            <BaseURL>/thumbnail_v1/2bfd42-b08302hx/</BaseURL>
            <SegmentTemplate startNumber="0" timescale="3491" duration="87275" media="$RepresentationID$/vf_b08302hx_SB$Number%04d$-401981f31123.jpg"/>
            <Representation id="thumbnail_224x126" height="126" width="224">
                <EssentialProperty schemeIdUri="tag:bbc.co.uk,2016-01-21:/mediaservices/thumbnail-layout/rows" value="5"/>
                <EssentialProperty schemeIdUri="tag:bbc.co.uk,2016-01-21:/mediaservices/thumbnail-layout/columns" value="5"/>
            </Representation>
        </AdaptationSet>
        <AdaptationSet id="2" mimeType="image/jpeg" contentType="image">
            <BaseURL>/thumbnail_v1/2bfd42-b08302hx/</BaseURL>
            <SegmentTemplate startNumber="0" timescale="3491" duration="87275" media="$RepresentationID$/vf_b08302hx_SB$Number%04d$-401981f31123.jpg"/>
            <Representation id="thumbnail_192x108" height="108" width="192">
                <EssentialProperty schemeIdUri="tag:bbc.co.uk,2016-01-21:/mediaservices/thumbnail-layout/rows" value="5"/>
                <EssentialProperty schemeIdUri="tag:bbc.co.uk,2016-01-21:/mediaservices/thumbnail-layout/columns" value="5"/>
            </Representation>
        </AdaptationSet>
        <AdaptationSet id="3" mimeType="image/jpeg" contentType="image">
            <BaseURL>/thumbnail_v1/2bfd42-b08302hx/</BaseURL>
            <SegmentTemplate startNumber="0" timescale="3491" duration="87275" media="$RepresentationID$/vf_b08302hx_SB$Number%04d$-401981f31123.jpg"/>
            <Representation id="thumbnail_128x72" height="72" width="128">
                <EssentialProperty schemeIdUri="tag:bbc.co.uk,2016-01-21:/mediaservices/thumbnail-layout/rows" value="5"/>
                <EssentialProperty schemeIdUri="tag:bbc.co.uk,2016-01-21:/mediaservices/thumbnail-layout/columns" value="5"/>
            </Representation>
        </AdaptationSet>
    </Period>
</MPD>
TobbeEdgeware commented 7 years ago

@LloydW93 We are discussing this at the DASH-IF f2f. I wonder, how does the timing work in your example? Do you have some duration for each thumbnail? How do you apply this for live?

TobbeEdgeware commented 7 years ago

@LloydW93 Sorry, I see now that you've got the duration. As far as I understand, you make a tile every 25s and each tile is 25 pictures, so there is one thumbnail per second.
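To make that timing concrete, here is a minimal sketch of how a player might map a seek position to a tile and a cell within it, using the `timescale`, `duration`, `startNumber`, rows and columns values from the BBC manifest above. The function name `locate_thumbnail` is hypothetical, and the row-major ordering (left to right, then top to bottom) is an assumption; the thread does not state how thumbnails are ordered inside a tile.

```python
from fractions import Fraction


def locate_thumbnail(t_seconds, timescale=3491, duration=87275,
                     start_number=0, rows=5, columns=5):
    """Return (segment_number, row, column) for the thumbnail covering t_seconds.

    Defaults are taken from the BBC example manifest. Row-major ordering
    inside the tile is an assumption, not something the thread specifies.
    """
    # Seconds covered by one tile image: 87275 / 3491 = 25.
    tile_duration = Fraction(duration, timescale)
    # Seconds covered by one thumbnail: 25 / (5 * 5) = 1.
    thumb_duration = tile_duration / (rows * columns)

    t = Fraction(t_seconds)
    tile_index = t // tile_duration
    index_in_tile = int((t - tile_index * tile_duration) // thumb_duration)

    row, column = divmod(index_in_tile, columns)
    return start_number + int(tile_index), row, column


# 62 s into the content falls in the third tile ($Number$ = 2),
# at the 13th thumbnail of that tile (row 2, column 2, zero-based).
print(locate_thumbnail(62))
```

Exact fractions are used so that the non-round `timescale` does not introduce floating-point drift at tile boundaries.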

I got the task at the ongoing DASH-IF f2f meeting to lead the work on proposing a common approach to this type of thumbnails. Maxdome has also expressed interest in tiling. I think we should have the possibility to support that, and your solution looks like a good starting point.

With some major additions, like adding bandwidth to the Representation, both proposals above go through the MPD validator http://www-itec.uni-klu.ac.at/dash/?page_id=605#

The DASH-IF validator http://dashif.org/conformance.html does complain that the mime type is not video, audio, or subtitles, but that should be possible to update.

TobbeEdgeware commented 7 years ago

I'll put up a place where we can discuss this in more detail, but here is a combination of the Joey Parrish and BBC proposals. It uses tiles, but could simply be reduced to a single picture per URL by specifying grid = 1x1.

<AdaptationSet id="3" mimeType="image/jpeg" contentType="image">
    <SegmentTemplate media="$RepresentationID$/tile$Number$.jpg" timescale="1" duration="125" startNumber="1"/>
    <Representation bandwidth="10000" id="thumbnails" width="6400" height="180">
        <EssentialProperty schemeIdUri="http://dashif.org/guidelines/thumbnail_tile_grid" value="25x1"/>
    </Representation>
</AdaptationSet>
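
As a rough illustration of how a client could use the grid descriptor, the sketch below parses the `value="25x1"` string and computes the pixel rectangle to crop for a given thumbnail. The helper name `thumbnail_rect` is hypothetical. Reading the value as columns x rows is an assumption inferred from the 6400x180 tile size (25 thumbnails of 256x180 side by side), and row-major ordering is likewise assumed.

```python
def thumbnail_rect(index, grid="25x1", tile_width=6400, tile_height=180):
    """Return (x, y, width, height) of thumbnail `index` inside one tile.

    Assumes the grid value reads as "<columns>x<rows>" and that thumbnails
    are laid out left to right, then top to bottom.
    """
    columns, rows = (int(part) for part in grid.split("x"))
    if not 0 <= index < columns * rows:
        raise ValueError(f"index {index} is outside a {grid} grid")

    thumb_width = tile_width // columns
    thumb_height = tile_height // rows
    row, column = divmod(index, columns)
    return column * thumb_width, row * thumb_height, thumb_width, thumb_height


# Last thumbnail of a 25x1 tile: a 256x180 crop starting at x = 6144.
print(thumbnail_rect(24))
```

With `timescale="1"` and `duration="125"`, each tile covers 125 seconds, so each of the 25 thumbnails in this example covers 5 seconds. A 1x1 grid reduces to returning the whole image.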
LloydW93 commented 7 years ago

Sorry, didn't see your first mention. But you've come to the right answer :). We will have thumbnails up to one a second - some of our media will have them more frequently based on business logic. So the key thing is that the appropriate duration/timescale values are there.

TobbeEdgeware commented 7 years ago

@LloydW93 Thanks for your clarification. Is there any particular reasoning behind using different adaptation sets for the different sizes? Since durations are the same, it seems appropriate to put them into one AS.

LloydW93 commented 7 years ago

I've done some asking around and I've yet to find someone who knows why we did it like that! I agree that one AdaptationSet is plenty.

TobbeEdgeware commented 7 years ago

I made a proposal (or rather 2) following the presentation at the dash.js f2f in December. They are available at https://github.com/Dash-Industry-Forum/DASH-IF-IOP/blob/master/thumbnails/README.md. You can comment here, or make a pull-request if there is something you want to have changed.

wilaw commented 7 years ago

@TobbeEdgeware - nice clean proposal. Question regarding the @bandwidth attribute in https://github.com/Dash-Industry-Forum/DASH-IF-IOP/blob/master/thumbnails/dash_image_adaptation_set.md . How should it be interpreted? Is it the bandwidth necessary to retrieve the entire tile within one thumbnail duration? Can it be removed completely, since it has no direct bearing on the thumbs and is an artifact of reusing an adaptation set designed for streaming video? A static attribute like "size" in MB would be more useful to the player in deciding when to load the tile.

TobbeEdgeware commented 7 years ago

@wilaw I agree that bandwidth is not quite natural, but it is mandatory in the XML schema for DASH, so I'd like to keep it in order to get the manifest through the validator. I think we should clarify that it is defined as the average bitrate when playing at normal speed. That is then bandwidth = average_tilesize_in_bits/tile_duration, which is analogous to how the bandwidth is calculated for other media.

TobbeEdgeware commented 7 years ago

@wilaw. I updated the proposal with a bandwidth explanation and a small calculation. Checking with a practical example, I need to update the bandwidth to 30kbps for the resolution I specified. When working through the example, I realized that it is pretty nice to be able to compare the bandwidth of the thumbnails to that of the other media. If the client wants to calculate the average size in bytes of a tile, it can always do the same calculation as other media segments.
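
The relationship described above, bandwidth = average_tilesize_in_bits/tile_duration, and its inverse can be sketched as follows. The function names are hypothetical, and the 100,000-byte tile size is an illustrative figure rather than a measurement from the thread; only the 125 s tile duration comes from the example adaptation set.

```python
def thumbnail_bandwidth_bps(avg_tile_bytes, tile_duration_s):
    """Average bitrate of a thumbnail track, in bits per second."""
    return avg_tile_bytes * 8 / tile_duration_s


def average_tile_bytes(bandwidth_bps, tile_duration_s):
    """Inverse calculation: average tile size in bytes implied by @bandwidth."""
    return bandwidth_bps * tile_duration_s / 8


# Hypothetical 100,000-byte tile covering 125 seconds.
bandwidth = thumbnail_bandwidth_bps(100_000, 125)
print(bandwidth)                           # 6400.0 bits per second
print(average_tile_bytes(bandwidth, 125))  # 100000.0 bytes
```

A client that wants a size estimate before fetching a tile can apply the second function to the `@bandwidth` and segment duration it already has from the manifest.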

wilaw commented 7 years ago

@TobbeEdgeware – thanks for that. I think the calculations are a bit off however.

bandwidth is average_tile_size_in_bits/duration. In the example above, the average_tile_size_in_bytes would be 458kB (458*1024B/8/125s = 30015kbps)

I made a 6400x180 thumb – it is 126kB as a jpeg with high compression, not 458kB. Also you should multiply kBytes by 8 to get kBits, not divide. I think that sentence should be rewritten as

bandwidth is average_tile_size_in_bits/duration. In the example above, the average_tile_size_in_bytes might be 126kB. Therefore the bandwidth value would be (126kB * 8 * 1024)/125s = 8257bps.

This is considerably smaller than the 30Mbps indicated in the example.

Cheers

Will

TobbeEdgeware commented 7 years ago

@wilaw Thanks for the pointing out the two errors in the written calculation. The manifest bitrate of 30000 reflected the input value of 458kB, but it could be considered high. Exactly how much to compress the thumbnail is of course a choice of the service provider, but I'm happy to correct the calculation and use your values in both the example text and the adaptation set.

wilaw commented 7 years ago

@TobbeEdgeware - I made a working sample manifest with tiled thumbs per your proposal:

http://dash.edgesuite.net/akamai/bbb_30fps/bbb_with_tiled_thumbnails.mpd

TobbeEdgeware commented 7 years ago

@wilaw Thanks for the MPD. It doesn't go through the validator at http://www-itec.uni-klu.ac.at/dash/?page_id=605 because the id of the AdaptationSet must be an unsignedInt according to the XML Schema in the standard. The id of the Representation does not have the same limitation. My example uses id="3" for the AdaptationSet and id="thumbnails" for the Representation, so that should be OK.
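
For anyone who wants to catch this before running a full validator, here is a small sketch that lists AdaptationSet ids which are not plain unsigned integers. It is a simplified check, not schema validation: it only tests for ASCII digits and the 32-bit upper bound, and the function name `invalid_adaptation_set_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
UNSIGNED_INT_MAX = 4294967295  # upper bound of xs:unsignedInt


def invalid_adaptation_set_ids(mpd_xml):
    """Return the AdaptationSet @id values that are not unsigned integers.

    Representation ids are deliberately not checked, since they may be
    arbitrary strings such as "thumbnails".
    """
    root = ET.fromstring(mpd_xml)
    bad = []
    for adaptation_set in root.iter(f"{{{MPD_NS}}}AdaptationSet"):
        value = adaptation_set.get("id")
        if value is None:
            continue  # the attribute is optional
        is_digits = value.isascii() and value.isdigit()
        if not is_digits or int(value) > UNSIGNED_INT_MAX:
            bad.append(value)
    return bad


sample = (
    '<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"><Period>'
    '<AdaptationSet id="3"><Representation id="thumbnails"/></AdaptationSet>'
    '<AdaptationSet id="thumbs"/>'
    '</Period></MPD>'
)
print(invalid_adaptation_set_ids(sample))  # ['thumbs']
```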

waqarz commented 7 years ago

Seems like the MPD is changing as we speak :) the id is now changed.

wilaw commented 7 years ago

Yes, I just fixed this ;)

I verified with the itec validator and uploaded in place. It may take 20min for the cache to clear on some servers and show the new file.

The dash-if validator still says the AdaptationSet mime-type is invalid, but this is expected, as contentType="image" is not yet standardized.

Cheers

Will

eliesader commented 7 years ago

Hi

A question regarding the main driver behind this improvement. Is there any evidence that the existing methods are indeed complex?

One would think that decoding an "I-Frame track" (for lack of better wording) and decoding a JPEG image are analogous in complexity and do not require a second video decoder?

The reason behind the comment is that of overall system simplicity

Thanks Elie

TobbeEdgeware commented 7 years ago

@eliesader I-frame tracks are also video tracks and need a video decoder. In any case, a trick mode (see DASH-IF IOP) using trick-mode video representations is still available if someone wants to use that.

haudiobe commented 6 years ago

added to v4.06

hurdlea commented 6 years ago

Has there been any consideration of using SegmentTimeline with time values for live streaming scenarios? I noticed that the proposal only caters for numbering within a SegmentTemplate. As an operator with a large live streaming service, we are looking for a solution to seeking for nPVR and Start-Over.

wilaw commented 6 years ago

You can still use SegmentTemplate to describe the thumbnails even if you use SegmentTimeline to describe your audio and video tracks for your live service. Each AdaptationSet can use a different addressing scheme. We thought the simplicity afforded by SegmentTemplate for thumbnails which occur at a fixed cadence was worth it. SegmentTimeline is verbose, but it allows each variation in segment duration to be described. This precision was not needed for thumbnails occurring at a fixed cadence.

haudiobe commented 6 years ago