ffmpeginteropx / FFmpegInteropX

FFmpeg decoding library for Windows 10 UWP and WinUI 3 Apps
Apache License 2.0

Duration does not Update with HLS Live Streams #411

[Open] softworkz opened this issue 6 months ago

softworkz commented 6 months ago

There are so many duration properties...

    var d1 = this.FfmpegMss.FormatInfo.Duration;                 // format-level duration reported by ffmpeg
    var d2 = this.FfmpegMss.Duration;
    var d3 = this.FfmpegMss.GetMediaStreamSource().Duration;     // MediaStreamSource
    var d4 = this.FfmpegMss.PlaybackItem.StartTime;
    var d5 = this.FfmpegMss.PlaybackItem.Source.Duration;        // MediaSource
    var d6 = this.FfmpegMss.PlaybackItem.Source.MediaStreamSource.Duration;
    var d7 = this.FfmpegMss.PlaybackSession?.NaturalDuration;    // playback session
    var d8 = this.FfmpegMss.PlaybackSession?.MediaPlayer.TimelineController.Duration;

Unfortunately all are zero when using an HLS live stream (where the duration is continuously expanding). Do you have any idea how to get accurate duration values in this case?

I haven't tried the Windows.Media adaptive streaming source, but then I would lose all FFmpegInteropX functionality, right?

brabebhin commented 6 months ago

I haven't tried the Windows.Media adaptive streaming source, but then I would lose all FFmpegInteropX functionality, right?

It would, although we recently found a way to integrate with it, inspired by Microsoft (not implemented yet).

Does the stream actually play? As far as I remember, we were bumping the duration property every now and again on live streams as we were decoding new samples.

softworkz commented 6 months ago

Does the stream actually play?

Yes, it plays fine and MPV player updates the duration nicely.

As far as I remember, we were bumping the duration property every now and again on live streams as we were decoding new samples.

Sounds good - if only it would actually happen... ;-)

lukasf commented 6 months ago

Not sure what you are trying to get @softworkz. A live stream by definition does not have a duration. I think you can get the current playback position from the MediaPlayer's PlaybackSession, if that is what you need.

softworkz commented 6 months ago

A live stream by definition does not have a duration

Well - philosophically perhaps not, but there is a duration: the total range of segments available in the playlist. In the simpler case where no older segments drop out, that duration increases with each added segment.

Duration is important so that you know (and can display) within which range seeking is possible, especially when presenting a timeline which is based on wall-clock time and/or bounded by chapters or program events/shows (TV).

softworkz commented 6 months ago

For illustration - it's about the blue range on the timeline.

[screenshot: timeline with the blue range highlighted]

brabebhin commented 6 months ago

I think the seekable ranges thing is a tad more complicated than that, I'd assume there's some buffering involved that allows seeking back. The conundrum here I think is that MPE is no longer seekable if duration is 0.

brabebhin commented 6 months ago

Try the AutoExtendDuration property in the configuration; this should turn on automatic duration extension.

softworkz commented 6 months ago

I think the seekable ranges thing is a tad more complicated than that,

Not really. I mean it's not trivial to present it correctly, but from the player side, it's all about getting an up-to-date duration.

I'd assume there's some buffering involved that allows seeking back

HLS works with segments (e.g. 3 seconds each), which avoids excessive buffering. The player just reloads the playlist (again and again and again...) to learn about the stream. It doesn't need to load any media data for that.

The conundrum here I think is that MPE is no longer seekable if duration is 0

It is still seekable. But when you seek to a point outside the valid range, it hangs for 0.5-2s, which is another reason why this range needs to be known.

Try the AutoExtendDuration property in the configuration; this should turn on automatic duration extension.

Oh thanks, sounds promising, I'll try!

Any other properties that might need to be set differently for live streams?

softworkz commented 6 months ago

AutoExtendDuration

Where is that?

brabebhin commented 6 months ago

In MediaSourceConfig, and if you use the winui branch, in the "General" section.

But looking at the code, this should already be true, so it might not work. Some other interesting properties are ReadAheadBufferEnabled, SkipErrors, ReadAheadBufferSize and ReadAheadBufferDuration. They are all in the sample config class.
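For illustration, a rough (untested) sketch of what setting these could look like, assuming the master-branch layout where the properties sit directly on MediaSourceConfig (on the winui branch they are grouped under "General"); exact types, units and defaults are assumptions:

    // Sketch, untested - property names from this thread; types/units assumed.
    // "streamUrl" is a placeholder for the HLS URL.
    var config = new FFmpegInteropX.MediaSourceConfig();
    config.AutoExtendDuration = true;                          // already the default
    config.ReadAheadBufferEnabled = true;
    config.ReadAheadBufferSize = 100 * 1024 * 1024;            // assumed: bytes
    config.ReadAheadBufferDuration = TimeSpan.FromSeconds(30); // assumed: TimeSpan
    var ffmpegMss = await FFmpegMediaSource.CreateFromUriAsync(streamUrl, config);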

softworkz commented 6 months ago

[screenshot]

brabebhin commented 6 months ago

Oops, it is not in the IDL. Having to manually edit the IDL was a problem waiting to happen.

softworkz commented 6 months ago

The real annoyance in all that is not the IDL editing - it's that you always need to make the same change manually in 5 different places without making a mistake... I really don't like that, it just slows you down.

brabebhin commented 6 months ago

Yeah, this is why we resisted migrating to C++/WinRT for as long as we could...

brabebhin commented 6 months ago

The curious thing is that the property is in the IDL on the winUI branch.

softworkz commented 6 months ago

I haven't picked it up. It was added after I had forked:

[screenshot: commit graph]

(yellow is mine, light blue is where it's been re-added)

softworkz commented 6 months ago

Damn - all for nothing. AutoExtendDuration is true by default, LOL

brabebhin commented 6 months ago

Yeah, I feared as much. This might be a bug. If you can provide me an HLS test link, I'll look into it. This should be the primary scenario for that property in the first place.

The property has been on master for a long time. It is even used in the C++ code, just not in the IDL. I probably picked it up when I refactored the config.

Is that GitExtensions that you're using?

softworkz commented 6 months ago

Is that GitExtensions that you're using?

The screenshot? That's SmartGit.

lukasf commented 6 months ago

AutoExtendDuration does not help you. It is only used in seekable streams which do have a duration, not in live streams. It's also a "stupid" solution, just extending the duration by 10 seconds each time playback goes over the end time. And as you found out, it's enabled by default.

I don't think there is a way to get the required information from ffmpeg. The HLS/DASH support in ffmpeg is generally pretty poor: very slow playback start and little control over what happens during playback. There's also not enough information for seamless video stream switching. For fully featured HLS/DASH support, we'd need a custom stream parser, which would be quite a lot of work due to the multitude of ways these streams can be constructed.

softworkz commented 6 months ago

AutoExtendDuration does not help you.

Right, it doesn't.

But there must be a way, because MPV uses the HLS demuxer from ffmpeg (I had recently added improved VTT subtitle support - for MPV). The way MPV does it is to set the start of the playlist to zero (it does that for all playback by default) and then extend the duration while the HLS live playlist grows.

Other players follow a different philosophy: they say that a live stream cannot have a duration and set it to zero, and then provide the playlist range in the form of a different API. For Windows.Media, there's the GetSeekableRanges API (alongside GetBufferedRanges), which is the same pattern HTMLVideoElement follows in browser engines.

It doesn't matter whether it's one way or the other, but it's crucial to get this information in some way, because without it, you cannot provide proper timeline display and seeking control in such streams.

brabebhin commented 6 months ago

I suppose the GetSeekableRanges API is controlled by

https://learn.microsoft.com/en-us/uwp/api/windows.media.core.mediastreamsource.setbufferedrange?view=winrt-22621#windows-media-core-mediastreamsource-setbufferedrange(windows-foundation-timespan-windows-foundation-timespan)

We could integrate our read-ahead buffer with this. However, the read-ahead buffer only buffers ahead; in order to get useful back-seeking functionality, we would also need to keep some back buffer.
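As a rough sketch of the integration point (SetBufferedRange is a real MediaStreamSource API; the bufferStart/bufferEnd values are hypothetical and would have to come from the read-ahead / back buffer):

    // Sketch: report the buffered window to the MediaStreamSource whenever
    // the read-ahead buffer changes. bufferStart/bufferEnd are hypothetical
    // values that a buffer-state query would have to supply.
    var mss = ffmpegMss.GetMediaStreamSource();
    mss.SetBufferedRange(bufferStart, bufferEnd); // both are TimeSpan offsets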

softworkz commented 6 months ago

I suppose the GetSeekableRanges API is controlled by

No, that's for buffered ranges.

brabebhin commented 6 months ago

I'd hazard a guess and assume those are the same for live streams, but could be wrong. Another way we can do this would be to implement the stream handler (like MS did: https://github.com/microsoft/FFmpegInterop/pull/305). This would allow us to indirectly feed into the AdaptiveMediaSource, which theoretically should allow us to handle DASH using the Windows.Media APIs.

I have no idea if this would work, but supporting the byte stream handler shouldn't be too difficult. I also don't know how this would work with the various APIs that we expose through FFmpegMediaSource (like effects, subtitles).

softworkz commented 6 months ago

I'd hazard a guess and assume those are the same for live streams, but could be wrong.

To disambiguate the two:

Buffered Ranges

These are the time ranges for which content has been downloaded and can be played without further network (I/O) requests.

Seekable Ranges

Typically there's just a single such range. It indicates the time range for which content is available ("can be downloaded").

The specs - e.g. HLS - make provisions for discontinuities or interruptions, which could be reflected by more than a single "seekable range".
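Both can be queried from the MediaPlaybackSession on newer Windows 10 versions (these are existing Windows.Media.Playback APIs; "mediaPlayer" here is whatever MediaPlayer instance the app uses):

    // Both APIs return a list of MediaTimeRange (Start/End as TimeSpan).
    var session = mediaPlayer.PlaybackSession;
    foreach (var range in session.GetSeekableRanges())
        System.Diagnostics.Debug.WriteLine($"seekable: {range.Start} - {range.End}");
    foreach (var range in session.GetBufferedRanges())
        System.Diagnostics.Debug.WriteLine($"buffered: {range.Start} - {range.End}");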

softworkz commented 6 months ago

Another way we can do this would be to implement the stream handler (like MS did: microsoft/FFmpegInterop#305). This would allow us to indirectly feed into the AdaptiveMediaSource, which theoretically should allow us to handle DASH using the Windows.Media APIs.

Yea, I had thought of that, but I'm not sure how easy/difficult that would be.

I think the least involved way would be to get this information somehow from ffmpeg - in the worst case by accessing the HLS demuxer directly, but MPV doesn't seem to do that. I haven't found out yet how it determines the duration. Maybe it's normally available from the demuxer and FFmpegInteropX is just not regularly reading and updating it?

brabebhin commented 6 months ago

It seems duration is exposed in AVStream and AVFormatContext. But I don't have any "live" URLs to check with. The URLs I found all report the right duration from the start.

softworkz commented 6 months ago

Here are some you can use: https://www.harryshomepage.de/webtv.html

brabebhin commented 6 months ago

Thanks. I'll look at this over the weekend.

lukasf commented 6 months ago

Duration is not set in FFmpeg for live streams, and it would also be logically wrong to set a duration since there is no duration.

The only way I currently see this supported is by using the ReadAheadBuffer and adding APIs to query the last position that is being buffered in the two active playback streams. It could be that MPV uses a similar approach, since, as I said, I do not see any API support for this in ffmpeg. When I implemented the buffer, I was planning to add an API to get the buffer state, with buffer size and duration for both of the current streams. But then I did not really see any real use for it, so it's not there yet. It should not be too hard to implement - check the IsFull method in StreamBuffer, all the data is easily available.
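To illustrate, such an API might look roughly like this (purely hypothetical, all names invented - nothing of this exists in FFmpegInteropX today):

    // Hypothetical shape for the buffer-state API described above.
    public sealed class BufferState
    {
        public ulong AudioBufferSizeBytes { get; set; }
        public ulong VideoBufferSizeBytes { get; set; }
        public TimeSpan AudioBufferDuration { get; set; } // buffered ahead of position
        public TimeSpan VideoBufferDuration { get; set; }
    }
    // e.g.: var state = ffmpegMss.GetBufferState();
    //       var forwardSeekableEnd = session.Position + state.VideoBufferDuration;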

I also tried setting the BufferedRange on the MediaStreamSource once, hoping we would see the buffered range in the seek bar or something like that, but I did not see any effect. I just saw in the docs that it is rather used for power saving.

softworkz commented 6 months ago

I dug into this more deeply today and realized that my observations were based on an illusion. MPV's duration merely indicates the buffer duration, which grows constantly while new segments are read - not the playlist range. This becomes apparent when you start with a playlist which already contains a substantial number of segments; the ones I had tested always started empty.

You can tell the HLS demuxer where you want to start, but then you can't seek to a point earlier than that starting point, and you can't seek to a point in the future without reading all segments in between. I had worked on these things (for live TV playback) two years ago and had mixed up my memory of mpv with another player.

This is really unfortunate, because now I have to solve this for two players. I'm still undecided about the best way. One idea would be to read the same playlist in parallel with separate (C#) code; I've already done an implementation of that which could be re-used. Most appealing in this case: it would be a single solution for both players. The other way would be to implement it in the players directly, but that might mean making changes in three places: ffmpeg, MPV and FFmpegInteropX. Phew...

brabebhin commented 6 months ago

Yeah, I've done some research on how FFmpegInteropX behaves with some of the URLs you provided, and the illusion theory also crossed my mind.

I guess the best I can do is expose the read ahead buffer size through some API in the FFmpegMediaSource, as @lukasf suggested.

At this point, one would need custom transport controls to make use of the API.

softworkz commented 6 months ago

I guess the best I can do is expose the read ahead buffer size through some API in the FFmpegMediaSource, as @lukasf suggested. At this point, one would need custom transport controls to make use of the API.

Well, that should actually be exposed via the GetBufferedRanges API. But it doesn't help my case. I guess I'm all alone with this task, and even if I found a good way, it would be nothing for FFmpegInteropX, because it can't take code which depends on custom ffmpeg modifications...

Thanks a lot for your help and advice!

brabebhin commented 6 months ago

Yes, unfortunately, we can't support custom ffmpeg builds that easily.

Best chance would be to get your patch merged into ffmpeg.

lukasf commented 6 months ago

That's exactly how I imagined it to work in MPV.

Don't you think that the buffered range would help you to some degree? When playing a live stream, playback is usually only a few seconds behind what's available. Since our ReadAheadBuffer will read all data that's available (up to the buffer limit), you will get more or less precise information on the seekable range, until you hit the (configurable) buffer limit. And it will allow you to perform very fast seeks into that range, since we won't restart playback (like in a normal seek) but directly skip into the pre-buffered packets and continue from there.

But of course, it only gives info about the forward-seekable range. We do not have a back buffer currently (which would theoretically be possible ofc).

softworkz commented 6 months ago

Don't you think that the buffered range would help you to some degree?

Yes, it helps to the degree that you can determine whether a seek can be performed without restarting playback - if the same applies as for MPV, i.e. that you can't seek to a point outside (in the future of) the buffer without reading all segments in between.

When playing a live stream, playback is usually only a few seconds behind what's available.

Yes, that's how it typically starts. Let's assume it starts like that and plays for two hours. Then the user seeks back 1h. The player cannot seek back to that point, so we need to restart playback at 1:00:00, right in the middle of the playlist. At least MPV is then unable to seek forward beyond the buffer. Or well, it actually is able to, but you have to wait many minutes when seeking back to 2:00:00, because it reads all segments in between until it gets there.

But of course, it only gives info about the forward-seekable range. We do not have a back buffer currently (which would theoretically be possible ofc).

To better synchronize our views, let's look at an extreme (but planned) example: It will be possible to configure a timeshift buffer (on the server) of up to 20h. Such timeshift buffers can also be shared by multiple users or recordings or a mix. So it's possible at any time that you start playing a channel and get a playlist covering a 20h timespan. And that's the "seekable range" which needs to be known in order to properly display and control the timeline UI - both for visually indicating the area within which a user can seek, and for constraining seek requests to the valid range, because seeking outside that range can cause player errors or hangs, or delays in the best case.

We do not have a back buffer currently (which would theoretically be possible ofc).

A back buffer is useful for quickly seeking around within a nearby range, but it can't help with cases like the above. The same applies to cases with more moderate numbers; I just picked an extreme one for illustration - i.e. we're not talking about a niche case.
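Until something like this is exposed, the app itself has to constrain seeks, roughly like this (a minimal sketch; seekableStart/seekableEnd are hypothetical values the app would maintain, e.g. from its own playlist polling):

    // Sketch: clamp seek targets to the known seekable range so the player
    // never receives a position it cannot serve (avoiding the hangs above).
    void SeekTo(Windows.Media.Playback.MediaPlaybackSession session,
                TimeSpan target, TimeSpan seekableStart, TimeSpan seekableEnd)
    {
        if (target < seekableStart) target = seekableStart;
        if (target > seekableEnd) target = seekableEnd;
        session.Position = target;
    }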

lukasf commented 6 months ago

Thanks for the explanation, this makes things pretty clear. Obviously, buffered range and similar approaches cannot nearly cover that. I guess your best choice is to read and update the playlist files in parallel during playback, and use that information to update seekable ranges in the player.

softworkz commented 6 months ago

Thanks for the explanation, this makes things pretty clear. Obviously, buffered range and similar approaches cannot nearly cover that. I guess your best choice is to read and update the playlist files in parallel during playback, and use that information to update seekable ranges in the player.

Yes, I think you're right. I cannot estimate how much effort it would take to bring this into ffmpeg/MPV/ffmpeginteropx.

One thing that isn't quite clear to me is why seeking can only happen within the buffer - why not to a point in the playlist which is beyond the buffer? I mean, it's also possible to seek around in an mkv file, for example, without reading everything in between - so why not within a live stream? The duration may not be constant, but at the moment a seek is performed it is known, and it can be considered constant for all calculations needed to perform the seek...

lukasf commented 6 months ago

While using HLS live streams for time shifting is technically possible, it is not what a live stream is usually intended or used for. That's why you don't see much support for your scenario in players and libs.

In the case of playback in the WinUI/UWP MediaPlayer, I also see some additional trouble: If you start playback at the current (live) position and then seek back into the time shift buffer, you would end up with a negative position (positions must be normalized in MediaStreamSource). I don't think that this is supported, as it would screw up the PlaybackControls. So you'd need to kind of virtualize positions, and probably create a new source for every seek (at least for back seeks past your position 0). Or you always set the start position to the oldest point in the time shift buffer, but then it might happen that you cannot seek back to it once it moves out of the time shift range.

softworkz commented 6 months ago

While using HLS live streams for time shifting is technically possible, it is not what a live stream is usually intended or used for.

The opposite is true. It is an explicit goal of HLS live streams to allow seeking back into the past across the full range the playlist covers. There's also a playlist type "EVENT", which is similar to live, except that older segments are not allowed to be removed. The purpose is that when you start watching an event late, you can always seek back and view it from the start. The HLS live streams of the German public TV stations cover quite a range of past content (3h in the case of ZDF).

And that's exactly how HLS streams are supposed to work. You won't hear the word "timeshift", but technically it's the same thing.

That's why you don't see much support for your scenario in players and libs.

VLC, ExoPlayer, HLS.js, Roku devices, Samsung TVs, LG TVs, Apple's player... - just to name a few that do support it.

In the case of playback in the WinUI/UWP MediaPlayer, I also see some additional trouble: If you start playback at the current (live) position and then seek back into the time shift buffer, you would end up with a negative position (positions must be normalized in MediaStreamSource). I don't think that this is supported, as it would screw up the PlaybackControls.

When you start playback at the live edge, you have already loaded the playlist, so you know the first available segment, and of course you set this as zero (or you work with absolute times, for which you don't even need to load the first segment if the playlist has an appropriate tag).

Then you iterate through all segments in the playlist and sum up the segment durations. This procedure is described in the HLS spec, and the result is defined as the "playlist duration" - which can change each time the playlist is updated.

Then you add this duration to your start time to get the playback position at which the "live edge" playback starts.
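As a sketch, the summing step could look like this (a deliberately naive parser; a real one must follow RFC 8216 and handle discontinuities, byte ranges etc.):

    using System.Globalization;

    // Naive sketch: sum all #EXTINF segment durations of a media playlist
    // to get the "playlist duration" described above. Re-run on every
    // playlist refresh, since the value changes as segments are added.
    static TimeSpan GetPlaylistDuration(string m3u8)
    {
        double totalSeconds = 0;
        foreach (var rawLine in m3u8.Split('\n'))
        {
            var line = rawLine.Trim();
            if (!line.StartsWith("#EXTINF:"))
                continue;
            // Format: #EXTINF:<duration>,[<title>]
            var duration = line.Substring("#EXTINF:".Length).Split(',')[0];
            totalSeconds += double.Parse(duration, CultureInfo.InvariantCulture);
        }
        return TimeSpan.FromSeconds(totalSeconds);
    }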

Or you always set the start position to the oldest point in the time shift buffer, but then it might happen that you cannot seek back to it once it moves out of the time shift range.

(sorry, read this only after writing the above)

Yes, when old segments are falling off, you cannot seek back to those positions anymore. That's what the "Seekable Ranges" indication is used for. The HTMLVideoElement spec follows the same pattern, for example.

softworkz commented 6 months ago

I wonder what would happen with FFmpegInteropX when setting the HLS demuxer option live_start_index to 0. The default is -3, which means start with the 3rd-last segment; 0 means start from the first segment in the playlist. What happens with MPV in that case is that when seeking forward, it reads every single segment in between to complete the seek - which can take forever, of course.
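If FFmpegInteropX forwards custom demuxer options (I believe MediaSourceConfig has an FFmpegOptions property set for this, but treat the exact name as an assumption), trying it would boil down to:

    // Sketch, untested: pass the hls demuxer option through to
    // avformat_open_input. "FFmpegOptions" is assumed to be forwarded
    // as the demuxer options dictionary.
    var config = new FFmpegInteropX.MediaSourceConfig();
    config.FFmpegOptions.Add("live_start_index", "0"); // start at the first segment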

brabebhin commented 6 months ago

If you use an AdaptiveMediaSource, do you get the desired behavior?

softworkz commented 6 months ago

If you use an AdaptiveMediaSource, do you get the desired behavior?

That's a good question, and it's one of the next things I'm gonna try. Currently I'm working on automatic switching of refresh rates and HDR mode; I'll check this out once I'm done.

softworkz commented 6 months ago

If you use an AdaptiveMediaSource, do you get the desired behavior?

Yup, just tried it. With AdaptiveMediaSource and one of the TV streams from the link I posted, I can freely seek within the past three hours and PlaybackSession.GetSeekableRanges returns the appropriate range accordingly.
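For reference, such a test boils down to something like this (standard Windows.Media.Streaming.Adaptive usage; "hlsUrl" and "mediaPlayer" are placeholders):

    // Sketch: play an HLS stream via AdaptiveMediaSource and query the
    // seekable range once playback has started.
    var result = await AdaptiveMediaSource.CreateFromUriAsync(new Uri(hlsUrl));
    if (result.Status == AdaptiveMediaSourceCreationStatus.Success)
    {
        mediaPlayer.Source = MediaSource.CreateFromAdaptiveMediaSource(result.MediaSource);
        mediaPlayer.Play();
        // later, e.g. from a timer or a position-changed handler:
        var ranges = mediaPlayer.PlaybackSession.GetSeekableRanges();
    }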

brabebhin commented 6 months ago

I guess this is a gap in our implementation or ffmpeg's. We could implement the byte stream handler, but you would lose subtitle support, unfortunately.

softworkz commented 6 months ago

Subtitle support (and maybe some audio codecs) is the only reason to prefer FFmpegInteropX over the plain Windows.Media implementation (because all HLS streams use codecs which are supported by Windows anyway).

Maybe it would be possible to:

brabebhin commented 6 months ago

Another option is to support subtitles through the MF API, but we would lose custom styling and custom fonts.

brabebhin commented 6 months ago

Actually, now that I think of it, the subtitles are just text scripts; it shouldn't be too difficult to generate the strings after we modify them to fit the custom stuff. It would be like having our own ass/ssa generator.

We would also need to make the current sample providers async, to avoid queuing packets until a subtitle is found - that could lead to memory leaks.

lukasf commented 6 months ago

I am not sure the bytestream handler approach would work here. A bytestream handler is actually only for files, but there is something similar for URLs. Still, if you implement that, MF will pass you the URL and expect you to do everything. Then we are again limited by the ffmpeg hls/dash format handlers, which just do not provide any of the information needed for seekable ranges. So I don't think this would solve any of the issues. The ffmpeg support is really only very basic, rather intended for transcoding and ripping. Any player that seriously wants to play dash/hls rolls their own stream parser, which makes playback start a whole lot faster and allows for seamless stream switching and the like - and then, of course, seekable ranges are not a problem either.

If we wanted to utilize the AdaptiveMediaSource, we'd need to go further down-level. It might be possible to register as an mp4 demuxer codec, letting MF do the stream parsing, and instead hook in further downstream at the mp4 segment layer. Then we could demux the segments using FFmpeg APIs and do subtitle transcoding (any text format to ssa), and optionally also decoding of other formats.

But this seems like a whole new project with lots of work to do. I really don't have the capacity for that. I have done MF bytestream handlers in the past; it's pretty complicated stuff. Demuxer codecs are even more complicated, with multiple outputs and format negotiations on all the pins and such. The UWP MediaStreamSource was really a major improvement - I surely don't miss doing things down at the MF level 😄

Also, I am not even sure this approach would work. It could be that the AdaptiveMediaSource does all the demuxing internally. In that case, the question is whether it just drops unknown subtitle formats or exposes them. In the latter case, it might be possible to register as a decoder codec and try to do subtitle transcoding from other text formats to ass.

So all in all, I see lots of work and lots of question marks...

brabebhin commented 6 months ago

I wonder how difficult it would be to implement an HLS parser in our library.