ffmpeginteropx / FFmpegInteropX

FFmpeg decoding library for Windows 10 UWP and WinUI 3 Apps
Apache License 2.0

Duration does not Update with HLS Live Streams #411

Open softworkz opened 7 months ago

softworkz commented 7 months ago

There are so many duration properties...

                var d1 = this.FfmpegMss.FormatInfo.Duration;
                var d2 = this.FfmpegMss.Duration;
                var d3 = this.FfmpegMss.GetMediaStreamSource().Duration;
                var d4 = this.FfmpegMss.PlaybackItem.StartTime;
                var d5 = this.FfmpegMss.PlaybackItem.Source.Duration;
                var d6 = this.FfmpegMss.PlaybackItem.Source.MediaStreamSource.Duration;
                var d7 = this.FfmpegMss.PlaybackSession?.NaturalDuration;
                var d8 = this.FfmpegMss.PlaybackSession?.MediaPlayer.TimelineController.Duration;

Unfortunately all are zero when using an HLS live stream (where the duration is continuously expanding). Do you have any idea how to get accurate duration values in this case?
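For reference, duration changes during playback are normally observed through MediaPlaybackSession.NaturalDurationChanged; a minimal sketch using the same FfmpegMss field as above (per this issue, the live HLS values stay at zero, so this only shows the usual mechanism, not a fix):

    var session = this.FfmpegMss.PlaybackSession;
    if (session != null)
    {
        session.NaturalDurationChanged += (s, args) =>
        {
            // For VOD sources this fires once the duration becomes known;
            // for live HLS it currently keeps reporting TimeSpan.Zero.
            var duration = s.NaturalDuration;
        };
    }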

I haven't tried the Windows.Media adaptive streaming source, but then I would lose all FFmpegInteropX functionality, right?

lukasf commented 6 months ago

Since this only really affects live streams and recorded content from live streams: Why don't you just use the AdaptiveMediaSource for HLS/DASH and FFmpegInteropX for everything else? I think live streams only use standard codecs for audio, video and subtitles. You won't find any weird formats there, so I wonder how much benefit you would have from using our lib there.
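For reference, the AdaptiveMediaSource path is only a few lines; a minimal sketch (placeholder URL, no error handling, mediaPlayer is an existing MediaPlayer instance):

    // Sketch: play an HLS/DASH stream through the built-in AdaptiveMediaSource.
    var result = await AdaptiveMediaSource.CreateFromUriAsync(
        new Uri("https://example.com/live/master.m3u8")); // placeholder URL

    if (result.Status == AdaptiveMediaSourceCreationStatus.Success)
    {
        var mediaSource = MediaSource.CreateFromAdaptiveMediaSource(result.MediaSource);
        var playbackItem = new MediaPlaybackItem(mediaSource);
        mediaPlayer.Source = playbackItem;
    }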

softworkz commented 6 months ago

Since this only really affects live streams and recorded content from live streams: Why don't you just use the AdaptiveMediaSource for HLS/DASH and FFmpegInteropX for everything else? I think live streams only use standard codecs for audio, video and subtitles. You won't find any weird formats there, so I wonder how much benefit you would have from using our lib there.

We're using HLS not only for TV streams; it's also our primary streaming format for everything that gets transcoded. AdaptiveMediaSource doesn't support all the subtitle codecs, and converting ASS to VTT loses all the formatting, for example. And we would need to burn in all the graphic subtitles. Also, I'm not sure about audio format support, but I think it's not as bad as with subtitles.

Do you know whether AdaptiveMediaSource supports WebVTT subtitles?

And second question: Could FFmpegInteropX be used as a MediaSource for a subtitle stream alone?

brabebhin commented 6 months ago

And second question: Could FFmpegInteropX be used as a MediaSource for a subtitle stream alone?

Yes and no.

When we implemented subtitles, the TimedMetadataStreamDescriptor and its associated functionality were not yet exposed in WinRT. So we implemented subtitles by exposing them in a TimedMetadataTrack. This basically works, but it means we don't fully support the MediaStreamSource contract: we bypass it for subtitles, so our subtitles are not exposed as streams in the underlying IMFMediaSource interface that's used by various things inside MF, including AdaptiveMediaSource, transcoding, bytestream handlers etc. As it turns out, this is quite complicated to undo, although I would like to eventually do it, maybe once the current batch of PRs is merged.

We do support parsing a TimedMetadataTrack with ffmpeg APIs, but that is currently only exposed to work with the MediaPlaybackItem that's created by the FFmpegMediaSource itself.

But it shouldn't take too long to support parsing random files and returning a bare TimedMetadataTrack that you can assign to arbitrary MediaPlaybackItems, including those created by an AdaptiveMediaSource.
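If that lands, usage could look roughly like the sketch below - ParseExternalSubtitleAsync is a hypothetical name for the future API; everything else is stock Windows.Media:

    // Hypothetical future FFmpegInteropX call returning a bare TimedMetadataTrack.
    TimedMetadataTrack subtitleTrack =
        await FFmpegMediaSource.ParseExternalSubtitleAsync(subtitleStream);

    // Standard Windows.Media plumbing: attach the track to a MediaSource created
    // from an AdaptiveMediaSource and let the platform present it.
    var amsResult = await AdaptiveMediaSource.CreateFromUriAsync(new Uri(hlsUrl));
    var mediaSource = MediaSource.CreateFromAdaptiveMediaSource(amsResult.MediaSource);
    mediaSource.ExternalTimedMetadataTracks.Add(subtitleTrack);

    var playbackItem = new MediaPlaybackItem(mediaSource);
    playbackItem.TimedMetadataTracksChanged += (item, args) =>
    {
        if (args.CollectionChange == CollectionChange.ItemInserted)
        {
            item.TimedMetadataTracks.SetPresentationMode(
                args.Index, TimedMetadataTrackPresentationMode.PlatformPresented);
        }
    };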

softworkz commented 6 months ago

If we wanted to utilize the AdaptiveMediaSource, we'd need to go further down-level. It might be possible to register as an MP4 demuxer codec, let MF do the stream parsing, and instead hook in further downstream at the MP4 segment layer. Then we could demux the segments using FFmpeg APIs and do subtitle transcoding (any text format to SSA) and optionally also decoding of other formats.

But this seems like a whole new project with lots of work to do. I really don't have the capacity for that. I have done MF bytestream handlers in the past; it's pretty complicated stuff. Demuxer codecs are even more complicated, with multiple outputs and format negotiations on all the pins and stuff. The UWP MediaStreamSource was really a major improvement, I surely don't miss doing things down at the MF level 😄

Also, I am not even sure if this approach would work. It could also be that the AdaptiveMediaSource does all the demuxing internally. In that case, the question would be if it just drops any unknown subtitle formats, or if it exposes them. In the latter case, it might be possible to register as a decoder codec and try to do subtitle transcoding from other text formats to ass.

So all in all, I see lots of work and lots of question marks...

Thanks a lot for your thoughts. I see it pretty much the same way. We're using MPEG-TS in all cases, no MP4, but this doesn't really change anything. It's a complex task, no matter at which point you start to plumb something in. Even when re-using existing parts of code, there are still many caveats that arise just from the fact that the Windows.Media APIs are a different thing, and even though there are some low-level extension points, they don't provide much flexibility. They only provision for very specific use cases where you need to do things exactly in the intended way - otherwise it fails.

What comes on top is that we need a mature solution - not some kind of proof-of-concept - and there's a long way from one to the other. Just think about dealing with gaps, a/v stream offsets or start-time offsets, discontinuities and the like. It's too much for me to take on either...

But it shouldn't take too long to support parsing random files and returning a bare TimedMetadataTrack that you can assign to arbitrary MediaPlaybackItems, including those created by an AdaptiveMediaSource.

That sounds like it could be the way to go. The ability to serve as a provider for external subtitle streams already exists, so I can't imagine that it would be a huge task to make it usable together with AdaptiveMediaSource. It would be about external subtitles only - maybe WebVTT from the same master playlist if that's possible, but I hope that AdaptiveMediaSource can handle VTT itself anyway. Do you know whether it can?

brabebhin commented 6 months ago

That sounds like it could be the way to go. The ability to serve as a provider for external subtitle streams already exists, so I can't imagine that it would be a huge task to make it usable together with AdaptiveMediaSource.

True, just some cppwinrt shenanigans. But I now realize we don't actually support feeding this external subtitle from a URI. We only support stream based subtitles. Again, nothing to write home about, it will just take me a little while longer to do it.

The stream based stuff is already done, you can check branch external-subtitle-parser

softworkz commented 6 months ago

But I now realize we don't actually support feeding this external subtitle from a URI. We only support stream based subtitles. Again, nothing to write home about, it will just take me a little while longer to do it.

Doesn't need any further action. I've been using external subs from URIs from the start. It was one of the first things I did and it's working fine:

    // Open the subtitle URL as a random access stream...
    var uri = new Uri(url);
    var streamRef = RandomAccessStreamReference.CreateFromUri(uri);
    var ras = await streamRef.OpenReadAsync();

    // ...and hand it to FFmpegInteropX as an external subtitle stream.
    var subtitleStreamInfo = await ffmpegMss.AddExternalSubtitleAsync(ras, stream.DeliveryUrl);

I think we even talked about it earlier.

softworkz commented 6 months ago

I (accidentally) found something interesting: https://github.com/SuRGeoNix/Flyleaf

It's something similar to (but still different from) FFmpegInteropX, and they claim:

[screenshot of the claimed features]

Haven't looked at the code yet..

brabebhin commented 6 months ago

That seems to be using a custom ffmpeg build.

softworkz commented 6 months ago

Yes, but the referenced patchset is pretty minimal: https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=1018

brabebhin commented 6 months ago

Yeah, I noticed that. I wonder why this is not merged into ffmpeg proper already.

lukasf commented 6 months ago

There are loads of great patches for ffmpeg which have never been merged, lost and forgotten in this stupid mailing list system. It's really a tragedy.

Flyleaf itself seems to be a pure managed implementation. Microsoft strongly advises against using any managed code inside media engines. I have seen those GC hiccups myself. Better stay 100% native from source to renderer if you want glitch-free animations/playback.

brabebhin commented 6 months ago

The question is, do we want to have a custom ffmpeg build?

softworkz commented 6 months ago

There are loads of great patches for ffmpeg which have never been merged, lost and forgotten in this stupid mailing list system. It's really a tragedy.

Very true and very sad indeed!

softworkz commented 6 months ago

Flyleaf itself seems to be a pure managed implementation. Microsoft strongly advises against using any managed code inside media engines. I have seen those GC hiccups myself. Better stay 100% native from source to renderer if you want glitch-free animations/playback.

I think those GC collection issues are rather a thing of the past with recent .NET versions. Also, you can get pretty close to C-level algorithm performance, but only when using C# unsafe blocks with pointer arithmetic - but still only close, not equal. The biggest gap is memory management, and p/invoking to do it is awkward. There's new stuff coming for this though (System.Runtime.InteropServices.NativeMemory). Haven't checked whether it's in net8.0.

But despite all those possibilities, I think it's more challenging to do it properly from C# than implementing it with the language the API was made for.
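As a rough illustration of the NativeMemory point, a minimal sketch (NativeMemory is available since .NET 6; the buffer size and fill pattern are arbitrary):

    using System.Runtime.InteropServices;

    unsafe
    {
        const int count = 4096;
        // Allocated on the native heap, so the buffer is invisible to the GC.
        byte* buffer = (byte*)NativeMemory.Alloc((nuint)count);
        try
        {
            for (int i = 0; i < count; i++)
                buffer[i] = (byte)(i & 0xFF);
        }
        finally
        {
            NativeMemory.Free(buffer);
        }
    }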

softworkz commented 6 months ago

The question is, do we want to have a custom ffmpeg build?

I think the first part is to understand how they are doing it. I cannot imagine that these small patches alone can do the trick.

If it turns out to be feasible in some way, then I'm pretty sure that I'll be able to get it merged into ffmpeg.

brabebhin commented 6 months ago

I think those GC collection issues are rather a thing of the past with recent .NET versions. Also, you can get pretty close to C-level algorithm performance, but only when using C# unsafe blocks with pointer arithmetic - but still only close, not equal. The biggest gap is memory management, and p/invoking to do it is awkward. There's new stuff coming for this though (System.Runtime.InteropServices.NativeMemory). Haven't checked whether it's in net8.0.

But despite all those possibilities, I think it's more challenging to do it properly from C# than implementing it with the language the API was made for.

The main advantage of using unmanaged code is that the data resides in another area of memory, outside the GC heap. I agree with you that simply using unsafe pointers is likely going to yield C-like performance - but if and only if your entire application does that, because if you use the usual C# objects in any area of the app, you will eventually cause a GC freeze. And one place where this easily happens in media playback is the seekbar, which will produce garbage strings around the clock.

If you could separate the managed media code from the rest of the managed app and only have the GC freeze the rest of the app, then yes, you could write a gc-freeze-free media app in C#.
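To make the seekbar example concrete, a minimal sketch (positionText is a placeholder TextBlock and the format string is arbitrary):

    // Allocates a fresh string on every position tick - steady garbage, all day long.
    void OnPositionTick(MediaPlaybackSession session)
    {
        positionText.Text = session.Position.ToString(@"hh\:mm\:ss");
    }

    // One way to reduce it: only rebuild the string when the visible value changed.
    long lastWholeSeconds = -1;
    void OnPositionTickThrottled(MediaPlaybackSession session)
    {
        long seconds = (long)session.Position.TotalSeconds;
        if (seconds != lastWholeSeconds)
        {
            lastWholeSeconds = seconds;
            positionText.Text = session.Position.ToString(@"hh\:mm\:ss");
        }
    }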

brabebhin commented 6 months ago

I think the first part is to understand how they are doing it. I cannot imagine that these small patches alone can do the trick.

If it turns out to be feasible in some way, then I'm pretty sure that I'll be able to get it merged into ffmpeg.

I can clone that and have a look. But it wouldn't be surprising for those 2 little patches to fix everything, maybe it really is that small of a deal.

lukasf commented 6 months ago

I think those GC collection issues are rather a thing of the past with recent .NET versions.

While part of the GC work can happen in the background, it's still the case that the GC can halt your thread at any point while it's executing managed code, to perform bigger GC operations. So while you just wanted to submit the latest frame to the renderer, the GC can pause your thread, resulting in your frame being displayed one refresh interval too late. This kind of occasional glitch can only be prevented by pure native code.

I am aware that performance wise, the latest .net core versions have considerably narrowed down the performance gap to native code. Especially the whole ref-like thing (Span, etc) was a major breakthrough in performance.

lukasf commented 6 months ago

The main issue is that ffmpeg does not set any duration for live HLS streams. Neither on AVStream nor on AVFormatContext. Also during playback, durations are never set/updated. So to properly support this with ffmpeg, a patch would be needed, to (as an option) set durations for HLS also in case of live streams (use last available segment end time), and update the duration as new segments become available.

I don't think that seeking in live HLS streams is a big issue, because it already works without issues in non-live HLS streams. Makes sense to me, that a small patch is sufficient, if needed at all.
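As a rough illustration of what that bookkeeping amounts to at the playlist level, a sketch that sums the #EXTINF values of the current media playlist and would have to be re-run whenever the playlist refreshes (this is not the proposed ffmpeg patch; it ignores the sliding live window and the start-time question discussed below):

    // Requires System.Net.Http, System.Globalization and System.Threading.Tasks.
    static async Task<TimeSpan> GetHlsPlaylistDurationAsync(HttpClient http, Uri mediaPlaylistUrl)
    {
        var text = await http.GetStringAsync(mediaPlaylistUrl);
        double totalSeconds = 0;

        foreach (var rawLine in text.Split('\n'))
        {
            // Segment entries look like "#EXTINF:6.006," optionally followed by a title.
            var line = rawLine.TrimEnd('\r');
            if (!line.StartsWith("#EXTINF:", StringComparison.Ordinal))
                continue;

            var value = line.Substring("#EXTINF:".Length);
            var comma = value.IndexOf(',');
            if (comma >= 0)
                value = value.Substring(0, comma);

            if (double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out var seconds))
                totalSeconds += seconds;
        }

        return TimeSpan.FromSeconds(totalSeconds);
    }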

softworkz commented 6 months ago

a patch would be needed, to (as an option) set durations for HLS also in case of live streams (use last available segment end time), and update the duration as new segments become available.

The last available segment doesn't have an end time, though.

What you need to do is to:

I had taken a quick look at the indicated patches and they don't appear to do anything like that...

lukasf commented 6 months ago

Oh yeah, I forgot about the start time. There is not even a field available in ffmpeg where this could be stored. That makes it even more difficult to bring this into ffmpeg.

The patch probably only solves some minor issue when performing a seek operation into live streams. But for all the rest, I guess they manually parse the stream to get the required info. There is just nothing there in ffmpeg.

softworkz commented 6 months ago

Yup, it would require some extensions to the HLS demuxer, or reading/consuming the playlist in parallel. That's why I said that this little patch they are referencing can't be the key (alone). But I haven't followed their own code yet to see what they are really doing.