ffmpeginteropx / FFmpegInteropX

FFmpeg decoding library for Windows 10 UWP and WinUI 3 Apps
Apache License 2.0

Subtitle support through UWP APIs #135

Open lukasf opened 5 years ago

lukasf commented 5 years ago

I noticed a while ago that the latest Insider SDK now contains methods to create subtitle stream descriptors. Search for "TimedMetadataEncodingProperties" here, to see what I mean.

This means that with Windows vNext, it would be possible to use "passthrough" for subtitles as well. That would let us deliver subtitles to the platform much more cleanly, without all our anti-flicker stuff and all the custom format parsing.

Now to my surprise, it seems that this is even possible in earlier Windows versions, probably starting with 1803. Look at this PR, where MF GUIDs are used to set the right formats. Pretty interesting to see that this works. Nice work, @brbeec.

This allows a simple "passthrough" for formats supported by UWP, as done in the PR. But that is limited to only the 4 formats supported by UWP. However, ffmpeg can decode all supported text formats into SSA. We could theoretically do a hybrid approach: decode all text formats to SSA through ffmpeg and then forward that SSA data to UWP, letting UWP do all the format parsing and rendering, without the need for anti-flicker tricks. As always with passthrough, we would lose some functionality (e.g. style overrides, and embedded fonts probably would not work either), but it also has its benefits.
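A minimal sketch of how that hybrid decision could look, written as a plain Python model rather than library code. All names here (`Delivery`, `choose_delivery`, the format sets) are hypothetical. The set of natively supported formats is an assumption, taken from the SRT, SSA, PGS and VobSub list mentioned further down in this thread, and the "other text formats" set is only illustrative.

```python
from enum import Enum


class Delivery(Enum):
    PASSTHROUGH = "passthrough"      # hand raw packets to the platform
    DECODE_TO_SSA = "decode_to_ssa"  # let ffmpeg convert to SSA, then forward
    CUSTOM_RENDERER = "custom"       # keep the existing in-library path


# Assumption: the four formats the platform understands natively.
NATIVE_FORMATS = {"srt", "ssa", "pgs", "vobsub"}
# Illustrative only: text formats that ffmpeg could convert to SSA.
OTHER_TEXT_FORMATS = {"webvtt", "mov_text", "microdvd"}


def choose_delivery(fmt: str, needs_style_override: bool = False) -> Delivery:
    """Pick a delivery path for one subtitle stream."""
    fmt = fmt.lower()
    # Style overrides are lost with passthrough, so fall back to our own path.
    if needs_style_override:
        return Delivery.CUSTOM_RENDERER
    if fmt in NATIVE_FORMATS:
        return Delivery.PASSTHROUGH
    if fmt in OTHER_TEXT_FORMATS:
        return Delivery.DECODE_TO_SSA
    # Anything else (e.g. unsupported image formats) stays on the custom path.
    return Delivery.CUSTOM_RENDERER
```

For example, `choose_delivery("srt")` picks passthrough, while `choose_delivery("webvtt")` picks the decode-to-SSA path.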

I probably won't have time for this anytime soon, and I don't even know if it's worth it, since our code seems to work quite well by now. Still I thought I'd mention it. It would be interesting to see if this works. Maybe I have more time next year, to play a bit with this stuff.

brabebhin commented 5 years ago

Hmm interesting find. Kind of refreshing to see the Microsoft fork alive and kicking haha.

Now, on topic: we would also lose subtitle delays, would we not?

How would the image based subs work with this?

I don't think this is worth doing right now just for the sake of doing it. If we run into some very nasty situation in which our current approach does not work, we could give it a try. Otherwise, I don't think it is necessary. I think our approach is mature enough by now.

lukasf commented 5 years ago

Now I think there is even a huge problem in how that PR is implemented: The MediaStreamSource still only has this "pull" model, where the MSS asks for the next sample of each stream. If the MSS requests the next subtitle sample, but the file only has a few subs and the next one is at the end of the file, then the lib would read all packets up to the end of the file and enqueue them all in the sample providers. Boom: hundreds of MB or even GBs of memory used, and a possible "out of memory" crash. Very bad.

If MSS were to properly support subtitles, it would need a "push" model where samples are actively enqueued as they occur in the stream. The current pull model only works properly for continuous streams such as video or audio, but not for subtitles or metadata streams.
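To make the memory problem concrete, here is a small Python model of the pull behavior described above. It is a sketch, not the lib's actual code: `build_file`, `pull_next_subtitle` and `buffered_bytes` are hypothetical names, and the packet sizes are made up.

```python
from collections import deque


def build_file(n_av_pairs, video_size=1000, audio_size=100):
    """Model a file as (stream, size_in_bytes) packets with one sub at the end."""
    packets = []
    for _ in range(n_av_pairs):
        packets.append(("video", video_size))
        packets.append(("audio", audio_size))
    packets.append(("sub", 50))
    return packets


def pull_next_subtitle(packets, start, queues):
    """Serve a subtitle request the naive way: read until a sub packet shows up.

    Every audio/video packet read on the way has to be buffered.
    Returns (subtitle_packet_or_None, index_of_next_unread_packet).
    """
    i = start
    while i < len(packets):
        stream, size = packets[i]
        i += 1
        if stream == "sub":
            return (stream, size), i
        queues[stream].append(size)
    return None, i


def buffered_bytes(queues):
    """Total size of everything sitting in the sample provider queues."""
    return sum(sum(q) for q in queues.values())
```

With `build_file(1000)`, a single subtitle request at position 0 reads all 2001 packets and leaves 1,100,000 bytes of audio and video queued. Scale the packet sizes up to real video and that is where the hundreds of MB come from.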

brbeec commented 5 years ago

To clarify: We added the TimedMetadataStreamDescriptor in 1803, but we didn't do the work to expose those streams to a MediaPlaybackItem. In other words, if a MediaPlaybackItem was initialized from an MSS, we wouldn't create TimedMetadataTracks on the MediaPlaybackItem for those streams pre-20H1. In 20H1 I added support for SRT, SSA, PGS, and VobSub subtitle streams, so that now when you initialize a MediaPlaybackItem from an MSS, we will create TimedMetadataTracks for those streams.

You're correct about the perf issue with PR #263. I'm aware of it. I just needed to get the functionality checked in and I'll address the perf issue using a MediaStreamSourceSampleRequestDeferral in subsequent changes.

lukasf commented 5 years ago

Thanks for the clarification, so we'll have to wait for 20H1 to actually use this.

I never even thought about using a deferral here. That could be a clever way to solve the problem of subtitle streams with the pull model (although it might be a bit tricky to implement). :+1:
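A rough Python model of the deferral idea, just to show the control flow. This is not the WinRT API: `Deferral` is a hypothetical stand-in for a sample request deferral, `Demuxer` and its methods are made up, and the sketch assumes at most one outstanding subtitle request at a time. A subtitle request never reads ahead. It is either answered from the queue or parked, and it gets completed later when the normal audio/video reading happens to hit a subtitle packet.

```python
from collections import deque


class Deferral:
    """Hypothetical stand-in for a sample request deferral."""

    def __init__(self):
        self.sample = None
        self.completed = False

    def complete(self, sample):
        self.sample = sample
        self.completed = True


class Demuxer:
    def __init__(self, packets):
        self._packets = iter(packets)  # (stream, payload) tuples
        self._queues = {"video": deque(), "audio": deque(), "sub": deque()}
        self._pending_sub = None  # assumption: one outstanding sub request
        self.at_end = False

    def request_sub(self):
        """Subtitle request: answer from the queue or defer, never read ahead."""
        deferral = Deferral()
        if self._queues["sub"]:
            deferral.complete(self._queues["sub"].popleft())
        elif self.at_end:
            deferral.complete(None)  # no more subtitles in this file
        else:
            self._pending_sub = deferral
        return deferral

    def request_av(self, stream):
        """Audio/video request: read only until a packet for `stream` arrives."""
        queue = self._queues[stream]
        while not queue and self._read_one():
            pass
        return queue.popleft() if queue else None

    def _read_one(self):
        packet = next(self._packets, None)
        if packet is None:
            self.at_end = True
            if self._pending_sub is not None:
                self._pending_sub.complete(None)
                self._pending_sub = None
            return False
        stream, payload = packet
        if stream == "sub" and self._pending_sub is not None:
            # A parked subtitle request is waiting: complete it right away.
            self._pending_sub.complete(payload)
            self._pending_sub = None
        else:
            self._queues[stream].append(payload)
        return True

    def buffered(self):
        return sum(len(q) for q in self._queues.values())
```

In this model the queues only grow as far as the audio/video interleaving requires, no matter how far away the next subtitle is.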

brabebhin commented 5 months ago

I've had another look at this in light of getting rid of the WinUI DispatcherQueue. The primary limitation is that we cannot add additional streams to a MediaStreamSource once a MediaPlaybackItem has been created. So we cannot add external subs through the MediaStreamSource, and we have to keep the TimedMetadataTrack approach for them. This means we need to maintain both implementations.

This isn't much different from the compressed vs. uncompressed sample providers we currently have. The current implementation would be the uncompressed provider, and the WinRT Media Foundation API would be the compressed provider.

This doesn't fix the dependency on the UI framework, so for this we could use the UWP API (compressed) for embedded subs, and the ffmpeg decoder (current implementation) for external subs.

  1. Embedded subs go through the MediaStreamSource as passthrough.
  2. External subs go through the current FFmpeg parser.

Another limitation of our current approach is the use of the MediaStreamSource in transcoding scenarios. Subtitles will not go through, but if we implement transcoding through the ffmpeg API, this will not matter.

lukasf commented 5 months ago

There is a difference from the other UncompressedSampleProviders: Subs need to be provided async using a deferral. It shouldn't be such a big thing, but the packet handling will be somewhat different.

We could also use this with external subtitle files. It's just that they'd have to be added before the MediaStreamSource/MediaPlaybackItem is created. Once the MSS is created, we'd need to use our custom renderer for any subtitle tracks added later.
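As a sketch of that rule (hypothetical names, a plain Python model rather than anything from the lib): subtitle tracks registered before the MSS exists can go the passthrough route, and anything added afterwards falls back to the custom renderer.

```python
from dataclasses import dataclass, field


@dataclass
class PlaybackSession:
    """Hypothetical model of one playback session and its subtitle routing."""

    mss_created: bool = False
    passthrough_streams: list = field(default_factory=list)
    custom_rendered_tracks: list = field(default_factory=list)

    def add_subtitle(self, name: str) -> str:
        """Route a subtitle track depending on whether the MSS already exists."""
        if self.mss_created:
            # Streams cannot be added to an existing MSS: use our own renderer.
            self.custom_rendered_tracks.append(name)
            return "custom"
        self.passthrough_streams.append(name)
        return "passthrough"

    def create_mss(self) -> None:
        """From here on, the stream list of the MSS is fixed."""
        self.mss_created = True
```

So a session that calls `add_subtitle("early.srt")`, then `create_mss()`, then `add_subtitle("late.srt")` ends up with one passthrough stream and one custom rendered track.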

brabebhin commented 5 months ago

The async stuff shouldn't be hard to deal with. I've been quite busy in the past week and probably will be in the coming weeks as well.

If you want, I could start slowly working on this over time and probably have a working model ready in a few months. I think it is good to get rid of the dispatcher queue business. This will totally separate us from the UI framework.

If we want to use external subs in the MSS, we need to support multiple input files first.