Open softworkz opened 1 year ago
Internally we use TimedMetadataTrack and its related APIs to handle subtitles. There are multiple reasons we do so; you can dig up the PR for subs support from 5 years ago.
This means that winRT has some degree of authority over how subtitles are presented. I always wondered how this font thing would work taking windows settings into account. It may look like your windows settings are overriding the custom settings.
I remember this used to work at some point, I will look into it again.
you can dig up the PR for subs support from 5 years ago.
I remember this used to work at some point,
I had already looked at the code, mainly to find out how you are setting custom fonts (which are not installed on the system), and yes, that made me think the same: It must have worked at some time.
To clarify: I don't mean to say that it's a problem of FFmpegInteropX. I get the same results when adding the subtitle track via Windows.Media methods directly.
So, what I'm rather wondering: is it only me where it's not working (maybe due to some system configuration or certain UWP project/appx settings) - or is it a general problem (maybe with recent SDKs)?
I haven't seen major differences in the way APIs behave between SDK versions. I think this is more of a Windows problem. It may also be that the font family property is meant for the use case in which the application renders the subtitles, not when it's done through the MPE. If it was working at some point, that may actually have been a bug in Windows that later got fixed.
I would love to have the subtitles rendered fully within ffmpeginteropx, but we are missing libass build for that. We could get rid completely of MPE after that by rendering things in frame server mode, which provides more flexibility and performance.
I think this is more of a Windows problem.
Yup, very well possible.
It may also be that the font family property is meant for the use case in which the application renders the subtitles, not when it's done through the MPE.
But all other text style options are working properly..
Can you confirm that it's the same on your side, i.e. that you can set the font to Bold but it ignores the FontFamily setting?
I would love to have the subtitles rendered fully within ffmpeginteropx, but we are missing libass build for that. We could get rid completely of MPE after that by rendering things in frame server mode, which provides more flexibility and performance.
How exactly would you want to do that? For proper libass rendering, you need to generate overlay frames at the same fps as the video (or at least half of it), and I don't think this would work using ImageCue items. Blending the frames together is too expensive an operation, I suppose, so I would see only the following two ways:
One of the kinks of this project is that we don't really have it all figured out. Until we link libass and see what it does, we don't really know how it will go.
I know what libass does. But I don't know how you are supplying video frames to MediaPlayer :-)
There's multiple ways, depends on the video decoder mode.
We have directx based decoding, and we can access the texture containing video data.
There's the pass through, which is basically the system doing the decoding - total black box.
Software decoding inside ffmpeg is pretty self explanatory.
We could render subtitles to a directx image and expose the subtitle streams to MediaPlaybackItem the same way we do for audio and video, and we would feed back raw images.
Or we could overlay the subtitles using a ffmpeg filter. Multiple ways really, but until we have libass linked in and get a gist of it, it's pretty difficult to say which way to go.
This will also entail a complete rewrite of the subtitles engine, which may or may not be productive in the long run.
My subtitles PR for ffmpeg (https://github.com/ffstaging/FFmpeg/pull/18) includes a new text2graphicsub filter which outputs the rendered subtitles as video frames. We use it for subtitle-burn-in at the server side.
You can use the output with the ffmpeg overlay filter to do software overlay, or you can also use the hwupload filter and do hardware overlay. But there's nothing like an "overlay_d3d" filter, only those specific to a certain hw context, like overlay_cuda and overlay_qsv, with which it is working fine, but it requires specific handling and filter setup depending on the hw context.
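To make that concrete, here's a command-line sketch (assuming an ffmpeg build with the text2graphicsub filter from the PR; the stream labels and filter wiring are illustrative, not copied from a tested pipeline):

```
# Software overlay: rendered subtitle frames blended on the CPU
-filter_complex "[0:s]text2graphicsub[subs];[0:v][subs]overlay"

# Hardware overlay (CUDA example): upload the rendered frames, blend on the GPU
-filter_complex "[0:s]text2graphicsub[subs];[subs]hwupload[subs_hw];[0:v][subs_hw]overlay_cuda"
```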
What I meant above (re: two d3d surfaces) would be avoiding any burn-in processing and having the overlay done "on-screen" by showing the two surfaces, with the subtitle layer on top.
The advantage of d3d is that it fully abstracts hardware. We prefer that over closed techs like cuda.
The elephant in the room is the pass through decoder, which is still needed for xbox and doesn't allow any kind of frame manipulation.
The advantage of d3d is that it fully abstracts hardware. We prefer that over closed techs like cuda.
Both are closed (in the sense of not being open-source), but CUDA and the Intel frameworks are cross-platform, and Intel's is partially open-source. But the primary point is that ffmpeg doesn't provide D3D11 filters, while there are many hw filters for Nvidia and Intel, which makes implementing something in that area pretty convenient.
But a pure D3D based implementation is still a serious option to consider of course.
The elephant in the room is the pass through decoder, which is still needed for xbox and doesn't allow any kind of frame manipulation.
So it's not possible to work with D3D surfaces on xbox?
It still does hw accelerated playback, though, no?
Writing code that supports only a subset of available hardware, no matter how big that is, is not convenient. This is why we are avoiding CUDA and Intel frameworks.
I don't know how xbox does it, never debugged it; some people reported that only pass through works on xbox with acceptable performance. I suppose there are some differences in how the d3d pointer that we use for directx decoding is handled by MF on xbox.
If we'd include libass, my plan would be to output them just as regular image subtitles. I don't see a reason why this should not work. Libass renders text subs to bitmaps, we turn them into SoftwareBitmaps and add them to the ImageCues. MediaPlayer renders them on top of the video image using HW acceleration.
Using burn in subtitles has many disadvantages. It is limited to the video's resolution, so especially if you have a lower res video, burn in subs look blurry. It is also slow if you do it in software. Since ffmpeg filters do not support dx11, using its filters would automatically mean doing it slowly in CPU memory. Plus, when using GPU video decoding, it would mean copying decoded video frames from GPU to CPU, then doing sub rendering in software, then copying back from CPU to GPU for rendering. That's a heavy performance penalty.
If we'd include libass, my plan would be to output them just as regular image subtitles. I don't see a reason why this should not work
Proper ass rendering means providing images at a framerate similar to the video's (or half the fps at minimum), and the ImageCue feature is not made for that.
ASS subtitle animations are a crucial feature, especially in the area of Anime.
Here are some examples in case you're not familiar:
Using burn in subtitles has many disadvantages. It is limited to the video's resolution, so especially if you have a lower res video, burn in subs look blurry.
Correct - it's always the last resort option.
It is also slow if you do it in software. Since ffmpeg filters do not support dx11, using its filters would automatically mean doing it slowly in CPU memory. Plus, when using GPU video decoding, it would mean copying decoded video frames from GPU to CPU, then doing sub rendering in software, then copying back from CPU to GPU for rendering. That's a heavy performance penalty.
Yup, that's why I've created the textsub2video filter for uploading subtitle frames into a hw context, so the overlay can be done in hw.
Since ffmpeg filters do not support dx11
The QSV filters do - but only with Intel hw...
Then there's OpenCL and Vulkan, and filters exist in ffmpeg for both. You can derive an OpenCL hw context from qsv and cuda contexts. It doesn't help much in the case of Nvidia though, because you still can't get D3D surfaces. AMD is a bit late to the game, but I'm currently in contact with them as they are about to introduce an AMF hw context to ffmpeg, including a set of filters. On Windows, their backend API is D3D, so for AMD it will probably work as well.
We can't really ignore AMD. As long as they provide iGPU, their hardware is to be supported.
Let's not forget that this lib is a playback lib, and we are rendering to a MediaPlayerElement which uses D3D11 tech. I don't think that any proprietary cuda or whatever could help us here (even if we'd ignore the AMD question). We need the images in D3D11.
It could be that ImageCue is not fast enough for animations. Frame rate does not really have to be in sync with the video if you do not burn in. But frame times need to be more or less stable and the frame rate not too low. I'd think it's worth a try. But there does not seem to be any progress on the meson build PR at libass, unfortunately. So it is difficult to integrate it into our build system.
If ImageCues really do not work, then things would become really messy. Either we'd have burn in, with all its downsides. Or we'd have to come up with a completely custom subtitle renderer, fully synced up with the actual video. I don't like that idea. Also because it's not UI agnostic, so a UWP renderer would be needed, as well as a WinUI renderer. Well, there would even be a third option: If your filter can render ass subs into a video stream, then we could expose that stream as a separate MediaStreamSource. The subtitle could be rendered to a MediaPlayerElement right above the "normal" MediaPlayerElement, synced up using MediaTimelineController. But it's also quite a complicated hack. This all sounds pretty bad to me.
Technically we could create our own MPE with frame server mode, which would be reasonably reusable between UWP and WinUI, just with different namespace imports. But we would still rely on directx. I don't think cuda/whatever would be different than a software decoder in how they would fit in the decoder pipeline; they all eventually end up in a directx surface. I don't think ignoring directx on windows is a good idea. I can see why intel, nvidia and amd (I wonder when qualcomm will too) would try to push their own tech stacks, but I don't think we have the man power to deal with that arms race. Directx makes things simple and portable across hardware, and performance is more than enough.
We need the images in D3D11. Let's not forget that this lib is a playback lib
Correct. It would only work with QSV. But as I have spent significant time on this subject over the past years (we have probably the best-in-class automatic generation of ffmpeg video processing pipelines for realtime transcoding), I would agree that this is out-of-scope for this library, as it really takes a lot of time to get this all working, even though subtitles are just one part of the story.
It could be that ImageCue is not fast enough for animations. Frame rate does not really have to be in sync with video,
It's not that much about being fast and being in sync - it's about being contiguous, i.e. show one subtitle image after the other, but without gaps (=> flickering) and without overlap (=> tearing).
So it is difficult to integrate it into our build system
Why that? I have it in a Visual Studio solution as a VC project and for production, we're building with gcc on MSYS2..
If ImageCues really do not work, then things would become really messy. Either we'd have burn in, with all its downsides. Or we'd have to come up with a completely custom subtitle renderer, fully synced up with the actual video. I don't like that idea. Also because it's not UI agnostic, so a UWP renderer would be needed, as well as a WinUI renderer.
I wouldn't like that either. The only two good things I can note are:
The subtitle could be rendered to a MediaPlayerElement right above the "normal" MediaPlayerElement, synced up using MediaTimelineController.
Do those elements even support transparency?
I don't think cuda/whatever would be different than a software decoder in how they would fit in the decoder pipeline,
It would be different because the nvidia decoders can output frames directly into a CUDA context.
I don't think ignoring directx on windows is a good idea. I can see why intel, nvidia and amd (i wonder when qualcomm will too) would try to push their own tech stacks,
I think a main reason is platform independence. All of them are supporting interoperability with OpenCL and Vulkan, so it's not really about fencing strategies (well - not in this case at least). And I think Intel did it best, because their QSV stack is platform independent, but built on D3D for Windows and VAAPI for Linux. AMD are about to follow a similar path.
So, it's all not that bad from the side of the industry, but all this doesn't help us for that specific problem. I rather see the shortcoming here at the side of MS, because they created all the decoder APIs for D3D11 (and DXVA2 previously), but they didn't do much in the area of hardware image processing. (some APIs were added for D3D11, but ffmpeg doesn't support them).
This library is windows only and will always be, so portability in this case is all about hardware support. I've dealt enough with hardware specific GPU APIs to know i don't want to deal with it again soon.
If an API could provide directx levels of hardware portability and performance with better effect support, then it would be worth pursuing as a replacement for d3d11.
Yes, we're bound to D3D11 here, that's probably out of question.
I have some more leads that might be helpful (or not at all):
Just for the record, this is solely a philosophical conversation.
MF can handle the ass just fine. MPE and MediaPlayer are based on MF interfaces that are easily accessible to everyone in c++. The trouble here is that using those interfaces directly is complicated, it involves essentially rewriting MPE, and it does not guarantee it will actually fix the problem of subtitle fonts. I had a look at this some time ago. It is theoretically doable. It would be great to have our own MPE. But it's a lot of work. Possibly for nothing.
Theoretically we can implement subtitle rendering using win2D. At least for the non-animation stuff this should be pretty easy. I actually tried that at some point and had sub/srt going quite well.
Can you confirm that it's the same on your side, i.e. that you can set the font to Bold but it ignores the FontFamily setting?
This is indeed the case. Sorry for the late response. I think accessibility settings override this: font is something that you can explicitly set in windows settings, Bold is not.
I think accessibility settings override this
Yup, I've found the same. On WinUI it's even worse: The background boxes are shown but the text is completely invisible. You can only make it visible by applying a text effect in Windows settings. What a mess..
On the other hand it means they are doing things with subtitles, and might also fix the frame server mode bug.
On the other hand it means they are doing things with subtitles,
Doing? Maybe... but for the worse, not for the better. On UWP they are visible at least...
I'm sure there's some mechanism to deactivate the influence of accessibility settings on this. But it might once again take hours or days to find out.
I don't believe it is worth looking into it. This is clearly a bug. And one that occurred recently. It was working fine a few weeks ago.
The accessibility settings shouldn't produce invisible subtitles lol. That's kind of the point of accessibility.
I'm sure there's some mechanism to deactivate the influence of accessibility settings on this. But it might once again take hours or days to find out.
You would probably have to dig into the media foundation interfaces that these classes wrap. I had them mapped at some point. They are almost a 1 for 1 map from media foundation to the Windows.Media namespace.
The accessibility settings shouldn't produce invisible subtitles lol. That's kind of the point of accessibility.
Yes, I'm sure they will fix it, but what I want is that the accessibility settings don't have any influence on our app, because it has its own subtitle settings configuration, and it's hard to sell that some parts need to be configured here and other parts there.
I think that it is supposed to work with presentation kind "ApplicationPresented" and not "PlatformPresented". But you have to render the subtitles yourself. Not sure how. I don't think there's anything built in.
I think that it is supposed to work with presentation kind "ApplicationPresented" and not "PlatformPresented"
But wait a second - if that were true, what would be the purpose of all the style settings (TimedTextStyle iirc)? ...which are actually working - partially at least.
You would probably have to dig into the media foundation interfaces that these classes wrap. I had them mapped at some point. They are almost a 1 for 1 map from media foundation to the Windows.Media namespace.
Normally, these things work in the way that the control panel stores the settings in the Registry and then a WM_SETTINGSCHANGE window message is broadcast to the top-level window of all processes (which have a message pump).
So I'd use procmon to find the registry change. This should give a few keywords to search for on GitHub or elsewhere for documentation or reported issues. If that doesn't help, the registry values could maybe be removed or changed while the app is running and restored on close.
To find out whether it can be disabled at the Windows.Media side, I'd do a text search to find the responsible dll, disassemble it, and try to find adjacent settings keys which sound like they could disable it, or look for conditions under which those values from control panel get applied (or rather not applied). When it's a small dll, it can even be fun, but the larger it is, the more unpleasant it gets, so it's always the last option... ;-)
So I'd use procmon to find the registry change.
Better idea: Since you can save your settings under a name, you can just enter an unusual string and search for it in the registry.
That went quickly: Computer\HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\ClosedCaptioning
Here's a non-programmatic solution. When you merge this REG file, it will add a captioning theme with empty configuration, named "Controlled by Application", and set it as active.
After that, all subtitle styles are controlled by the application again, including font family.
To find out whether it can be disabled at the Windows.Media side, I'd do a text search to find the responsible dll, disassemble it, and try to find adjacent settings keys which sound like they could disable it, or look for conditions under which those values from control panel get applied (or rather not applied). When it's a small dll, it can even be fun, but the larger it is, the more unpleasant it gets, so it's always the last option... ;-)
Actually I think it is at MPE level. Because otherwise you would get the same result in both UWP and winUI, since MediaPlayer in Windows.Media is the same for both.
Actually I think it is at MPE level.
Yes, I'm sure you're right about that due to some things I read. I haven't come to dive into it yet.
It's a much bigger mess than expected. A really big one.. See my report here: https://github.com/microsoft/microsoft-ui-xaml/issues/9126
I found a workaround, but it works with WinUI3 only, not with UWP.
(this is not the workaround that works on WinUI3)
I'd do a text search to find the responsible dll, disassemble it, and try to find adjacent settings keys which sound like they could disable it, or look for conditions under which those values from control panel get applied (or rather not applied).
The strategy worked out (almost) perfectly. I was able to find an internal override mechanism using two additional registry keys: GlobalOverride and RedirectionKey.
Unfortunately it doesn't help, because it's for globally overriding the settings system-wide.
I wonder how the developers of Media Player didn't notice this. Are they not using MediaPlayerElement? That would mean there's another way to render subs.
That's a good question. Probably they did notice, but in a big company it's often difficult to speak up. A single person being ignorant of the issue can be enough to silence such concerns.
The previous Windows Media Player was a pure Win32 application using DirectShow for media handling. It had its own custom UI framework (remember the skinning feature...). I don't know what the new Media Player is using, but I doubt that it would be using WinUI in any way. As a team at MS, you want to minimize dependencies on any other team to avoid getting blocked or limited (or broken) by them. So I suspect they are working at the lowest possible level, probably MediaFoundation without even using MediaPlayer from Windows.Media.dll.
So I suspect they are working at the lowest possible level, probably MediaFoundation without even using MediaPlayer from Windows.Media.dll.
Possibly. IIRC they released the Media Player app before the MediaPlayerElement was available in winUI.
Anyways, playing around I noticed that the method RenderSubtitlesToSurface in frame server mode no longer throws an exception if the subtitle presentation mode is set to be ApplicationPresented. Now the problem is, the SubtitleFrameChanged event never triggers, despite having an active subtitle lol, so no subtitles ever get drawn. I guess this is progress.
If I can figure this out, we should technically have an alternative way of rendering subtitles without ever touching MPE. Assuming the RenderSubtitlesToSurface method also isn't bugged.
What the actual lolz, if I disable frame server mode I get the SubtitleFrameChanged in MediaPlayer. What is going on here >.<
So I suspect they are working at the lowest possible level, probably MediaFoundation without even using MediaPlayer from Windows.Media.dll.
Possibly. IIRC they released the Media Player app before the MediaPlayerElement was available in winUI.
Oh well - my suspicion was wrong (I think for the first time since writing in this repo): It actually does use WinUI2 and the Windows.Media features.
But I also figured out the reason why:
It's not made by the core media team. It originates from Zune, which is quite a different area.
They've been using the zune name internally for the xbox music app as well. I don't think it really means anything.
They've been using the zune name internally for the xbox music app as well.
Yes, because it was all originally developed for their Zune music service (another failure btw.)
I was pretty sure they use winui, because I noticed it shows some bugs I know from winui controls ^^
This seems to have gotten worse lol.
Why?
For me, not even messing with the accessibility settings will show the subs.
Is this working on your side?
Initially, I wanted to get it working with a custom font supplied as an appx resource, but then I realized that it's not even working when setting a system-installed font:
At some point I got really desperate and tried this:
But no chance to change the font. Everything else is working: Font size, font bold, foreground color, etc.
Is it working on your side?