Open softworkz opened 1 year ago
Internally we use TimedMetadataTrack and its related APIs to handle subtitles. There are multiple reasons we do so; you can dig up the PR for subs support from 5 years ago.
This means that winRT has some degree of authority over how subtitles are presented. I always wondered how this font thing would work taking windows settings into account. It may look like your windows settings are overriding the custom settings.
I remember this used to work at some point, I will look into it again.
you can dig up the PR for subs support from 5 years ago.
I remember this used to work at some point,
I had already looked at the code, mainly to find out how you are setting custom fonts (which are not installed on the system), and yes, that made me think the same: It must have worked at some time.
To clarify: I don't mean to say that it's a problem of FFmpegInteropX. I get the same results when adding the subtitle track via Windows.Media methods directly.
So, what I'm rather wondering: is it only me where it's not working (maybe due to some system configuration or certain UWP project/appx settings) - or is it a general problem (maybe with recent SDKs)?
I haven't seen major differences in the way APIs behave between SDK versions. I think this is more of a Windows problem. It may also be that the font family property is meant for the use case in which the application renders the subtitles, not when it's done through the MPE. If it was working at some point, that may actually have been a bug in Windows that later got fixed.
I would love to have the subtitles rendered fully within ffmpeginteropx, but we are missing libass build for that. We could get rid completely of MPE after that by rendering things in frame server mode, which provides more flexibility and performance.
I think this is more of a Windows problem.
Yup, very well possible.
It may also be that the font family property is meant for the use case in which the application renders the subtitles, not when it's done through the MPE.
But all other text style options are working properly..
Can you confirm that it's the same on your side, i.e. that you can set the font to Bold but it ignores the FontFamily setting?
I would love to have the subtitles rendered fully within ffmpeginteropx, but we are missing libass build for that. We could get rid completely of MPE after that by rendering things in frame server mode, which provides more flexibility and performance.
How exactly would you want to do that? For proper libass rendering, you need to generate overlay frames at the same fps as the video (or at least half of it), and I don't think this would work using ImageCue items. Blending the frames together is too expensive an operation, I suppose, so I would see only the following two ways:
One of the kinks of this project is that we don't really have it all figured out. Until we link libass and see what it does, we don't really know how it will go.
I know what libass does. But I don't know how you are supplying video frames to MediaPlayer :-)
There's multiple ways, depends on the video decoder mode.
We have directx based decoding, and we can access the texture containing video data.
There's the pass through, which is basically the system doing the decoding - total black box.
Software decoding inside ffmpeg is pretty self explanatory.
We could render subtitles to a directx image and expose the subtitle streams to MediaPlaybackItem the same way we do for audio and video, and we would feed back raw images.
Or we could overlay the subtitles using a ffmpeg filter. Multiple ways really, but until we have libass linked in and get a gist of it, it's pretty difficult to say which way to go.
This will also entail a complete rewrite of the subtitles engine, which may or may not be productive in the long run.
My subtitles PR for ffmpeg (https://github.com/ffstaging/FFmpeg/pull/18) includes a new text2graphicsub filter which outputs the rendered subtitles as video frames. We use it for subtitle-burn-in at the server side.
You can use the output with the ffmpeg overlay filter to do software overlay, or you can also use the hwupload filter and do hardware overlay. But there's nothing like an "overlay_d3d" filter, only those specific to a certain hw context, like overlay_cuda and overlay_qsv, with which it is working fine, but it requires specific handling and filter setup depending on the hw context.
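To make that concrete, here's a command-line sketch (assuming an ffmpeg build with the text2graphicsub filter from the PR; the stream labels and filter wiring are illustrative, not copied from a tested pipeline):

```
# Software overlay: rendered subtitle frames blended on the CPU
-filter_complex "[0:s]text2graphicsub[subs];[0:v][subs]overlay"

# Hardware overlay (CUDA example): upload the rendered frames, blend on the GPU
-filter_complex "[0:s]text2graphicsub[subs];[subs]hwupload[subs_hw];[0:v][subs_hw]overlay_cuda"
```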
What I meant above (re: two d3d surfaces) would be avoiding any burn-in processing and having the overlay done "on-screen" by showing the two surfaces, with the subtitle layer on top.
The advantage of d3d is that it fully abstracts hardware. We prefer that over closed techs like cuda.
The elephant in the room is the pass through decoder, which is still needed for xbox and doesn't allow any kind of frame manipulation.
The advantage of d3d is that it fully abstracts hardware. We prefer that over closed techs like cuda.
Both are closed (in the sense of not being open-source), but CUDA and the Intel frameworks are cross-platform, and Intel's is partially open-source. But the primary point is that ffmpeg doesn't provide D3D11 filters, while there are many hw filters for Nvidia and Intel, which makes implementing something in that area pretty convenient.
But a pure D3D based implementation is still a serious option to consider of course.
The elephant in the room is the pass through decoder, which is still needed for xbox and doesn't allow any kind of frame manipulation.
So it's not possible to work with D3D surfaces on xbox?
It still does hw accelerated playback, though, no?
Writing code that supports only a subset of available hardware, no matter how big that is, is not convenient. This is why we are avoiding CUDA and Intel frameworks.
I don't know how xbox does it, never debugged it; some people reported that only pass through works on xbox with acceptable performance. I suppose there are some differences in how the d3d pointer that we use for directx decoding is handled by MF on xbox.
If we'd include libass, my plan would be to output them just as regular image subtitles. I don't see a reason why this should not work. Libass renders text subs to bitmaps, we turn them into SoftwareBitmaps and add them to the ImageCues. MediaPlayer renders them on top of the video image using HW acceleration.
Using burn in subtitles has many disadvantages. It is limited to the video's resolution, so especially if you have a lower res video, burn in subs look blurry. It is also slow if you do it in software. Since ffmpeg filters do not support dx11, using its filters would automatically mean doing it slowly in CPU memory. Plus, when using GPU video decoding, it would mean copying decoded video frames from GPU to CPU, then doing sub rendering in software, then copying back from CPU to GPU for rendering. That's a heavy performance penalty.
If we'd include libass, my plan would be to output them just as regular image subtitles. I don't see a reason why this should not work
Proper ass rendering means providing images at a framerate similar to the video's (or half the fps at minimum), and the ImageCue feature is not made for that.
ASS subtitle animations are a crucial feature, especially in the area of Anime.
Here are some examples in case you're not familiar:
Using burn in subtitles has many disadvantages. It is limited to the video's resolution, so especially if you have a lower res video, burn in subs look blurry.
Correct - it's always the last resort option.
It is also slow if you do it in software. Since ffmpeg filters do not support dx11, using its filters would automatically mean doing it slowly in CPU memory. Plus, when using GPU video decoding, it would mean copying decoded video frames from GPU to CPU, then doing sub rendering in software, then copying back from CPU to GPU for rendering. That's a heavy performance penalty.
Yup, that's why I've created the textsub2video filter for uploading subtitle frames into a hw context, so the overlay can be done in hw.
Since ffmpeg filters do not support dx11
The QSV filters do - but only with Intel hw...
Then there's OpenCL and Vulkan, and filters exist in ffmpeg for both. You can derive an OpenCL hw context from qsv and cuda contexts. It doesn't help much in the case of Nvidia though, because you still can't get D3D surfaces. AMD is a bit late to the game, but I'm currently in contact with them as they are about to introduce an AMF hw context to ffmpeg, including a set of filters. On Windows, their backend API is D3D, so for AMD it will probably work as well.
We can't really ignore AMD. As long as they provide iGPU, their hardware is to be supported.
Let's not forget that this lib is a playback lib, and we are rendering to a MediaPlayerElement which uses D3D11 tech. I don't think that any proprietary cuda or whatever could help us here (even if we'd ignore the AMD question). We need the images in D3D11.
It could be that ImageCue is not fast enough for animations. Frame rate does not really have to be in sync with the video if you do not burn in. But frame times need to be more or less stable and the frame rate not too low. I'd think it's worth a try. But there does not seem to be any progress on the meson build PR at libass, unfortunately. So it is difficult to integrate it into our build system.
If ImageCues really do not work, then things would become really messy. Either we'd have burn in, with all its downsides. Or we'd have to come up with a completely custom subtitle renderer, fully synced up with the actual video. I don't like that idea. Also because it's not UI agnostic, so a UWP renderer would be needed, as well as a WinUI renderer. Well, there would even be a third option: If your filter can render ass subs into a video stream, then we could expose that stream as a separate MediaStreamSource. The subtitle could be rendered to a MediaPlayerElement right above the "normal" MediaPlayerElement, synced up using MediaTimelineController. But it's also quite a complicated hack. This all sounds pretty bad to me.
Technically we could create our own MPE with frame server mode, which would be reasonably reusable between UWP and WinUI, just with different namespace imports. But we would still rely on directx. I don't think cuda/whatever would be different than a software decoder in how they would fit in the decoder pipeline; they all eventually end up in a directx surface. I don't think ignoring directx on windows is a good idea. I can see why intel, nvidia and amd (I wonder when qualcomm will too) would try to push their own tech stacks, but I don't think we have the man power to deal with that arms race. Directx makes things simple and portable across hardware, and performance is more than enough.
We need the images in D3D11. Let's not forget that this lib is a playback lib
Correct. It would only work with QSV. But as I have spent significant time on this subject over the past years (we have probably the best-in-class automatic generation of ffmpeg video processing pipelines for realtime transcoding), I would agree that this is out-of-scope for this library, as it really takes a lot of time to get this all working, even though subtitles are just one part of the story.
It could be that ImageCue is not fast enough for animations. Frame rate does not really have to be in sync with video,
It's not that much about being fast and being in sync - it's about being contiguous, i.e. show one subtitle image after the other, but without gaps (=> flickering) and without overlap (=> tearing).
So it is difficult to integrate it into our build system
Why that? I have it in a Visual Studio solution as a VC project and for production, we're building with gcc on MSYS2..
If ImageCues really do not work, then things would become really messy. Either we'd have burn in, with all its downsides. Or we'd have to come up with a completely custom subtitle renderer, fully synced up with the actual video. I don't like that idea. Also because it's not UI agnostic, so a UWP renderer would be needed, as well as a WinUI renderer.
I wouldn't like that either. The only two good things I can note are:
The subtitle could be rendered to a MediaPlayerElement right above the "normal" MediaPlayerElement, synced up using MediaTimelineController.
Do those elements even support transparency?
I don't think cuda/whatever would be different than a software decoder in how they would fit in the decoder pipeline,
It would be different because the nvidia decoders can output frames directly into a CUDA context.
I don't think ignoring directx on windows is a good idea. I can see why intel, nvidia and amd (i wonder when qualcomm will too) would try to push their own tech stacks,
I think a main reason is platform independence. All of them are supporting interoperability with OpenCL and Vulkan, so it's not really about fencing strategies (well - not in this case at least). And I think Intel did it best, because their QSV stack is platform independent, but built on D3D for Windows and VAAPI for Linux. AMD are about to follow a similar path.
So, it's all not that bad from the side of the industry, but all this doesn't help us for that specific problem. I rather see the shortcoming here at the side of MS, because they created all the decoder APIs for D3D11 (and DXVA2 previously), but they didn't do much in the area of hardware image processing. (some APIs were added for D3D11, but ffmpeg doesn't support them).
This library is windows only and will always be, so portability in this case is all about hardware support. I've dealt enough with hardware specific GPU APIs to know i don't want to deal with it again soon.
If an API could provide directx levels of hardware portability and performance with better effect support, then it would be worth pursuing as a replacement for d3d11.
Yes, we're bound to D3D11 here, that's probably out of question.
I have some more leads that might be helpful (or not at all):
Just for the record, this is solely a philosophical conversation.
MF can handle the ass just fine. MPE and MediaPlayer are based on MF interfaces that are easily accessible to everyone in c++. The trouble here is that using those interfaces directly is complicated, it involves essentially rewriting MPE, and it does not guarantee it will actually fix the problem of subtitle fonts. I had a look at this some time ago. It is theoretically doable. It would be great to have our own MPE. But it's a lot of work. Possibly for nothing.
Theoretically we can implement subtitle rendering using win2D. At least for the non-animation stuff this should be pretty easy. I actually tried that at some point and had sub/srt going quite well.
Can you confirm that it's the same on your side, i.e. that you can set the font to Bold but it ignores the FontFamily setting?
This is indeed the case. Sorry for the late response. I think accessibility settings override this: font is something that you can explicitly set in windows settings, Bold is not.
I think accessibility settings override this
Yup, I've found the same. On WinUI it's even worse: The background boxes are shown but the text is completely invisible. You can only make it visible by applying a text effect in Windows settings. What a mess..
On the other hand it means they are doing things with subtitles, and might also fix the frame server mode bug.
On the other hand it means they are doing things with subtitles,
Doing? Maybe... but for the worse, not for the better. On UWP they are visible at least...
I'm sure there's some mechanism to deactivate the influence of accessibility settings on this. But it might once again take hours or days to find out.
I don't believe it is worth looking into it. This is clearly a bug. And one that occurred recently. It was working fine a few weeks ago.
The accessibility settings shouldn't produce invisible subtitles lol. That's kind of the point of accessibility.
I'm sure there's some mechanism to deactivate the influence of accessibility settings on this. But it might once again take hours or days to find out.
You would probably have to dig into the media foundation interfaces that these classes wrap. I had them mapped at some point. They are almost a 1 for 1 map from media foundation to the Windows.Media namespace.
The accessibility settings shouldn't produce invisible subtitles lol. That's kind of the point of accessibility.
Yes, I'm sure they will fix it, but what I want is that the accessibility settings don't have any influence on our app, because it has its own subtitle settings configuration, and it's hard to sell that some parts need to be configured here and other parts there.
I think that it is supposed to work with presentation kind "ApplicationPresented" and not "PlatformPresented". But you have to render the subtitles yourself. Not sure how. I don't think there's anything built in.
I think that it is supposed to work with presentation kind "ApplicationPresented" and not "PlatformPresented"
But wait a second - if that were true, what would be the purpose of all the style settings (TimedTextStyle iirc)? ...which are actually working - partially at least.
You would probably have to dig into the media foundation interfaces that these classes wrap. I had them mapped at some point. They are almost a 1 for 1 map from media foundation to the Windows.Media namespace.
Normally, these things work in the way that the control panel stores the settings in the Registry and then a WM_SETTINGSCHANGE window message is broadcast to the top-level window of all processes (which have a message pump).
So I'd use procmon to find the registry change. This should give a few keywords to search for on GitHub or elsewhere for documentation or reported issues. If that doesn't help, the registry values could maybe be removed or changed while the app is running and restored on close.
To find out whether it can be disabled at the Windows.Media side, I'd do a text search to find the responsible dll, disassemble it, and try to find adjacent settings keys which sound like they could disable it, or look for conditions under which those values from control panel get applied (or rather not applied). When it's a small dll, it can even be fun, but the larger it is, the more unpleasant it gets, so it's always the last option... ;-)
So I'd use procmon to find the registry change.
Better idea: Since you can save your settings under a name, you can just enter an unusual string and search for it in the registry.
That went quickly: Computer\HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\ClosedCaptioning
Here's a non-programmatic solution. When you merge this REG file, it will add a captioning theme with empty configuration, named "Controlled by Application", and set it as active.
After that, all subtitle styles are controlled by the application again, including font family.
To find out whether it can be disabled at the Windows.Media side, I'd do a text search to find the responsible dll, disassemble it, and try to find adjacent settings keys which sound like they could disable it, or look for conditions under which those values from control panel get applied (or rather not applied). When it's a small dll, it can even be fun, but the larger it is, the more unpleasant it gets, so it's always the last option... ;-)
Actually I think it is at MPE level. Because otherwise you would get the same result in both UWP and winUI, since MediaPlayer in Windows.Media is the same for both.
Actually I think it is at MPE level.
Yes, I'm sure you're right about that due to some things I read. I haven't come to dive into it yet.
It's a much bigger mess than expected. A really big one.. See my report here: https://github.com/microsoft/microsoft-ui-xaml/issues/9126
I found a workaround, but it works with WinUI3 only, not with UWP.
(this is not the workaround that works on WinUI3)
I'd do a text search to find the responsible dll, disassemble it, and try to find adjacent settings keys which sound like they could disable it, or look for conditions under which those values from control panel get applied (or rather not applied).
The strategy worked out (almost) perfectly. I was able to find an internal override mechanism using two additional registry keys: GlobalOverride and RedirectionKey.
Unfortunately it doesn't help, because it's for globally overriding the settings system-wide.
I wonder how the developers of Media Player didn't notice this. Are they not using MediaPlayerElement? That would mean there's another way to render subs.
That's a good question. Probably they did notice, but in a big company it's often difficult to speak up. A single person being ignorant of the issue can be enough to silence such concerns.
The previous Windows Media Player was a pure Win32 application using DirectShow for media handling. It had its own custom UI framework (remember the skinning feature...). I don't know what the new Media Player is using, but I doubt that it would be using WinUI in any way. As a team at MS, you want to minimize dependencies on any other team to avoid getting blocked or limited (or broken) by them. So I suspect they are working at the lowest possible level, probably MediaFoundation without even using MediaPlayer from Windows.Media.dll.
So I suspect they are working at the lowest possible level, probably MediaFoundation without even using MediaPlayer from Windows.Media.dll.
Possibly. IIRC they released the Media Player app before the MediaPlayerElement was available in winUI.
Anyways, playing around I noticed that the method RenderSubtitlesToSurface in frame server mode no longer throws an exception if the subtitle presentation mode is set to be ApplicationPresented. Now the problem is, the SubtitleFrameChanged event never triggers, despite having an active subtitle lol, so no subtitles ever get drawn. I guess this is progress.
If I can figure this out, we should technically have an alternative way of rendering subtitles without ever touching MPE. Assuming the RenderSubtitlesToSurface method also isn't bugged.
What the actual lolz, if I disable frame server mode I get the SubtitleFrameChanged in MediaPlayer. What is going on here >.<
So I suspect they are working at the lowest possible level, probably MediaFoundation without even using MediaPlayer from Windows.Media.dll.
Possibly. IIRC they released the Media Player app before the MediaPlayerElement was available in winUI.
Oh well - my suspicion was wrong (I think for the first time since writing in this repo): It actually does use WinUI2 and the Windows.Media features.
But I also figured out the reason why:
It's not made by the core media team. It originates from Zune, which is quite a different area.
They've been using the zune name internally for the xbox music app as well. I don't think it really means anything.
They've been using the zune name internally for the xbox music app as well.
Yes, because it was all originally developed for their Zune music service (another failure btw.)
I was pretty sure they use winui, because I noticed it shows some bugs I know from winui controls ^^
This seems to have gotten worse lol.
Why?
For me, not even messing with the accessibility settings will show the subs.
Is this working on your side?
Initially, I wanted to get it working with a custom font supplied as an appx resource, but then I realized that it's not even working when setting a system-installed font:
At some point I got really desperate and tried this:
But no chance to change the font. Everything else is working: Font size, font bold, foreground color, etc.
Is it working on your side?