Bitmap Subtitles with WinUI3

softworkz commented 10 months ago

Just wanted to check back: Has this ever worked for any of you so far?

brabebhin commented 10 months ago

I must admit I haven't actually tried this in winui3, but this worked in uwp (at some point). I will give it a try again tonight.

softworkz commented 10 months ago

It's working fine in UWP!

brabebhin commented 10 months ago

Ah I see where this is going lol.

softworkz commented 10 months ago

It's not crucial right now as we're using MPV as primary player for the WinUI app, but @lukasf had introduced the WindowId parameter to the CreateFromUriAsync() method, so I was wondering whether he ever got it working.

lukasf commented 10 months ago

Nope. It is one of the things that still does not work on WinUI. It does not have anything to do with the WindowId. ImageCues just don't render at all.

brabebhin commented 10 months ago

Yep, it is not working.

softworkz commented 10 months ago

Nope. It is one of the things that still does not work on WinUI. It does not have anything to do with the WindowId. ImageCues just don't render at all.

Alright. Thanks for clarifying.

brabebhin commented 8 months ago

So I think I figured out how to use RenderSubtitlesToSurface: you need to use a CanvasRenderTarget (off-screen rendering) with win2D.

This engine seems to render subtitles in a completely different manner compared to the XAML composition based subtitles.

softworkz commented 8 months ago

https://learn.microsoft.com/en-us/windows/uwp/audio-video-camera/play-audio-and-video-with-mediaplayer#use-mediaplayer-in-frame-server-mode

Doesn't this code work...?

brabebhin commented 8 months ago

No, not quite. I mean it works for rendering video, but not subtitles. Clearly whoever wrote that doc didn't bother checking. The example would keep throwing InvalidArgumentExceptions. However, I thought that maybe whoever implemented RenderSubtitlesToSurface wasn't completely outright shipping a useless API in the public winRT interface, so I tried several things.

The current state of my WIP implementation is here

https://github.com/brabebhin/MayazucMediaPlayer/blob/main/source/MayazucNativeFramework/FrameServerRenderer.cpp

softworkz commented 8 months ago

No, not quite. I mean it works for rendering video, but not subtitles. Clearly whoever wrote that doc didn't bother checking. The example would keep throwing InvalidArgumentExceptions.

The code is taken from this sample: https://github.com/MicrosoftDocs/windows-dev-docs/blob/docs/uwp/audio-video-camera/code/MediaPlayer_RS1/cs/MainPage.xaml.cs

At least at some point it must have worked.

softworkz commented 8 months ago

Here's one more code sample I found: https://github.com/drasticactions/WinUIEx/commit/4ad8524a6cf4aee6abbc66df248f8aea4a4ca767

Interestingly, there's also some mention of "crashes".

But I have a theory...

brabebhin commented 8 months ago

I doubt it. I've been trying that api ever since it came out, way before winui. This is the first time I managed to make it not crash.

I was probably the only crazy dude in the world trying it. It only became a focal point when winui3 shipped without MPE and others started looking at it. I even submitted bug reports and crash dumps in the feedback hub, all to be ignored.

softworkz commented 8 months ago

Have you tried running the sample as is, i.e. 100% unchanged?

brabebhin commented 8 months ago

I didn't run exactly that code. Never knew it existed. But the code I had was practically identical to that. The catch is that you can't render the subtitles to the software bitmap object they have there, only to the canvas render target. My theory is that it is some limitation in XAML, and MF only accepts an off-screen buffer to render to. Probably because the IMFMediaEngine interface of MF, which is basically the native implementation of MediaPlayer, has a swap chain mode, and in the swap chain you always render to the back buffer.

Or they could simply be rendering XAML composition elements (like they do with the ordinary subs) to a bitmap and for some reason, they can only do it with a back buffer.

But I have a theory...

My theory is that whoever created winRT back in the early 2010s has long since left Microsoft or was fired in one of their yearly purges, and the subsequent teams don't have the capacity to understand the complexities of the system involved, and they get purged as well before they have the time to build up the knowledge, and the trend continues to this day. This explains why winUI, UWP, MAUI and even winRT itself have some unexplainable bugs that nobody is trying to fix: they simply don't know how, and don't have the possibility to learn how to fix them.

softworkz commented 8 months ago

I have left a comment over in the other repo, here's a copy:

Hi @dotMorten,

we at FFmpegInteropX are wondering about the intended use of the RenderSubtitlesToSurface() method as well, also having seen "crashes".

I haven't worked on it myself, just heard the reports saying it's crashing. Recently I re-read the docs and I had a certain suspicion. Yet it didn't match with the symptoms of "always crashing" and I abandoned the idea. Now I found this commit of yours which says:

this'll only render on one frame until next text. Crash if repeating without this flag getting set.

That's quite different from "always crashing" and actually aligns with what I had thought earlier. There is a paragraph in the docs which caught my attention:

For the overload with a target rectangle:

reference: https://learn.microsoft.com/en-us/uwp/api/windows.media.playback.mediaplayer.rendersubtitlestosurface?view=winrt-22621

=> So why should this method be "less efficient" than the other overload? This method renders to a constrained area only and the other one renders to the full-size frame. How can it be "less efficient"?

There are two more hints:

Full-Frame Overload has this text: if the method returns false, then no subtitles were rendered. In this case you may decide to hide the subtitle render surface in your UI.
Rectangle Overload also says: but it allows you to use the same surface for rendering video and subtitles rather than requiring a separate surface for subtitles.

Rendering onto the same surface as the video frames comes with the specific implication that you need to re-draw the subtitle again and again for every single video frame that is shown, because each video frame overwrites everything from before and so you need to redraw the subtitle text for each frame. To make this more economic (just think of 4k video at 60fps), you can/must constrain the rendering to a certain area (no idea how to determine it actually). And - if I'm right - that would be the explanation why they say that it's less efficient: even though the area is constrained, you need to do it for every video frame.

I suspect that the other overload is meant to be used in a very different way: you create an additional (transparent) surface on top of the video frame and at the same size. When there's a subtitle graphic to display, you call RenderSubtitlesToSurface() to paint it onto your subtitle surface. As this is a separate surface, it doesn't get overwritten by the video frames.

=> In turn, you need to render it just once. And that would be the explanation why you were seeing subsequent calls failing

Conclusion would be: this method is not meant to be called "per-frame" but only once on each change and meant to be painted on a separate surface, while the other overload is meant to be painted on each video frame but constrained to a certain region.
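
A back-of-the-envelope sketch of the trade-off described above, in plain C++ with no Windows APIs involved. The struct and function names are hypothetical, and the frame rate, cue duration and rectangle size used with it are assumed numbers. It only counts how often subtitles would have to be drawn for one cue under each strategy.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model, not part of any Windows API.
struct CueCost {
    std::uint64_t render_calls;    // number of subtitle render calls for one cue
    std::uint64_t pixels_touched;  // total pixels written across those calls
};

// Strategy A: subtitles share the video surface. Every video frame overwrites
// them, so they are redrawn per frame, but only inside a constrained rectangle.
CueCost same_surface_cost(std::uint32_t fps, std::uint32_t cue_seconds,
                          std::uint32_t rect_w, std::uint32_t rect_h) {
    assert(fps > 0);
    const std::uint64_t frames = std::uint64_t{fps} * cue_seconds;
    return {frames, frames * rect_w * rect_h};
}

// Strategy B: subtitles live on a separate full-size overlay that video frames
// never overwrite, so one draw per cue change is enough.
CueCost separate_surface_cost(std::uint32_t frame_w, std::uint32_t frame_h) {
    return {1, std::uint64_t{frame_w} * frame_h};
}
```

With an assumed 3-second cue at 60 fps and a 3840x400 subtitle band on a 4K frame, `same_surface_cost(60, 3, 3840, 400)` yields 180 render calls, while `separate_surface_cost(3840, 2160)` yields one.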

brabebhin commented 8 months ago

I agree that the best way is to have 2 bitmaps, one for video and one for the subs. However, I don't think that has anything to do with the crash I was experiencing, and probably what the winUIEx devs were having was a completely different problem - they are already doing the off screen rendering with the back buffer of the swap chain.

However, in my case, I don't think there's going to be a problem to separate the implementation into 2 bitmaps. But at this point I am just overly happy I finally figured out how to do it.

softworkz commented 8 months ago

I don't think that has anything to do with the crash

what the winUIEx devs were having was a completely different problem

I am just overly happy I finally figured out how to do it

That's why I posted over there: I knew you wouldn't be much interested in testing my theory... 😆

brabebhin commented 8 months ago

There are some interesting points in your theory.

I already tried rendering to the same surface earlier today, without the overload that takes a rect region to render to, on every video frame. If the surface is a software bitmap, it crashes; if the surface is a CanvasRenderTarget, it works fine. More so, it will render the subtitles exactly as you would expect them to be rendered (bottom centered, with proper styles and such), and the area where there's no subs shown can be transparent, thus obtaining the effect you are aiming for. You just need to correctly order the operations before issuing the GPU command: first the video frame, then the subtitles. Note that DirectX rendering operations are not issued to the GPU until a Flush or Close command is called onto the CommandList (in our case, the CommandList is a drawing session). So you can accumulate lots of render commands in a sequence before anything is actually rendered.
The current implementation sort of uses 2 surfaces, one for video and one for subtitles, and then merges both in the final output to be displayed, which is one image. Both surfaces have the exact same size, the size of the final image that is to be displayed, and that comes from XAML.
I suspect the overload which takes a rect to limit the region to render into has some use cases meant for specific cases, such as custom padding to the subs area (I know you were interested in this), shifting the subs up when the transport controls show up, for games in which the subs should always be rendered in a specific area of the viewport, etc.
The only reason to use 2 different output XAML Images is so that you don't need to render subs on every video frame, and you can technically multithread them as well. So extra performance.
The documentation is completely bogus: You can use either overload with any valid directx surface, no matter if the surface was rendered to before or not. It would have been much more useful to explain the requirements imposed on the directx surface, rather than bogging us down with all sorts of plainly visible optimizations (like hiding the subtitle surface when there's no sub to draw). It is really debatable if this is the right way to do it, or if we should render a transparent surface using directx anyways. It depends on the XAML performance, I heard bindings to dependency properties are quite slow.
The so-called bug is most likely not a bug. The MediaPlayer implementation likely has some very strict requirements on how the DirectX surface should be configured. Those can be retrieved from the desc property of the DXGI surface that's wrapped behind the CanvasRenderTarget. The problem is nobody bothered documenting those requirements, and the sample code is wrong. This comes back to my theory: whoever did this is long gone, and none of the newbies can figure out what they did.
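
The "first the video frame, then the subtitles" ordering amounts to source-over blending of a mostly transparent subtitle layer on top of an opaque video frame. The following is a minimal CPU-side sketch of that blend for a single pixel, assuming straight (non-premultiplied) 8-bit RGBA. It is only a model of the idea: the names are hypothetical, and it is not what Win2D or MediaPlayer execute on the GPU.

```cpp
#include <cassert>
#include <cstdint>

// Straight (non-premultiplied) 8-bit RGBA pixel. Hypothetical helper type.
struct Rgba { std::uint8_t r, g, b, a; };

// Blend one channel: src over dst, with the src alpha in 0..255, rounded.
inline std::uint8_t blend(std::uint8_t src, std::uint8_t dst, std::uint8_t a) {
    return static_cast<std::uint8_t>((src * a + dst * (255 - a) + 127) / 255);
}

// Video is drawn first and assumed opaque; the subtitle pixel is drawn over it.
// Where the subtitle layer is fully transparent, the video shows through.
inline Rgba compose(Rgba video, Rgba subtitle) {
    assert(video.a == 255);
    return {blend(subtitle.r, video.r, subtitle.a),
            blend(subtitle.g, video.g, subtitle.a),
            blend(subtitle.b, video.b, subtitle.a),
            255};
}
```

Swapping the order (subtitles first, then the opaque video frame) would simply overwrite the subtitle pixels, which is why the order matters even when both draws end up in the same batch of commands.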

brabebhin commented 8 months ago

If you can share one of those files with bitmap subtitles I can gladly test to see if they work (the external disk that was storing mine died a few days ago)

softworkz commented 8 months ago

I already tried rendering to the same surface earlier today, without the overload that takes a rect region to render to, on every video frame. If the surface is a software bitmap, it crashes,

Does it "crash" always or does it succeed once each time when there's a subtitle change?

The current implementation sort of uses 2 surfaces, one for video and one for subtitles, and then merges both in the final output to be displayed,

That's the worst possible way actually...

I mean: none of these procedures is suitable for production use anyway. There's just a single right way for this:

Compressed video goes into GPU/GpuMem
GPU decodes video frames into surfaces (in GpuMem)
The app only tells the GPU at which point in time it should present each surface
shown frames go back into the pool for re-use
no video frame ever leaves GpuMemory
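
The "shown frames go back into the pool" step can be pictured as a fixed pool of surfaces that are acquired for decoding and released after presentation. This is a hypothetical sketch in plain C++; `Surface` is only a stand-in carrying an id, not a real GPU resource, and no actual decoder or presenter is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GPU surface; only an id is tracked here.
struct Surface { std::size_t id; };

class SurfacePool {
public:
    explicit SurfacePool(std::size_t count) {
        assert(count > 0);
        for (std::size_t i = 0; i < count; ++i) free_.push_back(Surface{i});
    }

    // Take a surface for the decoder to write into.
    // Returns false when every surface is still in flight.
    bool acquire(Surface& out) {
        if (free_.empty()) return false;
        out = free_.back();
        free_.pop_back();
        return true;
    }

    // Hand a surface back once its frame has been presented.
    void release(Surface s) { free_.push_back(s); }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<Surface> free_;  // surfaces not currently holding a frame
};
```

Because surfaces are recycled rather than reallocated, the number of frames in flight is bounded by the pool size, and nothing in this scheme requires copying a frame out of GPU memory.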

rather than bogging us down with all sorts of plainly visible optimizations (like hiding the subtitle surface when there's no sub to draw). It is really debatable if this is the right way to do it, or if we should render a transparent surface using directx anyways.

The latter is exactly what they are suggesting... Having a transparent layer in the scene still requires calculation which can be saved by hiding the layer, even though it's transparent anyway.

softworkz commented 8 months ago

If you can share one of those files with bitmap subtitles I can gladly test to see if they work (the external disk that was storing mine died a few days ago)

Check out these:

Let me know when you need something different. Thanks

brabebhin commented 8 months ago

Thanks for the files. Here we go

It's funny, cause the "Media Player" app of windows 11 doesn't even detect the subtitle. But clearly winRT and MF possess the capability to deal with the subtitle.

Am I going insane here?

I still can't believe this works, I expect there to be some gotcha that I'm missing.

That's the worst possible way actually...

I mean: none of these procedures is suitable for production use anyway. There's just a single right way for this:
Compressed video goes into GPU/GpuMem
GPU decodes video frames into surfaces (in GpuMem)
The app only tells the GPU at which point in time it should present each surface
shown frames go back into the pool for re-use
no video frame ever leaves GpuMemory

I agree it is not production ready, however, no video frame ever leaves GPU memory. By the time the video frame server's VideoFrame available event triggers, the video has already been decoded and it exists in GPU memory (be it dedicated or integrated). This intercepts right before being presented to the user. At this point, the only way to even copy the data in CPU memory I believe is to cause a pipeline stall, which will instantly kill performance and drop it to like 1 FPS or something.

Does it "crash" always or does it succeed once each time when there's a subtitle change?

I could never, ever get a subtitle to render to a SoftwareBitmap.

softworkz commented 8 months ago

which will instantly kill performance and drop it to like 1 FPS or something.

No, sw is not that bad. Remember, FFmpegInteropX has a setting "FFmpegSoftwareDecoder". It's primarily taking resources and draining batteries empty on laptops.

I agree it is not production ready, however, no video frame ever leaves GPU memory. By the time the video frame server's VideoFrame available event triggers, the video has already been decoded. This intercepts right before being presented to the user.

You are calling "RenderVideoToSurface", "RenderSubtitlesToSurface", then you are "merging" the surfaces together before presenting. Even when it's in GPU memory, that's still a lot of operations; each time, something gets copied.

Is the Win2dCanvas doing everything and always in GPU memory?

softworkz commented 8 months ago

It's funny, cause the "Media Player" app of windows 11 doesn't even detect the subtitle. But clearly winRT and MF possess the capability to deal with the subtitle.

Am I going insane here?

I still can't believe this works, I expect there to be some gotcha that I'm missing.

The subs are looking fine!

Can you check whether the outline issue exists when doing this way?

brabebhin commented 8 months ago

The performance problem of the pipeline stall lies in the CPU - GPU synchronization at driver level, not in the performance of either one of the components. It is easy to transfer data from CPU to GPU, which is what the software decoder does. The other way around is much, much harder. There are also hardware constraints here in the architecture of the PCIe lanes; the topic is pretty complex.

The GPU is designed to handle multiple drawing commands and copy data inside its GDDR memory, and it is very good at doing it. I don't believe the current implementation has major performance issues, but I will implement the separated drawing of subs on a different surface anyways. There are multiple reasons to do the separation, not just performance gain. There's also the question of HDR, which I haven't tested at all.

Glad to know the subs look good. I will check the outline as well, once I emerge from the nightly hibernation cycle.

Yes, the win2d canvas does all data manipulation in GPU memory. You may be able to initialize it with data from the system memory but once that's done, all is done in GPU.

dotMorten commented 8 months ago

Following up on your comment on WinUIEx's subtitle rendering: I honestly don't remember the details - I've since deleted the mediaplayer implementation since WinUI now has it built-in, so this was no longer needed. I looked at the WinUI3's source code and I don't see any use of the subtitle rendering stuff - the closest I got was the timed text stuff which might be worth digging into: https://github.com/microsoft/microsoft-ui-xaml/blob/winui3/release/1.5-stable/dxaml/xcp/dxaml/lib/TimedTextSource.cpp

It does seem like they just use XAML to render the subtitles judging from the comments in this issue: https://github.com/microsoft/microsoft-ui-xaml/issues/7981 and that rendering them directly into the surface doesn't work: https://github.com/microsoft/microsoft-ui-xaml/issues/6610

softworkz commented 8 months ago

@dotMorten - Thanks for getting back!

I looked at the WinUI3's source code and I don't see any use of the subtitle rendering stuff

It does seem like they just use XAML to render the subtitles

Correct. That's done in CueStyler.cpp (and the other) for which I made 2 PRs, and there's also a pending issue of mine regarding this. That's why we're evaluating the RenderSubtitlesToSurface() method, which uses an entirely different implementation (from the Windows.Media area presumably).

I honestly don't remember the details

I did a GitHub search for this API, and the only real results were the old MS samples and your repo (and this one here), so we're a very small circle of people having dealt with it, apparently 😉

brabebhin commented 8 months ago

@softworkz I am once again going to need some test files for the outline ^^

brabebhin commented 8 months ago

Scripting seems to partially work for SSA/ASS

but outlines and colors are overridden by Windows settings

This is the expected result

The subs also seem to overlay on top of each other when there are multiple active at the same time. It is worth investigating whether this is us using the incorrect region when adding external subs or whether the renderer is bugged.

I guess my next mission would be to implement my own rendering with win2D, in order to get rid of the windows settings nonsense.

brabebhin commented 8 months ago

I also tried your reg hax and it doesn't fix the issue. It would appear this implementation does not support outlines at all, or your reg hax isn't applicable to this.

This continues the trend of half measures we've been observing over the year when it comes to winRT media objects.

So basically:

UWP subtitle implementation supports everything?
winUI doesn't support image subs and poorly supports ASS/SSA
Windows.Media implementation supports everything except outlines?

It's time we take matters into our own hands lol. Although if we integrate libass and reduce everything to image subs, this would fix all our problems.

brabebhin commented 8 months ago

Interestingly, not even DirectX seems to have support for text outlines. They all use the multi text trick that winUI 3 uses.

softworkz commented 8 months ago

I also tried your reg hax and it doesn't fix the issue. It would appear this implementation does not support outlines at all, or your reg hax isn't applicable to this.

Right. Outlines aren't supported everywhere for some reason.

They are missing here: https://learn.microsoft.com/en-us/uwp/api/windows.media.closedcaptioning.closedcaptionedgeeffect?view=winrt-22621

and here:

But they are present here: https://learn.microsoft.com/en-us/uwp/api/windows.media.core.timedtextstyle.outlinethickness?view=winrt-22621

softworkz commented 8 months ago

Interestingly, not even DirectX seems to have support for text outlines. They all use the multi text trick that winUI 3 uses.

None of your pictures is showing outlines, did you get it working?

brabebhin commented 8 months ago

No, I was working on win2D rendering and was looking at the correspondence between TimedTextCue properties and win2D text rendering classes...

...and couldn't find outlines.

So then I googled in plain directx and found the 4 times rendering trick on an official MS page, so I guess that's that. Since XAML is directx based, they also don't support outlines xD

softworkz commented 8 months ago

So basically:

UWP subtitle implementation supports everything?

winUI doesn't support image subs and poorly supports ASS/SSA

No, UWP and WinUI are using the same implementation.

There's no support for ASS/SSA anywhere. This is just added/implemented by us.

Windows.Media implementation supports everything except outlines?

I'm not sure about outlines, but the effects (Raised, Depressed, DropShadow) must have existed long before. For two reasons:

Very early code in one of our clients has exactly those options for effects selection
The WinUI2/3 code has comments about "simulating" these effects via XAML. If they had just been introduced, there would be no need to simulate anything

softworkz commented 8 months ago

So then I googled in plain directx and found the 4 times rendering trick on an official MS page, so I guess that's that. Since XAML is directx based, they also don't support outlines xD

There's no close relation between the two. Drawing outlines requires a certain API (I'm not sure which it was, maybe "DirectWrite"), the same one that AegiSub and ASS are using on Windows, and this API isn't available to XAML. That's why they used the 4-times trick. But the API should be available to a DirectX based implementation...

brabebhin commented 8 months ago

I might be mistaken, but I think the directx way to do it is to rasterize the text and then apply a pixel shader to draw the outline. As to why this isn't exposed to XAML, I wouldn't know at this point. Probably something to do with their protected rendering process that renders XAML in a sandbox.

I'd assume most existing applications use GDI to write text, which is more mature than the directx approaches. This should be available to winUI 3 though, so that's a good hint you gave me. I expect this is how libass and others are doing it. Do tell if I'm wrong.

Nobody tried this with UWP simply cause GDI is not allowed on UWP.

softworkz commented 8 months ago

The problem is that only one of the text rendering APIs (DirectWrite iirc, definitely not GDI) allows you to access the outlines as vector shapes. Without having the vector shapes, it is impossible to draw proper outlines (not even with pixel shaders).

brabebhin commented 8 months ago

Well, writing our own implementation is probably not worth the time. I'd assume libass does it somehow, so I guess when we link this in, we will be able to render everything as bitmap subs as @lukasf originally envisioned.

Bitmap subs seem to be the only way left to bypass the windows settings when it comes to styling.

brabebhin commented 8 months ago

I did start working on a win2D based subtitle renderer. The bitmap subs are basically working. However, I only implemented it in C#, to speed things up. Unfortunately, this causes performance problems a few minutes into playback.

The text based subs are somewhat more complicated and require more calculations. I guess they used a shortcut with StackPanels and XAML composition. Maybe the way forward is this, but I think the approach is inherently flawed and will cause performance issues due to GC objects in XAML tree, even if everything is written in C++

softworkz commented 8 months ago

I did start working on a win2D based subtitle renderer. The bitmap subs are basically working. However, I only implemented it in C#, to speed things up. Unfortunately, this causes performance problems a few minutes into playback.

I think the starting point for trying something in this direction is to

not use FrameServer mode at all
try to position a transparent ( full-size) canvas/image on top of the MediaElement
paint subs onto the canvas (only on subtitle change)

Finding out whether this is working at all is crucial to determine what options exist at all. (it might not be possible to display elements on top of a swap chain panel, so when testing, make sure that the canvas covers the media element just partially for comparison)
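
The "only on subtitle change" part of that plan can be sketched independently of any UI framework: track which cues are active at the current playback position and repaint only when that set differs from what is already shown. The types below are hypothetical stand-ins in plain C++; where the counter is incremented, a real app would clear and redraw its overlay canvas.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical cue: active while start_ms <= position < end_ms.
struct Cue { std::int64_t start_ms; std::int64_t end_ms; int id; };

class OverlayPainter {
public:
    explicit OverlayPainter(std::vector<Cue> cues) : cues_(std::move(cues)) {}

    // Call on every position update (e.g. per video frame).
    // Returns true only when the overlay actually had to be repainted.
    bool update(std::int64_t position_ms) {
        std::vector<int> active;
        for (const Cue& c : cues_) {
            assert(c.start_ms <= c.end_ms);
            if (c.start_ms <= position_ms && position_ms < c.end_ms)
                active.push_back(c.id);
        }
        if (active == shown_) return false;  // same cues as before: keep overlay
        shown_ = std::move(active);
        ++repaints_;  // placeholder for "clear the overlay and draw the cues"
        return true;
    }

    int repaints() const { return repaints_; }

private:
    std::vector<Cue> cues_;
    std::vector<int> shown_;  // ids currently painted on the overlay
    int repaints_ = 0;
};
```

For a single cue, this repaints exactly twice regardless of frame rate: once when the cue appears and once when it ends and the overlay is cleared.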

brabebhin commented 8 months ago

The frame server is fine and at least in my case, it is here to stay - it is the only way to have both video effects and gapless video playback.

I'll probably need to translate this to C++ and see how it works out, if it is a problem in my implementation or it really is the GC that's doing things. The only problem is C++ debugger being pretty terrible.

The last time I did this, I had CanvasAnimatedControl, which works like a swap chain. Unfortunately, not available in winUI 3.

lukasf commented 8 months ago

Outline seems to be pretty easy to do in Win2D. And if Win2D can do it, then it should be possible as well using Direct2D APIs, because that's what Win2D is implemented with.

https://github.com/sonnemaf/Outline https://github.com/sonnemaf/Outline/blob/master/Outline/OutlinedTextBlock.cs

brabebhin commented 8 months ago

Nice find. Didn't think of using CanvasGeometry. Although I suspected something similar would be involved.

softworkz commented 8 months ago

Like I said: DirectWrite

https://github.com/microsoft/Win2D/blob/c3c65380095a330ce4e1157a8beeae1b9e7a41db/winrt/lib/geometry/CanvasGeometry.cpp#L1658-L1677

brabebhin commented 8 months ago

Great, this only makes things even more complicated xD

Cause you can technically apply the outline to only parts of the text line, so lines have to be potentially split into several geometries that need to be nicely stitched together...

Guess it will be done in 3 years.

softworkz commented 8 months ago

Cause you can technically apply the outline to only parts of the text line, so lines have to be potentially split into several geometries that need to be nicely stitched together...

What do you mean by that? Why stitch?

Do you mean because of TimedTextSubformat ?

Yes, every single letter can have a different style, but not just outlines, this also applies to color, font style/family/size etc.

Using DirectWrite directly might be a better choice. I would just look up how AegiSub and libass are doing it. No need to re-do it from scratch, and in the end it still wouldn't look the same (possibly).

There are also other implications you need to be aware of. The recipe for outline rendering is something like this:

Outlines and text rendering need to be done completely separate from each other
The outlines are bottommost. They all need to be on the same layer and need to be drawn like this:
- Stroke width: double the configured stroke width (one half goes inside, hence the doubling)
- Pen color: outline color
- If the text itself has alpha, it should probably be multiplied with the outline color's alpha value
- Pen shape: round/circle
- End line caps: round
- Miter: no limit, no bevel, etc.
On top of this, you need to apply the outline blur as configured
Then you need to combine the shapes of all text into a single region
The outline layer needs to be clipped by that region (= cut out)
- That's because the text can have an opacity < 1
Only then, you render the text with the respective fill colors
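
The ordering in this recipe can be illustrated with a toy raster model in plain C++. This is a sketch under loud assumptions: glyphs are a character grid instead of vector shapes, the round pen is approximated by a disc dilation (a pen of width 2r on the contour reaches r pixels outwards), blur is omitted, and all names are hypothetical. It is not how DirectWrite, Win2D or libass implement outlines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Each string is one row; '#' marks a pixel covered by glyph fill.
using Mask = std::vector<std::string>;

// Half of a stroke on the contour falls inside the glyph, so the pen width
// is twice the configured outline width.
inline double pen_width(double configured_outline_width) {
    assert(configured_outline_width >= 0.0);
    return 2.0 * configured_outline_width;
}

// Effective outline alpha when the text fill is translucent (values in 0..1).
inline double outline_alpha(double outline_color_alpha, double text_alpha) {
    return outline_color_alpha * text_alpha;
}

// Returns a grid with 'T' for text fill, 'O' for outline and '.' for empty.
Mask render_outlined(const Mask& text, int radius) {
    assert(radius >= 0);
    const int h = static_cast<int>(text.size());
    const int w = h > 0 ? static_cast<int>(text[0].size()) : 0;
    Mask out(static_cast<std::size_t>(h),
             std::string(static_cast<std::size_t>(w), '.'));

    // Step 1: outline layer, all glyphs on one layer, round pen.
    for (int y = 0; y < h; ++y) {
        assert(static_cast<int>(text[y].size()) == w);
        for (int x = 0; x < w; ++x) {
            if (text[y][x] != '#') continue;
            for (int dy = -radius; dy <= radius; ++dy)
                for (int dx = -radius; dx <= radius; ++dx) {
                    const int ny = y + dy, nx = x + dx;
                    if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                    if (dx * dx + dy * dy <= radius * radius) out[ny][nx] = 'O';
                }
        }
    }
    // Step 2: cut the combined text region out of the outline layer, so a
    // translucent fill does not reveal outline color underneath.
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (text[y][x] == '#') out[y][x] = '.';
    // Step 3: only now draw the text fill.
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (text[y][x] == '#') out[y][x] = 'T';
    return out;
}
```

A single filled pixel with radius 1 comes out as a 'T' surrounded by four 'O' pixels in a plus shape. In a real renderer steps 2 and 3 matter separately, because the fill may be blended with an opacity below 1 onto whatever is left underneath.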

softworkz commented 8 months ago

You could simplify things a bit by constraining to the feature set of the ASS spec.

The relevant differences in this context: Outline color, outline blur and outline width can only be set for whole text blocks, but not as inline styles.

brabebhin commented 8 months ago

Yeah it all sounds fun.

I will first deal with bitmap subs, those are the easiest.

ffmpeginteropx / FFmpegInteropX

Bitmap Subtitles with WinUI3 #399