Open softworkz opened 10 months ago
I must admit I haven't actually tried this in WinUI 3, but this worked in UWP (at some point). I will give it a try again tonight.
It's working fine in UWP!
Ah I see where this is going lol.
It's not crucial right now as we're using MPV as primary player for the WinUI app, but @lukasf had introduced the WindowId parameter to the CreateFromUriAsync() method, so I was wondering whether he ever got it working.
Nope. It is one of the things that still does not work on WinUI. It does not have anything to do with the WindowId. ImageCues just don't render at all.
Yep, it is not working.
Nope. It is one of the things that still does not work on WinUI. It does not have anything to do with the WindowId. ImageCues just don't render at all.
Alright. Thanks for clarifying.
So I think I figured out how to use RenderSubtitlesToSurface: you need to use a CanvasRenderTarget (off-screen rendering) with Win2D.
This engine seems to render subtitles in a completely different manner compared to the XAML composition based subtitles.
No, not quite. I mean it works for rendering video, but not subtitles. Clearly whoever wrote that doc didn't bother checking. The example would keep throwing InvalidArgumentExceptions. However, I thought that maybe whoever implemented RenderSubtitlesToSurface wasn't completely outright shipping a useless API in the public winRT interface, so I tried several things.
The current state of my WIP implementation is here
No, not quite. I mean it works for rendering video, but not subtitles. Clearly whoever wrote that doc didn't bother checking. The example would keep throwing InvalidArgumentExceptions.
The code is taken from this sample: https://github.com/MicrosoftDocs/windows-dev-docs/blob/docs/uwp/audio-video-camera/code/MediaPlayer_RS1/cs/MainPage.xaml.cs
At least at some point it must have worked.
Here's one more code sample I found: https://github.com/drasticactions/WinUIEx/commit/4ad8524a6cf4aee6abbc66df248f8aea4a4ca767
Interestingly there's also some mentioning of "crashes".
But I have a theory...
I doubt it. I've been trying that api ever since it came out, way before winui. This is the first time I managed to make it not crash.
I was probably the only crazy dude in the world trying it. It only became a focal point when WinUI 3 shipped without MPE and others started looking at it. I even submitted bug reports and crash dumps in the Feedback Hub, all to be ignored.
Have you tried running the sample as is, i.e. 100% unchanged?
I didn't run exactly that code. Never knew it existed. But the code I had was practically identical to that. The catch is that you can't render the subtitles to the software bitmap object they have there, only to the canvas render target. My theory is that it is some limitation in XAML, and MF only accepts an off-screen buffer to render to. Probably because the IMFMediaEngine interface of MF, which is basically the native implementation of MediaPlayer, has a swap chain mode, and in the swap chain you always render to the back buffer.
Or they could simply be rendering XAML composition elements (like they do with the ordinary subs) to a bitmap and for some reason, they can only do it with a back buffer.
But I have a theory...
My theory is that whoever created WinRT back in the early 2010s has long since left Microsoft or was fired in one of their yearly purges, and the subsequent teams don't have the capacity to understand the complexities of the system involved, and they get purged as well before they have the time to build up the knowledge, and the trend continues to this day. This explains why WinUI, UWP, MAUI and even WinRT itself have some unexplainable bugs that nobody is trying to fix: they simply don't know how, and don't have the possibility to learn how to fix them.
I have left a comment over in the other repo, here's a copy:
Hi @dotMorten,
we at FFmpegInteropX are wondering about the intended use of the RenderSubtitlesToSurface() method as well, also having seen "crashes",
I haven't worked on it myself, just heard the reports saying it's crashing. Recently I re-read the docs and I had a certain suspicion. Yet it didn't match with the symptoms of "always crashing" and I abandoned the idea. Now I found this commit of yours which says:
this'll only render on one frame until next text. Crash if repeating without this flag getting set.
That's quite different from "always crashing" and actually aligns with what I had thought earlier. There is a paragraph in the docs which caught my attention:
For the overload with a target rectangle:
=> So why should this method be "less efficient" than the other overload? This method renders to a constrained area only and the other one renders to the full-size frame. How can it be "less efficient"?
There are two more hints:
if the method returns false, then no subtitles were rendered. In this case you may decide to hide the subtitle render surface in your UI.
but it allows you to use the same surface for rendering video and subtitles rather than requiring a separate surface for subtitles.
Rendering onto the same surface as the video frames comes with the specific implication that you need to re-draw the subtitle again and again for every single video frame that is shown, because each video frame overwrites everything from before and so you need to redraw the subtitle text for each frame. To make this more economic (just think of 4k video at 60fps), you can/must constrain the rendering to a certain area (no idea how to determine it actually). And - if I'm right - that would be the explanation why they say that it's less efficient: even though the area is constrained, you need to do it for every video frame.
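A rough back-of-the-envelope calculation gives a feel for the scale. The sketch below assumes a 3840x2160 frame at 60 fps and, purely for illustration, a subtitle band covering the bottom 15% of the frame; the 15% figure is made up, and touched pixels per second are only a crude proxy for the real rendering cost.

```python
# Rough proxy: how many pixels per second fall inside the area that has to be
# redrawn when subtitles are painted onto every video frame.
WIDTH, HEIGHT, FPS = 3840, 2160, 60
BAND_PERCENT = 15  # hypothetical height of the constrained subtitle region

full_frame_per_second = WIDTH * HEIGHT * FPS
band_height = HEIGHT * BAND_PERCENT // 100
band_per_second = WIDTH * band_height * FPS

print(f"full frame: {full_frame_per_second:,} px/s")  # 497,664,000
print(f"15% band:   {band_per_second:,} px/s")        # 74,649,600
```

Even the constrained region is revisited 60 times per second, which would fit the docs calling that overload "less efficient" than drawing once onto a separate surface.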
I suspect that the other overload is meant to be used in a very different way: you create an additional (transparent) surface on top of the video frame and at the same size. When there's a subtitle graphic to display, you call RenderSubtitlesToSurface()
to paint it onto your subtitle surface. As this is a separate surface, it doesn't get overwritten by the video frames.
=> In turn, you need to render it just once. And that would be the explanation why you were seeing subsequent calls failing
Conclusion would be: this method is not meant to be called "per-frame" but only once on each change and meant to be painted on a separate surface, while the other overload is meant to be painted on each video frame but constrained to a certain region.
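To make the "once per change" idea concrete, here is a minimal Python model of the control flow. It is not the WinRT API: `render_subtitles` is a hypothetical stand-in for the RenderSubtitlesToSurface() call, and cues are plain `(start, end, text)` tuples. It only shows the overlay being redrawn when the set of active cues changes rather than on every video frame.

```python
from typing import FrozenSet, List, Tuple

Cue = Tuple[float, float, str]  # (start_seconds, end_seconds, text), hypothetical cue model


def active_cues(cues: List[Cue], t: float) -> FrozenSet[Cue]:
    """Return the cues that should be visible at time t."""
    return frozenset(c for c in cues if c[0] <= t < c[1])


class SubtitleOverlay:
    """Tracks the active cue set and redraws only when it changes."""

    def __init__(self, cues, render_subtitles):
        self.cues = cues
        self.render_subtitles = render_subtitles  # stand-in for the real render call
        self.current = frozenset()
        self.visible = False

    def on_video_frame(self, t: float) -> bool:
        """Call once per video frame; returns True if the overlay state changed."""
        now = active_cues(self.cues, t)
        if now == self.current:
            return False          # nothing changed: leave the overlay untouched
        self.current = now
        self.visible = bool(now)  # hide the overlay when there is nothing to show
        if now:
            self.render_subtitles(sorted(now))
        return True


draw_calls = []
overlay = SubtitleOverlay([(1.0, 2.0, "Hello"), (3.0, 4.0, "World")], draw_calls.append)
frames = [i / 10 for i in range(50)]  # 5 seconds of video at 10 fps
changes = sum(overlay.on_video_frame(t) for t in frames)
print(changes, len(draw_calls))  # 4 2
```

Over 50 frames the overlay changes state four times (two cues appearing, two disappearing) and the render call runs only twice.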
I agree that the best way is to have 2 bitmaps, one for video and one for the subs. However, I don't think that has anything to do with the crash I was experiencing, and probably what the winUIEx devs were having was a completely different problem - they are already doing the off screen rendering with the back buffer of the swap chain.
However, in my case, I don't think there's going to be a problem to separate the implementation into 2 bitmaps. But at this point I am just overly happy I finally figured out how to do it.
I don't think that has anything to do with the crash
what the winUIEx devs were having was a completely different problem
I am just overly happy I finally figured out how to do it
That's why I posted over there: I knew you wouldn't be much interested in testing my theory... 😆
There are some interesting points in your theory.
If you can share one of those files with bitmap subtitles I can gladly test to see if they work (the external disk that was storing mine died a few days ago)
- I already tried rendering to the same surface earlier today, without the overload that takes a rect region to render to, on every video frame. If the surface is a software bitmap, it crashes,
Does it "crash" always or does it succeed once each time when there's a subtitle change?
The current implementation sort of uses 2 surfaces, one for video and one for subtitles, and then merges both in the final output to be displayed,
That's the worst possible way actually...
I mean: none of these procedures is suitable for production use anyway. There's just a single right way for this:
rather than bogging us down with all sorts of plainly visible optimizations (like hiding the subtitle surface when there's no sub to draw). It is really debatable if this is the right way to do it, or if we should render a transparent surface using directx anyways.
The latter is exactly what they are suggesting... Having a transparent layer in the scene still requires calculation which can be saved by hiding the layer, even though it's transparent anyway.
If you can share one of those files with bitmap subtitles I can gladly test to see if they work (the external disk that was storing mine died a few days ago)
Check out these:
Let me know when you need something different. Thanks
Thanks for the files. Here we go
It's funny, cause the "Media Player" app of Windows 11 doesn't even detect the subtitle. But clearly WinRT and MF possess the capability to deal with the subtitle.
Am I going insane here?
I still can't believe this works, I expect there to be some gotcha that I'm missing.
That's the worst possible way actually...
I mean: none of these procedures is suitable for production use anyway. There's just a single right way for this:
- Compressed video goes into GPU/GpuMem
- GPU decodes video frames into surfaces (in GpuMem)
- The app only tells the GPU at which point in time it should present each surface
- Shown frames go back into the pool for re-use
- No video frame ever leaves GpuMemory
I agree it is not production ready, however, no video frame ever leaves GPU memory. By the time the video frame server's VideoFrame available event triggers, the video has already been decoded and it exists in GPU memory (be it dedicated or integrated). This intercepts right before being presented to the user. At this point, the only way to even copy the data in CPU memory I believe is to cause a pipeline stall, which will instantly kill performance and drop it to like 1 FPS or something.
Does it "crash" always or does it succeed once each time when there's a subtitle change?
I could never, ever get a subtitle to render to a SoftwareBitmap.
which will instantly kill performance and drop it to like 1 FPS or something.
No, sw is not that bad. Remember, FFmpegInteropX has a setting "FFmpegSoftwareDecoder". It's primarily taking resources and draining batteries empty on laptops.
I agree it is not production ready, however, no video frame ever leaves GPU memory. By the time the video frame server's VideoFrame available event triggers, the video has already been decoded. This intercepts right before being presented to the user.
You are calling "RenderVideoToSurface" and "RenderSubtitleToSurface", then you are "merging" the surfaces together before presenting. Even when it's in GPU memory, that's still a lot of operations, and each time something gets copied.
Is the Win2dCanvas doing everything and always in GPU memory?
It's funny, cause the "Media Player" app of Windows 11 doesn't even detect the subtitle. But clearly WinRT and MF possess the capability to deal with the subtitle.
Am I going insane here?
I still can't believe this works, I expect there to be some gotcha that I'm missing.
The subs are looking fine!
Can you check whether the outline issue exists when doing this way?
The performance problem of the pipeline stall lies in the CPU-GPU synchronization at driver level, not in the performance of either one of the components. It is easy to transfer data from CPU to GPU, which is what the software decoder does. The other way around is much, much harder. There are also hardware constraints here in the architecture of the PCIe lanes; the topic is pretty complex.
The GPU is designed to handle multiple drawing commands and copy data inside its GDDRAM and it is very good at doing it. I don't believe the current implementation has major performance issues, but I will implement the separated drawing of subs on a different surface anyway. There are multiple reasons to do the separation, not just performance gain. There's also the question of HDR, which I haven't tested at all.
Glad to know the subs look good. I will check the outline as well, once I emerge from the nightly hibernation cycle.
Yes, the Win2D canvas does all data manipulation in GPU memory. You may be able to initialize it with data from system memory, but once that's done, everything happens on the GPU.
Following up on your comment on WinUIEx's subtitle rendering: I honestly don't remember the details - I've since deleted the mediaplayer implementation since WinUI now has it built-in, so this was no longer needed. I looked at the WinUI3's source code and I don't see any use of the subtitle rendering stuff - the closest I got was the timed text stuff which might be worth digging into: https://github.com/microsoft/microsoft-ui-xaml/blob/winui3/release/1.5-stable/dxaml/xcp/dxaml/lib/TimedTextSource.cpp
It does seem like they just use XAML to render the subtitles judging from the comments in this issue: https://github.com/microsoft/microsoft-ui-xaml/issues/7981 and that rendering them directly into the surface doesn't work: https://github.com/microsoft/microsoft-ui-xaml/issues/6610
@dotMorten - Thanks for getting back!
I looked at the WinUI3's source code and I don't see any use of the subtitle rendering stuff
It does seem like they just use XAML to render the subtitles
Correct. That's done in CueStyler.cpp (and the other) for which I made 2 PRs, and there's also a pending issue of mine regarding this. That's why we're evaluating the RenderSubtitles() method, which uses an entirely different implementation (from the Windows.Media area presumably).
I honestly don't remember the details
I did a GitHub search for this API, and the only real results were the old MS samples and your repo (and this one here), so we're a very small circle of people having dealt with it, apparently 😉
@softworkz I am once again going to need some test files for the outline ^^
Scripting seems to partially work for SSA/ASS
but outlines and colors are overridden by Windows settings
This is the expected result
The subs also seem to overlay on top of each other when there are multiple active at the same time. It is worth investigating whether we are using the incorrect region when adding external subs or whether the renderer is bugged.
I guess my next mission would be to implement my own rendering with win2D, in order to get rid of the windows settings nonsense.
I also tried your reg hax and it doesn't fix the issue. It would appear this implementation does not support outlines at all, or your reg hax isn't applicable to this.
This continues the trend of half measures we've been observing over the year when it comes to winRT media objects.
So basically:
It's time we take matters into our own hands lol. Although if we integrate libass and reduce everything to image subs, this would fix all our problems.
Interestingly, not even DirectX seems to have support for text outlines. They all use the multi text trick that winUI 3 uses.
I also tried your reg hax and it doesn't fix the issue. It would appear this implementation does not support outlines at all, or your reg hax isn't applicable to this.
Right. Outlines aren't supported everywhere for some reason.
They are missing here: https://learn.microsoft.com/en-us/uwp/api/windows.media.closedcaptioning.closedcaptionedgeeffect?view=winrt-22621
and here:
But they are present here: https://learn.microsoft.com/en-us/uwp/api/windows.media.core.timedtextstyle.outlinethickness?view=winrt-22621
Interestingly, not even DirectX seems to have support for text outlines. They all use the multi text trick that winUI 3 uses.
None of your pictures is showing outlines, did you get it working?
No, I was working on win2D rendering and was looking at the correspondence between TimedTextCue properties and win2D text rendering classes...
...and couldn't find outlines.
So then I googled in plain directx and found the 4 times rendering trick on an official MS page, so I guess that's that. Since XAML is directx based, they also don't support outlines xD
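For reference, the "render the text four times" trick can be modelled on a plain coverage mask. The sketch below is pure Python on a 0/1 grid, not DirectX or Win2D code, and the names are made up for illustration. It assumes the four copies are shifted one pixel left, right, up and down; the glyph is stamped at those offsets in the outline colour, then drawn once more on top in the fill colour.

```python
from typing import List

Mask = List[List[int]]  # 1 where the glyph covers the pixel, 0 elsewhere

# Assumed offsets as (dy, dx): one pixel up, down, left and right.
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def outline_by_offsets(glyph: Mask) -> List[List[str]]:
    """Return a grid of '.' (empty), 'o' (outline) and '#' (fill) characters."""
    h, w = len(glyph), len(glyph[0])
    out = [["." for _ in range(w)] for _ in range(h)]
    # Pass 1: stamp the glyph four times, shifted, in the outline colour.
    for dy, dx in OFFSETS:
        for y in range(h):
            for x in range(w):
                ty, tx = y + dy, x + dx
                if glyph[y][x] and 0 <= ty < h and 0 <= tx < w:
                    out[ty][tx] = "o"
    # Pass 2: draw the glyph itself on top in the fill colour.
    for y in range(h):
        for x in range(w):
            if glyph[y][x]:
                out[y][x] = "#"
    return out


glyph = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
rows = ["".join(r) for r in outline_by_offsets(glyph)]
print("\n".join(rows))
```

For the single-pixel glyph this prints a plus-shaped outline: the diagonal neighbours stay empty, which hints at why offset copies only approximate a real stroke around the glyph shape.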
So basically:
- UWP subtitle implementation supports everything?
- winUI doesn't support image subs and poorly supports ASS/SSA
No, UWP and WinUI are using the same implementation.
There's no support for ASS/SSA anywhere. This is just added/implemented by us.
- Windows.Media implementation supports everything except outlines?
I'm not sure about outlines, but the effects (Raised, Depressed, DropShadow) must have existed long before. For two reasons:
So then I googled in plain directx and found the 4 times rendering trick on an official MS page, so I guess that's that. Since XAML is directx based, they also don't support outlines xD
There's no close relation between the two. Drawing outlines requires a certain API (I'm not sure which it was, maybe "DirectWrite"), the same as what Aegisub and ASS are using on Windows, and this API isn't available to XAML. That's why they used the 4-trick. But the API should be available to a DirectX based implementation...
I might be mistaken, but I think the directx way to do it is to rasterize the text and then apply a pixel shader to draw the outline. As to why this isn't exposed to XAML, I wouldn't know at this point. Probably something to do with their protected rendering process that renders XAML in a sandbox.
I'd assume most existing applications use GDI to write text, which is more mature than the DirectX approaches. This should be available to WinUI 3 though, so that's a good hint you gave me. I expect this is how libass and others are doing it. Do tell if I'm wrong.
Nobody tried this with UWP simply cause GDI is not allowed on UWP.
The problem is that only one of the text rendering APIs (DirectWrite iirc, definitely not GDI) allows you to access the outlines as vector shapes. Without having the vector shapes, it is impossible to draw proper outlines (not even with pixel shaders).
Well, writing our own implementation is probably not worth the time. I'd assume libass does it somehow, so I guess when we link this in, we will be able to render everything as bitmap subs as @lukasf originally envisioned.
Bitmap subs seem to be the only way left to bypass the windows settings when it comes to styling.
I did start working on a Win2D based subtitle renderer. The bitmap subs are basically working. However, I only implemented it in C#, to speed things up. Unfortunately, this causes performance problems a few minutes into playback.
The text based subs are somewhat more complicated and require more calculations. I guess they used a shortcut with StackPanels and XAML composition. Maybe the way forward is this, but I think the approach is inherently flawed and will cause performance issues due to GC objects in XAML tree, even if everything is written in C++
I did start working on a Win2D based subtitle renderer. The bitmap subs are basically working. However, I only implemented it in C#, to speed things up. Unfortunately, this causes performance problems a few minutes into playback.
I think the starting point for trying something in this direction is to
Finding out whether this is working at all is crucial to determine what options exist at all. (it might not be possible to display elements on top of a swap chain panel, so when testing, make sure that the canvas covers the media element just partially for comparison)
The frame server is fine and at least in my case, it is here to stay - it is the only way to have both video effects and gapless video playback.
I'll probably need to translate this to C++ and see how it works out, whether it is a problem in my implementation or it really is the GC that's doing things. The only problem is that the C++ debugger is pretty terrible.
The last time I did this, I had CanvasAnimatedControl, which works like a swap chain. Unfortunately, not available in winUI 3.
Outline seems to be pretty easy to do in Win2D. And if Win2D can do it, then it should be possible as well using Direct2D APIs, because that's what Win2D is implemented with.
https://github.com/sonnemaf/Outline https://github.com/sonnemaf/Outline/blob/master/Outline/OutlinedTextBlock.cs
Nice find. Didn't think of using CanvasGeometry. Although I suspected something similar would be involved.
Great, this only makes things even more complicated xD
Cause you can technically apply the outline to only parts of the text line, so lines have to be potentially split into several geometries that need to be nicely stitched together...
Guess it will be done in 3 years.
Cause you can technically apply the outline to only parts of the text line, so lines have to be potentially split into several geometries that need to be nicely stitched together...
What do you mean by that? Why stitch?
Do you mean because of TimedTextSubformat ?
Yes, every single letter can have a different style, but not just outlines, this also applies to color, font style/family/size etc.
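A small sketch of what that splitting could look like, independent of any rendering API. The `(start_index, length, overrides)` tuples are a hypothetical stand-in loosely modelled on TimedTextSubformat, and styles are plain dictionaries. The text is cut into maximal runs of characters that share one effective style, so each run could be turned into its own geometry.

```python
from typing import Dict, List, Tuple

Style = Dict[str, object]
Subformat = Tuple[int, int, Style]  # (start_index, length, style overrides), hypothetical


def split_into_runs(text: str, base: Style, subformats: List[Subformat]) -> List[Tuple[str, Style]]:
    """Split text into maximal runs of characters that share one effective style."""
    # Resolve the effective style per character; later subformats win on overlap.
    per_char = [dict(base) for _ in text]
    for start, length, overrides in subformats:
        for i in range(max(start, 0), min(start + length, len(text))):
            per_char[i].update(overrides)
    # Merge neighbouring characters with identical styles into one run.
    runs: List[Tuple[str, Style]] = []
    for ch, style in zip(text, per_char):
        if runs and runs[-1][1] == style:
            runs[-1] = (runs[-1][0] + ch, style)
        else:
            runs.append((ch, style))
    return runs


base = {"color": "white", "outline": 0}
runs = split_into_runs(
    "Hello world",
    base,
    [(0, 5, {"outline": 2}), (3, 4, {"color": "red"})],
)
for run_text, style in runs:
    print(repr(run_text), style)
```

With the two overlapping subformats above, the line falls apart into four runs ('Hel', 'lo', ' w', 'orld'), and only the first two carry an outline, which is the stitching problem in miniature.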
Using DirectWrite directly might be a better choice. I would just look it up how AegiSub and libass are doing it. No need to re-do it from scratch, and in the end it still wouldn't look the same (possibly).
There are also other implications you need to be aware of. The recipe for outline rendering is something like this:
You could simplify things a bit by constraining to the feature set of the ASS spec.
The relevant differences in this context: outline color, outline blur and outline width can only be set for whole text blocks, but not as inline styles.
Yeah it all sounds fun.
I will first deal with bitmap subs, those are the easiest.
Just wanted to check back: Has this ever worked for any of you so far?