ffmpeginteropx / FFmpegInteropX

FFmpeg decoding library for Windows 10 UWP and WinUI 3 Apps
Apache License 2.0
205 stars 52 forks source link

ASS Subtitle Fixes #400

Closed softworkz closed 7 months ago

softworkz commented 8 months ago

SubtitleProviderSsaAss: Add line break SubtitleProviderSsaAss: Calc font size SubtitleProviderSsaAss: Print offending tag in case of parsing errors SubtitleProviderSsaAss: Prevent parsing errors due to ambiguous tags SubtitleProvider: Fix for color strings like H&H2A4F5D& SubtitleProviderSsaAss: Shadow is float

Fixes #397

lukasf commented 8 months ago

Well done, thanks @softworkz!

Only we need to be very careful when changing the font sizing. We had different algorithms already, and some turned out to have weird effects on some files. We'd need to do lots of testing with different files, to make sure that sizing is consistent and in line with how most other players render the subs.

softworkz commented 8 months ago

Well done, thanks @softworkz!

Only we need to be very careful when changing the font sizing. We had different algorithms already, and some turned out to have weird effects on some files. We'd need to do lots of testing with different files, to make sure that sizing is consistent and in line with how most other players render the subs.

The problem was that after fixing the named style parsing (shadow), in many cases it started to really pick up the those styles (which it didn't before) and the result was that fonts were too big. Otherwise I wouldn't have even bothered touching it.

A little more background:

You can rely on the PlayResX and PlayResY values to be present in all ASS headers (all ass versions) as those are mandatory and crucial. These are frequently differing from the resolution of the video, because (naturally) subtitles are usually created just once and then used with different versions of a video. What complicates this even more, is that the aspect ratio of the PlayRes might not match the aspect ratio of the video, and - even worse - the video might be anamorphic so that we must not only take the ratio between videoWidth and videoHeight but also the display aspect ratio (DAR) in case of anamorphic videos. The correct procedure would be to calculate the display dimensions of the video and fit a rectangle into it with the aspect ratio of the PlayRes dimensions. Then we can calculate the ratio between PlayResX/fittedDisplayRectWidth and PlayResY/fittedDisplayRectHeight. From these two values, we need to take the smaller one and that would be our scale factor for fonts and all other dimensions. Depending on how it fits (height or width equal) there's either a vertical or a horizontal offset remaining which needs to be added when positioninig.

Since that is too much for a series of small patches, I only used the heights to determine a scale factor for the font sizes. It's not perfect, but on the side of the most frequent cases at least (where the height factor is determining the scale).

So far so good, but there's one more caveat: There may be differences regarding the meaning of a font size of n pixels between Windows and ASS. And this can even differ by font. Getting that part right is really tough. Everything else can be calculated..

softworkz commented 8 months ago

Another note: The style parsing is not quite correct, because neither order nor count of fields is guaranteed, i.e. there can be more or less than 23 columns. You need to read the line with the field names first and then parse the values in each line accordingly. What we are seeing and assuming to be a standard is just how Aegisub - the most popular ASS authoring application - is formatting the styles. From a practical perspective - there's at least one diffference that actually exists out in the wild. I don't remember exactly which one it was.

lukasf commented 8 months ago

Another note: The style parsing is not quite correct, because neither order nor count of fields is guaranteed, i.e. there can be more or less than 23 columns.

There is an official standard document for the SSA/ASS styles, for V4 and V4+. The version number defines which fields there are and their order. It is nowhere said that the order may be changed.

Doing dynamic parsing would totally blow up complexity here, just to catch some very few broken files, produced by broken non-standard compliant tools. I don't believe it's worth the hassle. And it's not like they won't work, it's just that their styles are not applied.

softworkz commented 8 months ago

There are many bits which are missing - many of them cannot even be replicated (as discussed recently), but what's important and doable is collission detection, which means that when two or more cues are visible at the same time (and not absolutely positioned), they need to be stacked in order not to overlap like in this example:

image

Correct it looks like this:

image

Another issue: There's a discrepancy in color. It's slight but noticeable and in case you wonder how I can say that the second is right and the first one is wrong, it's because I've watched that video a thousand times with different ASS implementations and in several subtitle authoring tools and it's always like in the second image. Yet I haven't looked into it, as to whether it's an issue with parsing of the colors or caused by Windows.Media.

softworkz commented 8 months ago

What's further unfortunate (but probably nothing we can do about) is that Windows.Media uses a square-shaped pen for drawing the outlines which is causing ugly artifacts:

Windows.Media

image

Correct

image

(I'm not even talking about the edge blur and the shadow 😆 )

lukasf commented 8 months ago

Huh, usually MediaPlayer does the stacking, I have seen it happening quite often. Actually, I have never seen subtitles overlap like that. Are you sure they are not positioned absolutely? They seem to be on slightly different Y positions. Or maybe this is one more WinUI thing?

I also noticed the slight color deviation when I started on SSA/ASS styles support. Not sure where it comes from. But the difference was low enough for me to not bother 😄 If you want to dig into this, you could enter the parsed RGB value into a graphics tool, to see how the color is supposed to look like.

And yeah, the outline rendering is a mess. Better not look too closely ^^

softworkz commented 8 months ago

There is an official standard document for the SSA/ASS styles, for V4 and V4+. The version number defines which fields there are and their order. It is nowhere said that the order may be changed.

Oh yes it does say that very clearly:

image

softworkz commented 8 months ago

Regarding ASS specidfication, I had some chat with the core developers of libass, because it is not complete in several means, and they said that the full spec they consider to be valid is the combination of this document together with the source code of AegiSub... 😉

lukasf commented 8 months ago

I read it more in the sense of "you can do it dynamically to support new standards without having to rework". But there has never been a new standard after 4+. And 4+ defines exactly which new fields are added at which exact position (the ones in red).

And yes, Aegisub is the de-facto standard. It shows how the files are supposed to be. I'd see any other files as broken.

lukasf commented 8 months ago

I also read somewhere that Aegisub is considered the more precise definition of the standard. The document just tries to make it more readable.

softworkz commented 8 months ago

Huh, usually MediaPlayer does the stacking, I have seen it happening quite often. Actually, I have never seen subtitles overlap like that. Are you sure they are not positioned absolutely? They seem to be on slightly different Y positions.

The different Y is probably due to being Asian letters. There's no absolute positioning. Just different styles:

image

softworkz commented 8 months ago

The text lines are probably better to look at:

Dialogue: 0,0:00:43.35,0:00:53.61,OP Romaji,,0000,0000,0000,,{\fad(250,250)\be1}{\kf38}zu{\kf31}t{\kf85}to {\kf174}shi{\kf54}ma{\kf48}t{\kf162}te {\kf91}o{\kf322}ku
Dialogue: 0,0:00:43.35,0:00:53.61,OP Kanji,,0000,0000,0000,,{\fad(250,250)\be1}{\kf38}ず{\kf31}っ{\kf85}と{\kf174}し{\kf54}ま{\kf48}っ{\kf162}て{\kf91}お{\kf322}く
Dialogue: 0,0:00:43.35,0:00:53.61,OP English,,0000,0000,0000,,{\fad(250,250)\be1}I'll keep it to myself forever.
softworkz commented 8 months ago

I read it more in the sense of "you can do it dynamically to support new standards without having to rework". But there has never been a new standard after 4+. And 4+ defines exactly which new fields are added at which exact position (the ones in red).

And yes, Aegisub is the de-facto standard. It shows how the files are supposed to be. I'd see any other files as broken.

They are working on 5.0, but who knows whether and when it will happen 😆

"you can do it dynamically to support new standards without having to rework"

Which means that current implementation should not rely on a fixed order and count so that they won't get broken by later versions of the spec.

But as said, there's one case where existing ASS files differ and that made me change the whole parsing in the ffmpeg implementation I did to use the field names for parsing.

softworkz commented 8 months ago

I don't know how the WindowsMedia layouting is working - could the overlap happen because both styles have margins (the same MarginV) and the auto-stacking does perhaps not work when you use the margins to set the region?

lukasf commented 8 months ago

I don't know how the WindowsMedia layouting is working - could the overlap happen because both styles have margins (the same MarginV) and the auto-stacking does perhaps not work when you use the margins to set the region?

That could be the case.

They are working on 5.0, but who knows whether and when it will happen 😆

If they ever release a new standard, I'd support it like I did for V4 and V4+. Sure, dynamic parsing would be the better implementation. But it is way more complicated, and for me it's not worth all the effort, just to support a few broken files.

If it is very important for you, and you want to implement this once again, feel free to add it. As I said, I'd consider it the better implementation, it's just that I won't do it 😄.

softworkz commented 8 months ago

If it is very important for you, and you want to implement this once again, feel free to add it. As I said, I'd consider it the better implementation, it's just that I won't do it 😄.

Agreed and I understand that very well 😆

BTW, I just looked up what they consider to be the size of a font when it's set to 100% and in the source it is stated that a font size of 100% corresponds to 5% of the video's height...

softworkz commented 8 months ago

it's just that I won't do it 😄.

Damn - and I just wanted to suggest that I could send you a bunch of videos with ASS subs for testing, LOL...

softworkz commented 8 months ago

OMG - do you know how they're drawing the outlines? They are drawing the text 8 times!!! (N, NE, E, SE, S, SW, W, NW)

brabebhin commented 8 months ago

That's how age of empires 2 text used to work. I can test your ASS files if you want to proceed with the implementation.

softworkz commented 8 months ago

That's how age of empires 2 text used to work. I can test your ASS files if you want to proceed with the implementation.

Thanks. I will check back on how much time we want to invest here. The collission issue is obviously more important than the style parsing.

softworkz commented 8 months ago

OMG - do you know how they're drawing the outlines? They are drawing the text 8 times!!! (N, NE, E, SE, S, SW, W, NW)

And they even to it wrong:

    // Upper left
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, -offset, -offset));

    // Upper center
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, 0, -offset));

    // Upper right
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, offset, -offset));

    // Left
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, -offset, 0));

    // Right
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, offset, 0));

    // Lower left
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, -offset, offset));

    // Lower center
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, 0, offset));

    // Lower right
    IFC(CreateOutlineHelper(pLine, pStyle, pRegion, pCueElement, offset, offset));

Do you see what I mean?

softworkz commented 8 months ago

I have submitted a PR to improve the outline rendering: at https://github.com/microsoft/microsoft-ui-xaml/ PR nr 9224

softworkz commented 8 months ago

That's how age of empires 2 text used to work. I can test your ASS files if you want to proceed with the implementation.

It would be great when you could test a bit if you find the time. I have created that draft PR which includes all the fixes I have submitted: https://github.com/ffmpeginteropx/FFmpegInteropX/pull/405

Now I can of course send you some of my files for testing if you wish, but as I know that these are working (except animations), it would be much more interesting to have it tested with more and different files..

lukasf commented 8 months ago

Wow you are really serious about this @softworkz. Good job. I never had the ambition to fully support ASS. For me, it was enough to have some of the basic styling working. But you are really taking things to a different level. I will try to do reviews on the partial PRs. But we should do testing on the combined PR, and merge that when we think it is all good.

The bad thing is that we are limited by the poor rendering done by the framework controls. Nice catch with the broken outline rendering. I's no surprise that it looks so horrible. Not sure how libass does it, but I'd guess it's more a combination of a single render plus a clever combination of effects to get the nice outline. But of course, increasing the render count (combined with knowing how a circle works 😄) is a major improvement to the MS solution.

Are you actually able to compile WinUI 3 from source? Or how have you done the comparison screens in the PR?

brabebhin commented 8 months ago

From the files I've had in my usual testing battery this looks fine. Good work! I will try to create some brand new subtitles and mix them around to see how it goes.

brabebhin commented 8 months ago

@lukasf I've read somewhere that libass uses direct2D/directtext for this outline thing. It's how age of empires 2 Definitive Edition does it 😂

softworkz commented 8 months ago

Wow you are really serious about this @softworkz. Good job. I never had the ambition to fully support ASS. For me, it was enough to have some of the basic styling working. But you are really taking things to a different level

I've been shaken back and forth because I know our users and how they would have reacted, so we had discussed internally for how we want to proceed. My initial suggestion was to disable ASS subttitles and use your implementation only for all other text subtitles for which it is doing very well. The downside of this is that users prefer to be able to DirectPlay (no server transcoding/transmuxing), so they wouldn't like missing ASS support either. Many of those are watching Anime videos with ASS subs, which often have animations included and using advanced ASS features and almost always custom fonts. With the current state it would have been forseeable that I'll regularly get user reports on the table with certain files (ASS subs) not being shown properly and I want to avoid by all means that this would become a regular area of work for me in the future 😉, So we decided to make this (ASS rendering) optional and call it "experimental". Nonetheless, I invested those 3 days to make it as good as possible under the given constraints by WindowsMedia.

The big plus which made me stick to your implementation was that you had already implemented support for custom (embedded) fonts - which is always the first thing about which Anime fans are complaining. 👍

softworkz commented 8 months ago

Are you actually able to compile WinUI 3 from source? Or how have you done the comparison screens in the PR?

No I can't compile - I didn't even retry with the 1.5 branch. Instead I have created a UWP project which replicates their current method and made sure that it gives identical results to their current implementation. Then I implemented my proposed solution, did the comparisons and finally translated it back to C++ code. If you are interested, I can send you that project. It's very simple.

softworkz commented 8 months ago

@lukasf I've read somewhere that libass uses direct2D/directtext for this outline thing. It's how age of empires 2 Definitive Edition does it 😂

They use DirectWrite on Windows, but I don't know how they do the outlines.

lukasf commented 8 months ago

They use DirectWrite on Windows, but I don't know how they do the outlines.

Based on the output, I am pretty sure they use some kind of gaussian blur on the text for the outline, plus maybe some additional effects for fine tuning the look. See how it fades out smoothly to the edges. You cannot get such a smooth fade-out by just rendering the text multiple times. Same is probably done for the shadow, just with some offset and different color/alpha.

At least, I used blur effect to create text shadow in an app, very long ago. It had kind of a similar look.

softworkz commented 8 months ago

They use DirectWrite on Windows, but I don't know how they do the outlines.

Based on the output, I am pretty sure they use some kind of gaussian blur on the text for the outline, plus maybe some additional effects for fine tuning the look. See how it fades out smoothly to the edges. You cannot get such a smooth fade-out by just rendering the text multiple times. Same is probably done for the shadow, just with some offset and different color/alpha.

At least, I used blur effect to create text shadow in an app, very long ago. It had kind of a similar look.

I made an extreme test in Aegisub with a real large outline:

image

This reveals that they are converting the font to a path and drawing it using a circle-shaped pen.

softworkz commented 8 months ago

For the shadow, they are taking the alpha mask of the rendered text (including outline) and painting it with an offset and with some opacity:

image

The edge blur is simply a gaussian blur applied to the alpha mask:

image

softworkz commented 8 months ago

The edge blur is simply a gaussian blur applied to the alpha mask:

Oh, no - not quite:

image

It's a bit more sophisticated. It's a blur which uses the sorrounding as input but output is only inside the actual region, so it cannot grow larger than it is.

(the image is done with a high blur value but no opacity applied to any color)

lukasf commented 8 months ago

Interesting, thanks for the analysis.

lukasf commented 8 months ago

Yeah I forgot that outline and blur are separate parameters in ASS. It's just that UWP/WinUI does not support blur.