ffmpeginteropx / FFmpegInteropX

FFmpeg decoding library for Windows 10 UWP and WinUI 3 Apps
Apache License 2.0

Embedded CC Support #30

Closed brabebhin closed 6 years ago

brabebhin commented 6 years ago

@lukasf Let's discuss embedded CC support here.

1) At first glance, TimedMetadataStreamDescriptor does indeed look a little bleak. Maybe there is still an SDK update to come.

2) The pull model can still work; we probably need to provide a mock frame when there are no subtitles available.

3) As for a push model, maybe we can extract the timed text from the file and feed it into an existing, working API, such as TimedMetadataTrack.

lukasf commented 6 years ago

I just added automatic enable/disable of streams based on subtitle selection, so we only process those subs that are actually shown.

lukasf commented 6 years ago

Added language tag conversion, moved subtitle stuff from StreamInfo back into SampleProvider. Added auto selection of forced subtitles.

lukasf commented 6 years ago

One thing that I have noticed is a short flicker in subtitles. I assume it occurs when we add a subtitle cue while a different subtitle cue is currently shown. Do you see this as well? This one could be hard to fix.

We could try to wait for the CueExited event, to add cues only when nothing is shown. But this makes things difficult due to multithreading. Plus, there can be cases where subtitle cues overlap.

brabebhin commented 6 years ago

I did notice slightly sluggish performance, but I couldn't tell why. It is surely linked to the text cues, though. I will have to try my app in release mode as well to see whether things escalate.

brabebhin commented 6 years ago

I think we should keep the three-letter code as well, in case we ever need to use it.

brabebhin commented 6 years ago

I wonder if the language would ever be in its two-letter form...

lukasf commented 6 years ago

I managed to fix the flicker issue, it was not that hard in the end. It is really gone when we wait for the current cue to be hidden. I do not see any performance issues, no matter whether subs are shown or not.

If you want, you can add a "LanguageTag" string to the stream info classes, which contains the original, unmodified language tag from the stream. I would not name it "three-letter code" because we cannot be sure it is always three letters (although it probably will be in 99% of files). What might be more useful is the parsed two-letter code, because that is what Windows/UWP uses for all language handling. So if you want to auto-select streams based on the user's Windows settings, the two-letter code is what you need.
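For illustration, a minimal app-side sketch of such an auto-selection (not part of the library; `playbackItem` is a placeholder, and the tracks are assumed to carry the converted language tags):

```csharp
// using Windows.Media.Playback; using Windows.System.UserProfile;
// Pick the first subtitle track whose language matches the user's top Windows language.
string userTag = GlobalizationPreferences.Languages[0];   // e.g. "en-US"
string twoLetter = userTag.Substring(0, 2);               // e.g. "en"

for (uint i = 0; i < playbackItem.TimedMetadataTracks.Count; i++)
{
    var track = playbackItem.TimedMetadataTracks[(int)i];
    if (track.Language != null && track.Language.StartsWith(twoLetter))
    {
        playbackItem.TimedMetadataTracks.SetPresentationMode(
            i, TimedMetadataTrackPresentationMode.PlatformPresented);
        break;
    }
}
```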

touseefbsb commented 6 years ago

Auto selection based on the user's Windows settings seems like a very powerful tool, but we should still have an option to select a custom language, just in case?

brabebhin commented 6 years ago

@lukasf , nice work. :)

@touseefbsb , TimedMetadataTrack selection is done externally, either programmatically or via the transport controls. This is not the job of the library.

touseefbsb commented 6 years ago

@mcosmin222 What I actually wanted to ask is: after adding the auto selection to the library (supposing we are still using the default MediaPlayerElement), will we still be able to change the subtitle language, i.e. the TimedMetadataTrack, externally via the transport controls or programmatically?

Basically, the way this would all work together is that the library selects a default language according to the device settings, and then the user can change the subtitle language just like in a normal MediaPlayerElement (using MediaPlaybackItem.TimedMetadataTracks). Is this how it will work?

brabebhin commented 6 years ago

The mandatory CC selection button will allow users to select the desired timed text track.

touseefbsb commented 6 years ago

thanks, that answers my question :)

brabebhin commented 6 years ago

You can plug this library into the default MediaPlayerElement and you should be good to go.

lukasf commented 6 years ago

@touseefbsb Just to clarify: I don't think that we will implement auto track selection based on user language in the lib. The only thing I added was to auto select "Forced" subtitles, if there are any. These are specifically flagged subtitles that are intended to be always shown, usually to translate foreign language parts in a movie. Any advanced auto selection of normal subtitles must be done in the app.

You can enable and disable individual tracks in code on the MediaPlaybackItem using this method:

mediaPlaybackItem.TimedMetadataTracks.SetPresentationMode(<index>, <presentationMode>)
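For example (a sketch; `mediaPlaybackItem` is a placeholder), to show the first track and hide the second:

```csharp
// using Windows.Media.Playback;
mediaPlaybackItem.TimedMetadataTracks.SetPresentationMode(
    0, TimedMetadataTrackPresentationMode.PlatformPresented);
mediaPlaybackItem.TimedMetadataTracks.SetPresentationMode(
    1, TimedMetadataTrackPresentationMode.Disabled);
```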

touseefbsb commented 6 years ago

@lukasf Ahan, so those forced subtitles are usually the ones we normally cannot control with the transport controls and cannot add via any SRT file, they are just right there as if they were part of the video, right? And does mediaPlaybackItem.TimedMetadataTracks.SetPresentationMode(<index>, <presentationMode>) deal with embedded CC subs? So are these two different kinds of subs, independent of each other?

touseefbsb commented 6 years ago

And then there is a third type of subs, known as external subs, like files with the .srt extension. If I am correct, these external subs are controlled with TimedTextSource. So are these external subtitles affected by the use of this library, or do they still remain independent? The default behaviour of Movies & TV gives us a button to toggle between embedded CC (if any) and also gives us the option to select an external file (within the same button), and if we select an external file, the embedded CC are overridden with the external subs (.srt). So how will these work together if we use this library? Will it work exactly the same way?

lukasf commented 6 years ago

No, forced subtitles can be enabled/disabled just like any other subtitle, and they also appear in the transport controls. On the original DVD/BluRay, you usually cannot disable them through the menu; they are always shown. But for us, they are just a normal subtitle stream with a special flag. I use the flag to pre-select it, to kind of emulate what the BluRay does.

If you have an external subtitle file for a movie (e.g. a .srt), you can add that to TimedTextSource. Then it will appear in the transport controls like any other subtitle stream. It will also show up in TimedMetadataTracks (you need to listen to the changed event, they are added a bit delayed) and can be enabled/disabled like others. For the user it will look just like an embedded subtitle.
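As an illustration, a hedged sketch of adding an external .srt file (the `mediaSource` instance and the file URI are placeholders; the MediaSource would typically be created from the library's MediaStreamSource):

```csharp
// using System; using Windows.Media.Core; using Windows.Media.Playback;
var ttSource = TimedTextSource.CreateFromUri(new Uri("ms-appx:///Assets/movie.en.srt"));
ttSource.Resolved += (sender, args) =>
{
    if (args.Error == null)
    {
        args.Tracks[0].Label = "English (external)";   // label shown in the transport controls
    }
};
mediaSource.ExternalTimedTextSources.Add(ttSource);
var playbackItem = new MediaPlaybackItem(mediaSource);
// The external track appears in playbackItem.TimedMetadataTracks once the
// TimedMetadataTracksChanged event fires, and can then be enabled like any other track.
```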

lukasf commented 6 years ago

The default transport controls do not support the selection of external subtitle files, so you have to implement that button yourself.

touseefbsb commented 6 years ago

So, in the end, there is no difference between:

  1. embeded cc
  2. forced subs
  3. external subs

Whatever the scenario, only one of them will appear on the video at a given time (depending on the selection)? OK, so I can implement a button with TimedTextSource which can load an external subs file and add it to the collection of TimedMetadataTracks, and then I can toggle between all the subs with the built-in subs toggle button?

The only confusion I have right now is: what is the difference between embedded subs and forced subs? I get that forced subs have a flag to show them. If I understand correctly, a video file can have multiple embedded subs, and some of them may be forced while others are normal. So if this library finds a forced sub, it pre-selects it and shows it on the video? And then, when the user presses the subs toggle button, they see the list of embedded subs (with the forced sub already checked/selected) and can change the subs as they like? But if they don't find the subs they want in that list, they can select an external .srt file (which will be implemented by the developer), and that external sub will be added to the list of available subs and can be selected from the same toggle list? Let me know if I got it wrong anywhere.

lukasf commented 6 years ago

This is correct. The only difference between forced subs and normal subs is that forced subs are now pre-selected by the lib. If there are no forced subs, then no subtitles are pre-selected. The user can always change subtitle selection, and they can also disable forced subs if they don't like them.

If you do not like pre-selection of forced subs, you can also switch this off in our config class. Then there is no difference at all between normal and forced subtitles. But I think it is a good idea to pre-select them.

touseefbsb commented 6 years ago

@lukasf Thanks, that clarifies it, and yeah, I agree that pre-selection of forced subs seems like a good idea to me as well.

brabebhin commented 6 years ago

@lukasf , I think it is easier to consume the API if the SubtitleStreamInfo class exposes a TimedMetadataTrack, especially if you plan on combining internal CC and external CC.

Ahh, I see that is quite difficult... But do we really need to create a new playback item each time? The way I see it, the various media sources will all use the same base MediaStreamSource, so it is kind of redundant to do this. I think it should return the same playback item each time.

For example, if you want to separate the forced timed metadata tracks from the others, you have to carry around both the ffmpeg interop object and the media playback item.

lukasf commented 6 years ago

@mcosmin222 I don't quite get your point yet. If you want to add external subtitles, you have to add them through MediaPlaybackItem. And you can access all subtitle streams on the MediaPlaybackItem as well (through the TimedMetadataTracks property). So why do we need the TimedMetadataTrack on the SubtitleStreamInfo? I mean, feel free to add it there if you think it helps, I just did not understand your argument.

Currently I always create new instances of MediaPlaybackItem, because there are overloads which add start/end time. So if we called the method twice with different parameters, it would be wrong to return the same instance. One idea to solve this: Add a MediaPlaybackItem property on the FFmpegInterop class. Calling one of the CreateMediaPlaybackItem methods would fill that property, so after that you can access it through the FFmpegInterop instance. Calling the Create method again will throw an exception.

We could also add start/end time as optional parameters to the CreateFromStream method, and use them when building the MediaPlaybackItem. But I do not like that so much, because the start/end time would only be effective when using MediaPlaybackItem, but not when using MediaStreamSource.
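A conceptual sketch of that proposal (member names are placeholders, not the library's actual API):

```csharp
// using System; using Windows.Media.Core; using Windows.Media.Playback;
// Sketch only: cache the MediaPlaybackItem on the first Create call, throw on the second.
public MediaPlaybackItem PlaybackItem { get; private set; }

public MediaPlaybackItem CreateMediaPlaybackItem()
{
    if (PlaybackItem != null)
    {
        throw new InvalidOperationException("CreateMediaPlaybackItem can only be called once.");
    }

    var source = MediaSource.CreateFromMediaStreamSource(GetMediaStreamSource());
    PlaybackItem = new MediaPlaybackItem(source);
    return PlaybackItem;
}
```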

brabebhin commented 6 years ago

In my use case, I have custom media transport controls, which allows me to implement a custom selection UI for the CC. Now, I want to give the user different options based on whether a TimedMetadataTrack is "forced" or not. In the current setup, through the MediaPlaybackItem, there is no way to tell which track is forced and which is not. In order to resolve the problem, you have to pair the MediaPlaybackItem with the FFmpegInteropMSS object, and then you can figure out which TimedMetadataTrack is forced because it will have the same index as the corresponding SubtitleStreamInfo.
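Something like the following sketch (it assumes a `SubtitleStreams` collection and an `IsForced` flag on `SubtitleStreamInfo`; the actual member names may differ):

```csharp
// Pair the playback item's subtitle tracks with the library's stream infos by index.
for (int i = 0; i < ffmpegMss.SubtitleStreams.Count; i++)
{
    var info = ffmpegMss.SubtitleStreams[i];
    var track = playbackItem.TimedMetadataTracks[i];
    if (info.IsForced)
    {
        // Mark this track as "forced" in the custom selection UI.
    }
}
```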

I see your point about having to move the MediaPlaybackItem start time and duration into the FFmpegInterop config. However, the current approach isn't good either. If I create two different MediaSources from the same MediaStreamSource, I am not even sure what will happen; it would probably lead to some weird conflicts.

There should be a one-to-one link between the FFmpegInteropMSS, its MediaStreamSource, and its MediaPlaybackItem.

> One idea to solve this: Add a MediaPlaybackItem property on the FFmpegInterop class. Calling one of the CreateMediaPlaybackItem methods would fill that property, so after that you can access it through the FFmpegInterop instance. Calling the Create method again will throw an exception.

This is a good idea. Sounds good.

touseefbsb commented 6 years ago

> One idea to solve this: Add a MediaPlaybackItem property on the FFmpegInterop class. Calling one of the CreateMediaPlaybackItem methods would fill that property, so after that you can access it through the FFmpegInterop instance. Calling the Create method again will throw an exception.

I agree with this idea as well; it seems very generic and easy to use for all use cases.

lukasf commented 6 years ago

Okay, looks like we have an agreement on the Create() handling.

@mcosmin222 That makes sense, feel free to add TimedMetadataTrack to SubtitleStreamInfo.

brabebhin commented 6 years ago

@lukasf OK, I will do the "Create()" handling while I am at it.

brabebhin commented 6 years ago

@lukasf done.

lukasf commented 6 years ago

@mcosmin222 Thanks!

brabebhin commented 6 years ago

OK. This looks pretty stable now. Anyone found any bugs yet?

Maybe we can move on to picture subs ^^

touseefbsb commented 6 years ago

@mcosmin222 picture subs?

brabebhin commented 6 years ago

Yes.

touseefbsb commented 6 years ago

actually I meant to ask, what are picture subs? :P @mcosmin222

lukasf commented 6 years ago

That's sub/idx, a bitmap-based subtitle format, which is used on DVD/BluRay.

touseefbsb commented 6 years ago

Oh, so basically instead of text it will be image-based, but only the text on the image will be visible, with a transparent background, hence it will appear like normal text?

lukasf commented 6 years ago

Yes. They usually look pretty ugly, because they only use a 4-bit palette, and they do not scale well. Still, it is one of the three big subtitle formats, and I'd see it as mandatory for a full-featured player.

brabebhin commented 6 years ago

I wonder if we could convert them to text-based subtitles, using the Windows OCR API.

touseefbsb commented 6 years ago

Wouldn't the OCR API be expensive?

brabebhin commented 6 years ago

That's why I wonder ^^

lukasf commented 6 years ago

If the OCR goes wrong, you can make things even worse. I have seen quite a lot of badly OCR'd subtitles, where some of the characters always get interpreted as a different character. I don't know how this would be performance-wise. I mean, we could try and provide this as an option, if performance is good. But it should always be possible to render the original, non-OCR'd subs.

lukasf commented 6 years ago

The thing is, Windows OCR is probably optimized for normal text (black or dark color on white). I really don't know how well it would work on yellow/white text on a transparent (or black) background. Subtitle creation programs use tweaked OCR and they are still not perfect.

brabebhin commented 6 years ago

I played around with OCR and it is unreliable. So I guess we should move ahead with simply decoding the pictures to RGB.

Since there is only a 4-bit palette, perhaps we can optimize this and have a table that maps each possible palette entry to an RGB color, to improve performance (see the sketch below).
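A rough sketch of that lookup-table idea (the palette layout, one index per source byte, and the names are assumptions, not the library's code):

```csharp
// Convert a 4-bit palettized subtitle bitmap into BGRA pixels, one table lookup per pixel.
static byte[] IndexedToBgra(byte[] indices, uint[] palette /* 16 packed BGRA entries */)
{
    var output = new byte[indices.Length * 4];
    for (int i = 0; i < indices.Length; i++)
    {
        uint bgra = palette[indices[i] & 0x0F];           // palette index 0..15
        output[i * 4 + 0] = (byte)(bgra & 0xFF);          // B
        output[i * 4 + 1] = (byte)((bgra >> 8) & 0xFF);   // G
        output[i * 4 + 2] = (byte)((bgra >> 16) & 0xFF);  // R
        output[i * 4 + 3] = (byte)((bgra >> 24) & 0xFF);  // A
    }
    return output;
}
```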

lukasf commented 6 years ago

Sure, a lookup table is the best option.

lukasf commented 6 years ago

I think we should merge the PR before working on bitmap subs. I fixed a NullReferenceException a few days back, but besides that I did not notice any other problems. Any other findings or are we good to merge?

brabebhin commented 6 years ago

We are good to merge. I haven't been able to do anything regarding bitmap subs due to various other non-related issues lately.

lukasf commented 6 years ago

Merged the current PR. I've also been rather busy lately. Maybe I will have some time next week for bitmap subs, let's see...

brabebhin commented 6 years ago

I think there is a timing issue with the anti-flickering solution. From what I can tell, we can miss some cues if the active cue overlaps their duration. For example, if two characters talk at the same time, and each of them is a different cue, we will miss the second cue.

I wonder if it would be possible to decode only the cue streams in advance, maybe on a different thread, using a separate FFmpeg reader + IRandomAccessStream?

lukasf commented 6 years ago

Have you actually observed this? Because the anti-flicker code is supposed to add a cue directly if it overlaps with the active cue. In that case, a flicker can occur, but the cue is shown in time. Only if the cues do not overlap is the new cue added to the pending list and shown once the active cue is hidden.

If cues are really lost, we have to investigate why the anti flicker solution does not correctly detect the overlap.

brabebhin commented 6 years ago

Some cues do seem to get lost. But I have only observed that in one file so far. I was wondering if you noticed something similar. I haven't thought of a way to confirm it yet, though.

brabebhin commented 6 years ago

Ahh, I see now. There is a race between the mutex lock in addCue and the mutex lock in the CueExited event handler. There is a chance that addCue wins the race against the CueExited handler, thus causing us to miss cues. I think the event handler should get priority when acquiring the lock, so it can add the pending cues if there are any.
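As a conceptual illustration only (the library code is C++; the names here are placeholders, and this drains the pending list under one shared lock rather than giving the handler explicit lock priority), one way to make sure no cue is dropped could look like this:

```csharp
// using System.Collections.Generic; using Windows.Media.Core;
private readonly object cueLock = new object();
private readonly List<IMediaCue> pendingCues = new List<IMediaCue>();
private TimedMetadataTrack track;
private bool cueActive;

public void AddCue(IMediaCue cue)
{
    lock (cueLock)
    {
        if (cueActive)
        {
            pendingCues.Add(cue);        // defer to avoid the flicker
        }
        else
        {
            track.AddCue(cue);
            cueActive = true;
        }
    }
}

private void OnCueExited(TimedMetadataTrack sender, MediaCueEventArgs args)
{
    lock (cueLock)
    {
        cueActive = false;
        // Flush pending cues before anything else, so a racing AddCue cannot drop them.
        foreach (var pending in pendingCues)
        {
            track.AddCue(pending);
            cueActive = true;
        }
        pendingCues.Clear();
    }
}
```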