I managed to cut out the beginning of the files.
First file: https://1drv.ms/u/s!Asy3bMoFN_qNh-ZtW2pa4e6KSf9yyg
Second file: https://1drv.ms/u/s!Asy3bMoFN_qNh-Zu8L_uPT1dKL9DvQ
I tested the files with the 3.4.2 branch, same result. So it is not related to ffmpeg 4.0 or to the external subtitle changes.
I made some changes to fix positioning on the second file. I still need to fix the wrong aspect ratio, which is the more difficult part.
Hi,
These sound like really difficult problems. I'm not even sure how we could tackle the 2nd one. Do any of the Microsoft players manage to display these files properly?
EDIT: The Tomb Raider file seems to be broken. EDIT 2: I think I have some files with bitmap subs as well. I will look into it.
For problem no. 2, I think we never release these image cues. I think we should remove them from the timed metadata track after the CueExited event. I suppose we could do the same with the text cues as well; they are pretty much useless after they've been seen. I guess we have to incur the overhead of processing them again if the user seeks back, but that shouldn't be as bad as running out of memory.
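A minimal sketch of the idea (the event and `RemoveCue` are the real WinRT APIs; the wiring and `subtitleTrack` name are illustrative):

```csharp
using Windows.Media.Core;

// Once a cue has exited, remove it from the track so that its bitmap can be
// garbage collected. Seeking back means the cue has to be re-created.
void OnCueExited(TimedMetadataTrack track, MediaCueEventArgs args)
{
    track.RemoveCue(args.Cue);
}

// Illustrative wiring, assuming subtitleTrack is the track we populate:
// subtitleTrack.CueExited += OnCueExited;
```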
Which is the file where layout doesn't work? For me, the Tomb Raider file is always a blank screen. The Iron Man file displays correctly. Both give 0 when trying to view image subs.
Tomb Raider is 4K HEVC. You must disable HW acceleration for that if you don't have a very recent CPU.
The original Tomb Raider file shows subtitles. I then used MkvToolNix to split it into chunks, and MkvToolNix remuxes the file during that operation. The output file does not render subs anymore, so after the remux it has the same problem as the first file. I will see if I can do a binary split instead, to get a small file with working subs.
The solution you propose would indeed work, but only for embedded subs. External subs would still load all image cues and keep them in memory. I don't know if it would maybe be possible to hook into CueEntered and create the WriteableBitmap there. We could keep the decoded AVSubtitle in memory; it is only a small, palette-based image (and only as large as the actual text area). That would be the most elegant solution. But maybe it does not work, and the bitmap must already be there when CueEntered comes. Even if it works, it's complicated.
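A rough sketch of that idea, where `DecodeSubtitleBitmap` is a purely hypothetical helper that expands the cached palette image; whether assigning the bitmap this late still renders is exactly the open question:

```csharp
using Windows.Graphics.Imaging;
using Windows.Media.Core;

void OnCueEntered(TimedMetadataTrack track, MediaCueEventArgs args)
{
    if (args.Cue is ImageCue imageCue && imageCue.SoftwareBitmap == null)
    {
        // Expand the small, palette-based AVSubtitle data to a full BGRA8
        // bitmap only now, when the cue is about to be displayed.
        SoftwareBitmap bitmap = DecodeSubtitleBitmap(imageCue.Id); // hypothetical helper
        imageCue.SoftwareBitmap = bitmap; // may already be too late - untested
    }
}
```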
Last option: From SampleRequested we roughly know the playback time. We could propagate that to subtitle providers and create Cues/Images on demand. But that would be the most difficult option.
The problem I see with creating the images in the CueEntered event is that we may not have enough time to generate the image before the Cue exits.
Regarding the playback time, I think that the sample request is a bit ahead of the actual playback position, but that might just work. We will have enough time to do the decoding. I don't see any other solution.
But it gets mind-bending when we bring external subs into the mix. We would somehow have to integrate the subtitle providers from the external sub file into the main FFmpegInterop instance. If we do this, then my solution would work for them as well.
The advantage of the last option is that it can work for both external and embedded subs. If we export the external sub as a collection of AvSub, then there wouldn't be any difference between them.
We could also do the last option for external files only, and keep trimming the count for internal subs. This would be the most efficient way of doing it (since for embedded ones, we have the file itself to keep the subtitles).
Embedded subs get their packets delivered "automatically" in sync while the other streams are played. For external subs, we'd need to manually read packets, somehow synced to the playback position of the main instance. Or read all packets and then delay-create bitmaps on demand. Both options are rather complicated.
There is one other optimization that could be done. Currently, we always create a full screen bitmap and render into that, even if the actual text bitmap is only a small part of the screen. We could instead create a bitmap only of the actual text size and calculate ImageCue.Position and ImageCue.Extent to position it correctly on screen. It would probably save 80-90% of memory. If the other options (delay create) do not work for external subs or are too complicated, then this could be a way to at least reduce memory pressure from external subs.
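A minimal sketch of the positioning math, assuming `subRect` is the bounding rectangle of the decoded AVSubtitle bitmap and `videoWidth`/`videoHeight` are the coded frame size (names are illustrative):

```csharp
using Windows.Foundation;
using Windows.Media.Core;

void PositionCue(ImageCue cue, Rect subRect, double videoWidth, double videoHeight)
{
    // Place the small text-sized bitmap where it would sit in a full frame.
    cue.Position = new TimedTextPoint
    {
        X = 100.0 * subRect.X / videoWidth,    // percent from the left edge
        Y = 100.0 * subRect.Y / videoHeight,   // percent from the top edge
        Unit = TimedTextUnit.Percentage
    };
    cue.Extent = new TimedTextSize
    {
        Width = 100.0 * subRect.Width / videoWidth,
        Height = 100.0 * subRect.Height / videoHeight,
        Unit = TimedTextUnit.Percentage
    };
}
```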
Sounds like a plan. And together with trimming the size for embedded subs, it should reduce the memory usage significantly. I can push the code changes for trimming useless subs this evening.
I worked on sizing stuff yesterday already, to fix the other sizing issues I saw in that file. To prevent merge conflicts, it would perhaps be better if you try the "remove embedded subs on CueExited" changes?
Or maybe this is already what you intended to do...
Yes, that is what I intended to do. But I am at a doctor's appointment now, so I can't push the code. I tried something this morning and it seems to work fine.
Ok cool!
You can push your code if you want, I can deal with the merge.
I will push later this evening; it needs some more work.
I will wait for your push.
Oh man. Did I mention that there is also a flicker problem with the Tomb Raider file? It looks like PGS subtitles can have infinite duration; they are cleared when the next sub comes. So we only know the real duration when the next packet is received. I tried to fix this, but the fix seems to fail when the sub is already shown: the duration cannot be changed once it has been displayed. This is going to require a very dirty hack.
I have currently done this on a separate branch. Do you think we should fix these things on the external subs branch+pr? Or do it separately?
Here is a link to a new Tomb Raider sample; this one works. Please note that the seek bar shows a 2h duration, but you can only seek into the first 1-2 minutes. Subs only show up at about 1 minute.
Currently my fixes are in the branch fix-subtitle-sizing. I can merge into the external subs branch. Maybe that is better, because some things apply to both embedded and external subs.
Current fixes:
I don't see a problem with doing it on the external sub branch. We already modified stuff there, and the same files are affected. Working on it there will reduce the chance of conflicts.
I changed the name of the PR so it is a more general one :)
I think flickering is a lower-priority fix. :) So, good work on the sizing code.
I tested it. I am not sure if this is normal...
Try disabling HEVC passthrough.
Tomorrow I will try to add the dynamic sizing of subtitle images.
I'm not talking about the blank screen, but about the two subs on top of each other ^^
I guess this happens because you don't have a video image, so UWP does not really know where to put the subs. For me, all subs are positioned at the bottom of the video frame.
Yesterday I was a bit under the weather. I will post my code this evening.
I have managed to get partial subtitle images working. Bitmaps are now usually 5-10% of their original size. This should have a major impact on memory consumption.
I have tested two different workarounds for the duration issue. One makes subs flicker (add the next cue when it comes, remove previous ones on CueEntered of the next cue). The other makes subs disappear too early (remove active cues whenever new cues are added; parsing happens ahead of time, so we remove too early). Neither is really acceptable. The second workaround is active in the code right now.
I have imagined a (pretty nasty) workaround: We add a second TimedMetadataTrack with TimedMetadataKind Custom or Data. We enable it, so it will raise CueEntered events, allowing us to sync up with playback position. But it won't appear in the subtitle list. We now add an entry into that second track for every new cue we parse. Whenever we get CueEntered, we remove all active cues from the real subtitle stream and add any pending cues. So in a way it is similar to the old anti-flicker solution. But since we cannot use the real track due to duration issues, we use a second, hidden track to do the syncing.
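A rough sketch of how that hidden sync track could work (all names illustrative; the track constructor, DataCue kind, and `ActiveCues` are the real WinRT APIs):

```csharp
using System.Collections.Generic;
using System.Linq;
using Windows.Media.Core;

// The visible subtitle track and a hidden "sync" track.
TimedMetadataTrack subtitleTrack = new TimedMetadataTrack("subs", "und", TimedMetadataKind.ImageSubtitle);
TimedMetadataTrack syncTrack = new TimedMetadataTrack("sync", "und", TimedMetadataKind.Data);

// For every parsed subtitle, a DataCue with the same id/timing goes onto the
// sync track, and the real ImageCue is parked here until it is due.
Dictionary<string, ImageCue> pendingCues = new Dictionary<string, ImageCue>();

void OnSyncCueEntered(TimedMetadataTrack sender, MediaCueEventArgs args)
{
    // Remove whatever is currently shown on the real subtitle track...
    foreach (IMediaCue active in subtitleTrack.ActiveCues.ToList())
    {
        subtitleTrack.RemoveCue(active);
    }
    // ...then add the pending image cue this hidden data cue stands in for.
    if (pendingCues.TryGetValue(args.Cue.Id, out ImageCue pending))
    {
        subtitleTrack.AddCue(pending);
    }
}
// Note: the sync track only raises CueEntered if its presentation mode is
// set to ApplicationPresented on the MediaPlaybackItem's track list.
```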
One other possible workaround would be to read packets ahead of the current position, until we get the next subtitle cue (with some limit, of course). This way, we can get the real duration for a cue and fix it before showing it. The downside is that this would fill up the packet queues for the AV streams, depending on the limit we set. For an average HD movie it would be about 1 MB/s; for a very high bitrate file (4K) it could be 5 MB/s. If we read the next 5 seconds ahead, that would be 5-25 MB of additional RAM usage. We could enable this only when required, so for normal (text-based) subtitles, memory usage would stay the same. If we don't find the next cue within those 5 seconds, we keep the infinite duration and then remove the cue once the next one is parsed (which could be a bit too early, but the cue is shown for at least 5 seconds, so it is not such a big problem).
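A sketch of this read-ahead variant, with `ReadAheadForNextSubtitle` as a purely hypothetical helper:

```csharp
using System;
using Windows.Media.Core;

// ReadAheadForNextSubtitle is assumed to demux forward (queueing a/v packets
// for later use) and return the start time of the next subtitle packet, or
// null if none is found within the limit.
static readonly TimeSpan ReadAheadLimit = TimeSpan.FromSeconds(5);

void FixCueDuration(ImageCue cue)
{
    TimeSpan? nextStart = ReadAheadForNextSubtitle(cue.StartTime + ReadAheadLimit);
    if (nextStart.HasValue)
    {
        // The real duration is known before the cue is ever shown.
        cue.Duration = nextStart.Value - cue.StartTime;
    }
    // Otherwise keep the "infinite" duration and remove the cue manually
    // once the next subtitle packet is eventually parsed.
}
```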
Both workarounds are quite a lot of work, and both have their downsides. I am not sure what the best approach is here. If you have a different idea, let me know.
What if we use the same cue content several times until we find its next cue? So, for example, we have cue A with some random image; we give cue A a default duration of our own choosing (let's say 0.5 seconds), and if we haven't found its next cue by that time, we add another cue B with the same image as A, and so on. For long stretches we could cycle just two cues, removing one and re-adding it with a shifted start time.
I am pretty sure that this would also cause flicker.
We cannot add cues without causing flicker.
Even if we add cue B when cue A triggers the exit event? It would be like what we have right now, but with the same cue, basically.
We don't see the flicker if it happens when going from one cue to a different one. But if cue A disappears shortly and then the same cue A re-appears, it will definitely be a visible flicker.
Hmm, you may be right. Another idea, and this one is even crazier: what if we clone the random access stream and create a new ffmpeg reader which deals specifically with this subtitle provider? This way we can read ahead without filling the queues of the other providers.
Or treat the new stream as an external subtitle stream and use the code to fetch the entire track out.
You cannot clone it and read from different positions of the same stream. You could only open a second stream for the file, but we'd need that as a second stream input parameter. Not nice. Also, in that case it would be better to just use one stream and read ahead, like in my other proposal. Then at least the parsed packets will be used a few seconds later, instead of being thrown away and then having the same packets parsed out of the other stream again for the real decoding.
Or treat the new stream as an external subtitle stream and use the code to fetch the entire track out.
That would take a whole lot of time before playback starts. Parsing through 10 GB for an HD movie before even starting... No way.
Told you it was crazy :)
Anyway, I think there is no other way of syncing to playback without incorporating the media player / media element into our library. And if we do incorporate it, we would need to poll it with a timer, which is also not really nice.
The MediaPlayer class used to have a nice property for defining playback markers on the timeline, and you'd get an event for each. But apparently it was deprecated in the Anniversary Update. Maybe it could still work for getting events at specific positions.
But then again, I don't think something else is possible.
Well, custom TimedMetadataTracks provide that exact functionality. There are also samples for that. The purpose is to easily sync activities in the app to playback position. Maybe this is what we need to use.
It is possible to define custom Cue classes with additional properties, for use with custom TimedMetadataTracks. That way, we could directly attach the corresponding "real" cue to our custom cue, so we know which cue to add to the subtitle stream when the custom cue enters (and remove all previous cues).
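For example (an illustrative sketch only, not the library's actual code):

```csharp
using System;
using Windows.Media.Core;

// A custom cue for the hidden sync track that carries a reference to the
// "real" image cue, so no lookup dictionary is needed.
public sealed class SubtitleSyncCue : IMediaCue
{
    public TimeSpan StartTime { get; set; }
    public TimeSpan Duration { get; set; }
    public string Id { get; set; }

    // The cue to add to the visible subtitle track when this one enters.
    public ImageCue TargetCue { get; set; }
}
```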
Yes, all clues seem to hint at using this custom TimedMetadataTrack with a custom implementation of IMediaCue...
I never actually thought of implementing IMediaCue and using it that way. But then again, I didn't really need it, since everything that could be done that way could also be done using a timer in the app (the app has access to the MediaPlayer object, so polling it isn't an issue).
Holy crap. Now I found out why the subs don't work in that Thor file. The MKV format supports stream compression using zlib or bzlib. Both have to be linked into ffmpeg to make this work; otherwise ffmpeg will just return the compressed data, which the subtitle parsers won't understand. Files which use stream compression for subtitles will cause problems.
I feel like we opened pandora's box.
Oh snap. This is not good. Maybe WinRT supports these out of the box? We should look over the compression options; maybe it isn't necessary to import third-party libs.
Ok, it doesn't look too good. The WinRT APIs are quite weak. I suppose each individual packet is compressed, right? I mean, there is no shared compression tree between packets? If so, we could include those libraries directly in our WinRT project and avoid linking them into ffmpeg... This would of course mean we have to handle the decompression in code, but it should make the build process easier.
I think it is better to link them into ffmpeg. There could be more places where the libs are required. FFmpeg will disable code when the libs are not available.
ShiftMediaProject has those libs with VS build files, including WinRT targets (only ARM/ARM64 are missing). It should be possible to include them as git submodules and then add them to our build files. FFmpeg should automatically find and use the libs when they are copied to the specified lib folder. It will be interesting to see if this works, since other useful libs could be included the same way.
True. Like the one handling the ASCII & UTF-8 situation.
Good point.
Yay!! After fiddling around for a few hours, I finally managed to compile ffmpeg with zlib included! I now have build scripts which will automatically build zlib and copy it to the appropriate folders. And indeed, this solves the issue with subtitles not appearing in the "Thor" movie.
I tried the same for bzlib, but it currently does not work, and I have no clue why. FFmpeg says bzlib is not found, although it is in the same folder. I will try a bit more...
zlib is also used by other codecs. I think we should include it (and also bzlib and iconv, maybe openssl?) as git submodules and include them in the build scripts. Using git submodules has the benefit that we can update/merge them from the source repos.
Cool. Maybe bzlib is not named the way ffmpeg expects it?
So the zlib dll needs to be distributed along with the ffmpeg dlls, right?
I got both bzlib and zlib running now. There is no documentation on which lib names ffmpeg expects; you only find out by digging deep into the build log files.
Unfortunately, I cannot get iconv running. FFmpeg checks every external library by compiling a short test app with one or more test method calls (a pretty strange concept IMHO; this is also the reason why it takes so long before the actual compilation starts). If compilation fails, the lib is not used. This build reports an unresolved external symbol GetACP (the get-current-code-page function), although it should be available for Store apps, and all the Windows libs are referenced. I use GetACP in FFmpegInterop as well...
While testing external subtitles, I noticed a few problems with normal embedded bitmap subtitles.
I have one file where embedded PGS subtitles do not work. The decode function gives no error, but got_subtitle is always 0. The same file works well with other ffmpeg-based players, including ffplay. I debugged into ffmpeg but cannot find out what is wrong. It looks like the packets contain invalid data; I am not sure how that can happen, since other ffmpeg players play the file fine. The funny thing is, if I extract the PGS stream and load it as an external subtitle, it plays fine. It is actually one of the test files I used for external subs. Only embedded in the movie does it not decode properly.
Then I have another file where PGS subtitles work, but have problems. The file is 4K, but the subs are 1080p (probably from a normal BluRay). There are multiple problems with this one: subs show, but are moved to the top-left because of the wrong resolution. Also, subs have the wrong aspect ratio, because they are 16:9 while the 4K video stream is not. Additionally, subs have a blue tint in the background, which should be transparent. Then I also noticed that after some time, the app crashes with an OutOfMemoryException. Creating and keeping lots of bitmap subs at 4K resolution does not look like a good idea. I had a bad feeling about keeping all those WriteableBitmaps in memory, and now we really see how it breaks. It is stupid that ImageCues do not have a mechanism to delay-load the actual bitmap. To tackle that one, we'd probably have to track the playback position and create and assign subtitle bitmaps to cues on the fly (and remove them after they have been shown). A difficult one! Oh man.
Unfortunately, these are huge files. The first one is 13 GB, the second one 25 GB.