Cool man, nice work!
Hi @lukasf
Is it just me, or does the memory consumption of our software decoding keep going up and only up? With or without filters.
Yes I see the same. But it only happens on this branch. I will check the changeset, maybe I can see something.
Found and fixed the leak. A new scaler was created for every frame. Using sws_getCachedContext instead of sws_getContext now. It re-uses an existing context if possible; otherwise it frees the existing one and creates a new one.
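For reference, the conversion now roughly follows this pattern (a simplified sketch; the helper function and the hard-coded flags are placeholders, not the actual code):

```cpp
extern "C"
{
#include <libavutil/frame.h>
#include <libswscale/swscale.h>
}

// Convert one decoded frame, re-using the scaler across calls.
// sws_getCachedContext() returns the existing context unchanged if all
// parameters still match; otherwise it frees the old context and
// allocates a new one, so only a single context is ever alive.
void ConvertFrame(SwsContext*& cachedCtx, AVFrame* src, AVFrame* dst)
{
    cachedCtx = sws_getCachedContext(
        cachedCtx,                       // previously cached context (or nullptr)
        src->width, src->height, (AVPixelFormat)src->format,
        dst->width, dst->height, (AVPixelFormat)dst->format,
        SWS_BICUBIC, nullptr, nullptr, nullptr);

    sws_scale(cachedCtx, src->data, src->linesize, 0, src->height,
              dst->data, dst->linesize);
}
```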
Nice catch!
Interesting function you found there... Too bad there's not a similar one for swr.
Anyway, I did some testing; the average laptop will not be able to handle this. I will go ahead and see what can be done about hardware support...
I wonder if OpenCL can be used with WinRT; it seems to support filtering.
OpenCL is not used in video decoding. Check "FFmpeg API Implementation Status" on that page. It will not help us here.
I think we should disable video effects for D3D11 until they are officially supported. Filter setup fails on avfilter_graph_config. We should override UpdateFilter() and do nothing in there.
Alternatively, we'd have to roll our own implementation of copying to CPU memory and feeding into the filter chain. I don't know how difficult that is. Unfortunately, I don't have any D3D11 skills.
I meant for filtering only.
I think the main problem is that decoding is so heavy on the CPU if not offloaded. Also, I'd guess that only a part of the filters would use OpenCL. Sure, it might help in some cases, but my guess is that we should focus on getting filters to work with HW decoding first.
I found this API: av_hwframe_transfer_data
It should allow us to easily copy HW to SW frame, and then push the SW frame through the filter pipeline. But first we need to find out why avfilter_graph_config fails. Maybe we need to patch the pixel format in our video context to SW format before that call, and patch back after it.
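The transfer itself should be simple, something like this (a sketch with minimal error handling; the helper name is made up):

```cpp
extern "C"
{
#include <libavutil/frame.h>
#include <libavutil/hwcontext.h>
}

// Copy a decoded hardware frame into a fresh software frame, so it can
// be pushed through the CPU filter pipeline.
AVFrame* TransferToSoftwareFrame(AVFrame* hwFrame)
{
    AVFrame* swFrame = av_frame_alloc();
    if (!swFrame)
        return nullptr;

    // Downloads the frame data from GPU to CPU memory; with flags = 0
    // the direction is derived from the frame types. Metadata (pts etc.)
    // has to be copied over separately.
    if (av_hwframe_transfer_data(swFrame, hwFrame, 0) < 0 ||
        av_frame_copy_props(swFrame, hwFrame) < 0)
    {
        av_frame_free(&swFrame);
        return nullptr;
    }
    return swFrame;
}
```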
I will take a look at this over the next week. I think we need to init the graph after the first frame has been copied from the hw context.
It almost works. When I last tried that API, it took a solid 100ms to copy the HW frame to the SW frame. Somehow it now takes less than 30ms, which is acceptable from a performance perspective; it allows a solid fps.
On low-res files it works decently well, but I created another memory leak and need to fix it first: memory quickly skyrockets to 3-4 GB. Fixing that should allow my 4K file to work as well.
Well, finally, I figured the problem out. Performance is still nothing to write home about on anything above 1080p. The filtering part takes 80-90ms to complete. But on lower resolution it works decently well.
Performance with HW decoder is way better than without. I had a similar solution, but was stuck at a very bad memory leak. Your solution seems to work fine.
Up to FullHD, filters work very well on my PC. But with 4K video, things fall apart. I get ~35ms for transfer from GPU to CPU plus ~140ms filter time on 4K. That's way too slow. The main issue is not the transfer to CPU but the actual filtering on the CPU. Most filters are single threaded and when you have multiple filters, they will be executed sequentially, making things even worse.
So as far as possible, filtering should be done using Win2D effects. They are blazing fast on the GPU. Still, it's good to have FFmpeg filters in place, for stuff that is not possible with Win2D.
Oh I had an ugly memory leak as well. Took me a few hours to get it right; pointers and passing by value are so annoying sometimes >.<
I am still trying to get MediaPlaybackList to work with any sort of video effect at all, but this seems like a pretty lost cause at the moment, seeing how frame server mode also seems to have an issue.
Just found out that most of the time is actually spent in format conversions. The colorchannelmixer filter only takes RGB formats. So on my 4K HDR file, the format is changed to RGB48, which takes about 110ms. The actual filter then only takes 30ms. After we have the filtered frame, our scaler comes in to convert this to NV12, which takes a whopping 350ms. So in total we have about 500ms per frame, and only 30ms of that is the actual filtering. Normal (non-HDR) files use RGB24 and show similar proportions (FullHD file: ~50ms conversion, 5ms filtering).
Could we cut down on the conversions?
Yes we can! I managed to force use of the BGRA format in the filter chain, and also use that as the output format of our video sample provider. This allowed me to omit the sws scaler call, which is the most expensive one. Playback of my 4K file is at least twice as fast now! It is still way too slow for real-time playback, but performance is noticeably better.
The problem is that we don't really know which formats are supported by the filters we use. I guess that most filters will work fine with BGRA. But some filters might change it to something else. And some effects are easier to calculate on YUV planes than on RGB, so these might lose performance. If you know which filters you use, you can optimize things. Getting rid of the sws scaler call sure helps a lot. But there is no easy way for us to optimize for all filters.
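Restricting the sink format is done with the usual buffersink option, roughly (a sketch; sinkCtx stands for our buffersink instance):

```cpp
extern "C"
{
#include <libavfilter/avfilter.h>
#include <libavutil/opt.h>
}

// Restrict the buffersink to BGRA. The graph then inserts any required
// conversion itself, and the frames we pull out can go straight to the
// sample provider without a final sws_scale call.
int ForceBgraOutput(AVFilterContext* sinkCtx)
{
    const enum AVPixelFormat pixFmts[] = { AV_PIX_FMT_BGRA, AV_PIX_FMT_NONE };
    return av_opt_set_int_list(sinkCtx, "pix_fmts", pixFmts,
                               AV_PIX_FMT_NONE, AV_OPT_SEARCH_CHILDREN);
}
```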
I think we should also add a way to put filters right in the config class. So they are known and used right from the start.
But I think we wouldn't be able to turn video effects on or off on demand like we do for audio, if we configure the entire stream source around them.
I also explored the possibility of using Win2D with the DirectX decoder. But Win2D wants BGRA output, as opposed to the NV12 we output.
The way I envisioned it was an injectable interface: users would implement the Win2D filter chain outside of the library, so they get more power to customize things. We could go that way. Win2D has quite an extensive filter library.
I cleaned up the code, added dynamic scaler selection, a polling loop for effects like minterpolate (where one input frame might create multiple output frames; see the sketch below), plus the possibility to specify filters in the config class. See the CS sample ctor: I force BGRA output in the filter chain and in our config. But it only helps for some filters. If you use this with minterpolate for example, that filter will convert the BGRA back to a YUV format for its processing, and then our scaler will convert from YUV back to BGRA at the end. That is why I do not automatically select an output format when video effects are used. There is nothing that helps for all filters. But devs can optimize the configuration if they know upfront which filters they use.
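The polling part follows the usual buffersrc/buffersink pattern, roughly like this (a simplified sketch; the deliver callback stands in for our sample delivery):

```cpp
extern "C"
{
#include <libavfilter/buffersink.h>
#include <libavfilter/buffersrc.h>
#include <libavutil/frame.h>
}

#include <functional>

// Feed one decoded frame into the graph, then drain every frame the
// graph has ready. Filters like minterpolate can emit several output
// frames per input frame, so we loop until the sink runs dry instead
// of assuming a 1:1 mapping.
int FilterFrame(AVFilterContext* srcCtx, AVFilterContext* sinkCtx,
                AVFrame* inFrame, const std::function<void(AVFrame*)>& deliver)
{
    int ret = av_buffersrc_add_frame(srcCtx, inFrame);
    if (ret < 0)
        return ret;

    AVFrame* outFrame = av_frame_alloc();
    while ((ret = av_buffersink_get_frame(sinkCtx, outFrame)) >= 0)
    {
        deliver(outFrame);
        av_frame_unref(outFrame);
    }
    av_frame_free(&outFrame);

    // AVERROR(EAGAIN) just means the graph wants more input.
    return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
}
```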
Things still work with calling SetVideoEffects at any time. It might just lead to scaler calls when you change things during playback.
I read that later Windows versions support dynamic change of video formats. Maybe we should explore that, to prevent scaler calls on filter changes.
Oh and I read that it is possible to load other pixel formats with Win2D through some interop layer:
https://github.com/Microsoft/Win2D/issues/198
Someone posted a sample, but it is not complete; he did not update it with his final solution. But then, I am not even sure we need his complex solution. He was using multiple textures (one for each plane), while we have one texture with a known DXGI_FORMAT (mostly NV12). Win2D is supposed to be able to work with these formats directly when loaded through interop. For this to work, we'd have to create our own frames context and set texture binding flags (Win2D requires the "shader resource" flag for loading textures as an image source).
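Creating such a frames context might look roughly like this (a sketch, assuming we already hold an AV_HWDEVICE_TYPE_D3D11VA device reference; names and formats are placeholders):

```cpp
#include <d3d11.h>

extern "C"
{
#include <libavutil/hwcontext.h>
#include <libavutil/hwcontext_d3d11va.h>
}

// Allocate a D3D11 frames context whose textures carry the
// "shader resource" bind flag, so Win2D can load them as image sources.
AVBufferRef* CreateShaderVisibleFramesContext(AVBufferRef* deviceRef,
                                              int width, int height)
{
    AVBufferRef* framesRef = av_hwframe_ctx_alloc(deviceRef);
    if (!framesRef)
        return nullptr;

    auto framesCtx = (AVHWFramesContext*)framesRef->data;
    framesCtx->format    = AV_PIX_FMT_D3D11;
    framesCtx->sw_format = AV_PIX_FMT_NV12;
    framesCtx->width     = width;
    framesCtx->height    = height;

    // D3D11-specific part: request the extra bind flag on the textures.
    auto d3d11Ctx = (AVD3D11VAFramesContext*)framesCtx->hwctx;
    d3d11Ctx->BindFlags = D3D11_BIND_DECODER | D3D11_BIND_SHADER_RESOURCE;

    if (av_hwframe_ctx_init(framesRef) < 0)
        av_buffer_unref(&framesRef);

    return framesRef;
}
```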
Another way would be creating a BGRA render target texture, rendering our NV12 texture to it, and then feeding that to Win2D. It's probably a bit less efficient (plus I don't know exactly how to do this).
Cool. Thanks for taking over. I will try to do some testing over the weekend.
Indeed it is possible to dynamically change the output video resolution. It was a bit more complicated to turn this into a fully working solution, but it seems to run pretty solidly now. You can test this with the "scale" filter, passing the resolution as parameter, e.g. "720x576". I also added a bunch of other optimizations, plus drag+drop in the sample apps for simple testing.
We could use the same mechanism to dynamically update the audio output (in a separate PR). This would allow us to get rid of the resampler for audio filters. We could also avoid the complicated workarounds of guessing channels and sample rate of AAC HE (v2) media: just take what we get from AVCodecContext, and update it once the first real frame comes.
I have been thinking about switching the filter definitions to use a filter string in FFmpeg format (all filters in one string, separated by ';'). It is a well-defined string format, supports even complex graphs, and makes it easy to use filter code from the web. It would also be much better for testing in the samples: just type in a new string containing one or more filters and hit "return" to get it applied. There are methods to construct a complete filter graph from one of these strings, so we don't even have to parse it manually.
> Indeed it is possible to dynamically change the output video resolution. […]
Cool, this could open up quite a few possibilities.
> We could use the same mechanism to dynamically update the audio output (in a separate PR). […]
This could come in very handy if we want to dynamically change the audio channel layout at playback time. I planned to implement such a feature for #110. Most advanced media players do offer such a feature, so it would be cool if we could do it.
> I have been thinking about switching the filter definitions to use a filter string in FFmpeg format (all filters in one string, separated by ';'). […]
I am fine with this, and it mostly works like this anyway. The reason I separated the filter name from its setup was to avoid the string split operation on our side, which was rather complicated. If we can find a way to init the filters directly from the string without any extra processing, that's even better. When I implemented this two years ago, I could not find such an API; the examples I found explicitly separated the filter name from its setup.
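For the record, avfilter_graph_parse_ptr can build the whole chain from one string; a simplified sketch (src/sink stand for our already-created buffersrc and buffersink filters):

```cpp
extern "C"
{
#include <libavfilter/avfilter.h>
#include <libavutil/mem.h>
}

// Parse a complete filter description such as
// "colorchannelmixer=rr=0.5,scale=720:576" and wire it in between our
// existing buffersrc (src) and buffersink (sink) filters.
int ParseFilterString(AVFilterGraph* graph, const char* filterSpec,
                      AVFilterContext* src, AVFilterContext* sink)
{
    AVFilterInOut* inputs = avfilter_inout_alloc();
    AVFilterInOut* outputs = avfilter_inout_alloc();
    if (!inputs || !outputs)
    {
        avfilter_inout_free(&inputs);
        avfilter_inout_free(&outputs);
        return AVERROR(ENOMEM);
    }

    // The parsed chain's open input gets connected to our source...
    outputs->name       = av_strdup("in");
    outputs->filter_ctx = src;
    outputs->pad_idx    = 0;
    outputs->next       = nullptr;

    // ...and its open output to our sink.
    inputs->name       = av_strdup("out");
    inputs->filter_ctx = sink;
    inputs->pad_idx    = 0;
    inputs->next       = nullptr;

    int ret = avfilter_graph_parse_ptr(graph, filterSpec,
                                       &inputs, &outputs, nullptr);
    avfilter_inout_free(&inputs);
    avfilter_inout_free(&outputs);

    // The graph still needs avfilter_graph_config() before use.
    return ret;
}
```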
Damn, I just changed audio code to dynamic format support, only to find out that it does not work?! Whenever I change format, channels, channel layout or sample rate during playback, I get broken audio or no audio at all. Why does this work for video but not for audio? Very disappointing...
So far so good... It's a shame that we can't modify the audio properties. This looks like an oversight from the media team. We should report it somewhere.
So it seems to work ok, especially setting the filter as a fully formatted string. I think it should be stable enough to integrate into my app and do some more extensive testing.
I seem to be getting a random MF decode error when applying ffmpeg filters during playback. Has this ever happened to you, @lukasf ?
No I have not noticed this. It happens when applying video filters? Do you know a way to reproduce?
It happens kind of randomly. Start playing a file without video filters, then add or remove video filters until the error shows up. I think it is related to the lazy init of filters, but debugging it has been really difficult. I will try to investigate some more in the afternoon.
I tried lots of times with the CS sample, with HW acceleration and without, but could not reproduce it so far. Are you using the colorchannelmixer filter? Maybe it depends on the file format or the filter used.
Yes indeed it does seem to be file related. I tried with some other files and it seems to be OK. Audio filters also don't seem to have this problem.
I'd say we should give this some more testing time before we merge.
The frame grabber is broken. Output frames have 0 width and 0 height. This gets passed on by the UncompressedVideoSampleProvider.
I pushed a partial fix for that. I am not sure how to deal with the direct decoding into a provided buffer.
This looks fairly stable for my use cases and I think it is ready for the big moment.
Can you take one last look at it, @lukasf ?
Error handling has been improved. I will try to do one more code review, and check on the FrameGrabber thing.
I agree that we should try to get the two open PRs to master branch soon, to test this all together. I think the current feature set will soon be good for a new official NuGet version.
I think I fixed the FrameGrabber. Can you confirm from your side @mcosmin222?
Looks good.
Actually, I think we may need to optimize device change detection after all. It seems to take a whopping 60ms to check for a device change, and that is pretty expensive.
I will explore some more into this area.
Oh wow that's massive. Not acceptable at all.
The fastest way might be calling IMFDXGIDeviceManager::TestDevice(handle) to check if the device is still valid. The method is there just for this specific reason. We'd have to keep reference to the DeviceManager and device handle on first SampleRequested.
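A sketch of the check (assuming we cache the device manager and a handle obtained from OpenDeviceHandle on the first SampleRequested; the function name and parameters are ours):

```cpp
#include <windows.h>
#include <mfobjects.h>
#include <mferror.h>

// Returns true while the cached handle still refers to a valid device.
// TestDevice() exists exactly for this kind of lightweight polling.
bool IsDeviceStillValid(IMFDXGIDeviceManager* deviceManager, HANDLE deviceHandle)
{
    if (!deviceManager || !deviceHandle)
        return false;

    HRESULT hr = deviceManager->TestDevice(deviceHandle);

    // MF_E_DXGI_NEW_VIDEO_DEVICE signals that the device was lost or
    // replaced, so decoder resources must be re-created on a new device.
    return hr == S_OK;
}
```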
Seems the worst part is getting the device manager from the MSS. So I cached that and now I will test again.
Meh that method wants a handle. Why can't they just require the device pointer instead?!?!?!?!
Anyway, I did a simple cache of the device manager and it seems to have dropped the processing time to under 1 ms in the sample app.
Strangely, my own app seems to keep reporting around 60 ms in that place, although it no longer has visible stutters as it had before this change.
I will give your method a try too.
Lol I just did almost the exact same changeset. But you won ^^
^^ Almost? Maybe I missed something ^^
hi @lukasf,
Please check the latest changeset.
Thanks
Merged. Thanks for the good work!
Cool. I hope you had a nice Christmas ^^
Yeah it was nice. Hope you had a good time as well.
Still needs some work, particularly tracking down a possible memory leak. And there goes my 2 hours of ffmpeginterop allotted for this week ^^
I also need to think about supporting it from the HW decoder.