Hardware decoding fails when switching video output devices

brabebhin commented 3 years ago

Hello @lukasf,

I recently encountered "a big one". My laptop has both a dedicated and an integrated GPU (like most mid- to high-end laptops). It also has an HDMI port, which is linked directly to the dedicated GPU. Like most laptops, the built-in display is linked to the integrated GPU.

I connected the HDMI port to an external display, and I noticed that playback doesn't work correctly when moving the sample apps between displays that are connected to two different GPUs. Playback works correctly again once the sample app goes back to its original display and a seek operation is performed.

To make things even worse, it matters which GPU option is chosen in the Windows Display settings. With "let Windows decide", it is not guaranteed to work properly. If the selected GPU matches that of the display, it will work on the respective display.

TL;DR

I think we need to handle the case in which the application starts playback on a display connected to one GPU and is then moved to a display connected to another GPU. This might be more common than originally anticipated.

This seems to be labeled as a "device lost" scenario:

https://docs.microsoft.com/en-us/windows/uwp/gaming/handling-device-lost-scenarios

brabebhin commented 3 years ago

Interestingly enough, there is nothing in our code that shows any sort of error. Querying ID3D11Device::GetDeviceRemovedReason returns S_OK, which essentially can mean anything, from all is fine to everything went to hell but recovered somehow. It might be a limitation of WinRT or ffmpeg or both.

lukasf commented 3 years ago

Maybe we need to get the device from our MSS on each SampleRequested, and check if it is still the same device we used for decoding (each device has some UID afaik). If it is a different device, we could try to copy the frame from decoder GPU to new target GPU. Or we copy to CPU memory, like we do with effects. At least, I'd assume that actual decoding will still work, even if the app moved to a different screen. That would explain why there is no error in our code. I hope that it is only the rendering that fails, because we try to display a texture that was created on a different GPU.
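
A rough sketch of that per-sample check, assuming each device can be reduced to an adapter identifier. On Windows that would presumably be the LUID reported in DXGI_ADAPTER_DESC::AdapterLuid; the AdapterId struct, FramePath enum and ChooseFramePath function below are hypothetical stand-ins for illustration, not FFmpegInteropX code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Windows LUID that DXGI reports per adapter.
struct AdapterId {
    std::uint32_t low;
    std::int32_t high;
};

inline bool SameAdapter(const AdapterId& a, const AdapterId& b) {
    return a.low == b.low && a.high == b.high;
}

enum class FramePath {
    DirectGpu,  // decoder and renderer share a device: hand over the texture
    CpuCopy     // devices differ: download the frame and upload it again
};

// Decide, per sample request, how a decoded frame should reach the renderer.
inline FramePath ChooseFramePath(const AdapterId& decoder, const AdapterId& renderer) {
    return SameAdapter(decoder, renderer) ? FramePath::DirectGpu : FramePath::CpuCopy;
}
```

The idea is only that decoding keeps running on the original device, and the comparison picks the delivery path for each frame.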

I currently don't have the hardware to test this scenario.

brabebhin commented 3 years ago

Yes it looks like just rendering fails. I will give your idea a try in the coming days.

brabebhin commented 3 years ago

You are 110% right. When moving to a different screen, the device ID from the Starting event differs from the one from the SampleRequested event!

Now I need to properly handle this scenario.

brabebhin commented 3 years ago

It seems copying from GPU to other GPU is not that straightforward. Probably will need to default to CPU copy like we do for effects.

brabebhin commented 3 years ago

I have pushed the code to the ffmpeg video filters branch, since it had quite a lot of code that I needed to make this work. However, the performance is still pretty low on HEVC 4K files. But I just don't see any other way to do this, since copying from GPU to GPU simply doesn't work (and in my particular case it would involve copying from System memory (iGPU) to System memory (CPU) to nVidia memory (dGPU) anyway).

I wonder if we could recreate the decoder pipeline gracefully instead, when we detect the change in GPU devices.

lukasf commented 3 years ago

Can you test how other players behave in this scenario (when HW acceleration is used)? Will they continue playback smoothly or will there be some gap/break or other irregular behavior?

Theoretically, we could re-initialize the pipeline. It's not even difficult I think. But the problem is that decoding will only really work starting with the next key frame. So it could take some time until clean frames are produced (not sure if we get black frames, garbage frames, or no frames at all until next key frame). We could also delay re-initialization until the next keyframe packet comes (and maybe use CPU copy until that point). However, it could be difficult to find the right place to re-initialize with the current internal structure. Last option would be performing a backward seek to the file position of the last key frame. Maybe this is the best option. It would get us uninterrupted video, but there would be a slight delay, because MF would decode and skip frames from the previous keyframe position to actual playback position.
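
To illustrate the "seek back to the last key frame" option: a minimal sketch, assuming the key frame timestamps seen so far are kept in a sorted list. The function name and the list are hypothetical; with FFmpeg the lookup would more likely be left to av_seek_frame with AVSEEK_FLAG_BACKWARD.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Finds the last key frame timestamp at or before `position`.
// `keyFrames` must be sorted in ascending order.
// Returns false (and leaves `target` untouched) if no key frame qualifies.
inline bool LastKeyFrameAtOrBefore(const std::vector<std::int64_t>& keyFrames,
                                   std::int64_t position,
                                   std::int64_t& target)
{
    assert(std::is_sorted(keyFrames.begin(), keyFrames.end()));

    // upper_bound yields the first key frame strictly after `position`;
    // the element before it is the one decoding has to restart from.
    auto it = std::upper_bound(keyFrames.begin(), keyFrames.end(), position);
    if (it == keyFrames.begin())
        return false;

    target = *(it - 1);
    return true;
}
```

The distance between the returned key frame and the playback position is roughly the amount of video that has to be decoded and skipped, which is where the slight delay comes from.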

lukasf commented 3 years ago

This is just a quick hack at trying to re-init hw acceleration. Maybe you can give it a try?

Put this in OnSampleRequested:

        // check if device changed!!
        if (hasNewDevice) 
        {
            av_buffer_unref(&avHardwareContext);

            SAFE_RELEASE(device);
            SAFE_RELEASE(deviceContext);

            avHardwareContext = av_hwdevice_ctx_alloc(AVHWDeviceType::AV_HWDEVICE_TYPE_D3D11VA);
            HRESULT hr = D3D11VideoSampleProvider::InitializeHardwareDeviceContext(sender, avHardwareContext, &device, &deviceContext);

            if (SUCCEEDED(hr))
            {
                // assign device and context
                for each (auto stream in videoStreams)
                {
                    if (stream->m_pAvCodecCtx->hw_device_ctx)
                    {
                        // set device pointers to stream
                        stream->SetHardwareDevice(device, deviceContext, avHardwareContext);

                        // must flush streams to get clean output
                        stream->Flush();
                    }
                }
            }
        }

Change SetHardwareDevice method:

void FFmpegInterop::MediaSampleProvider::SetHardwareDevice(ID3D11Device* device, ID3D11DeviceContext* context, AVBufferRef* avHardwareContext)
{
    // Drop FFmpeg's references to the old hardware device and frame pool.
    av_buffer_unref(&m_pAvCodecCtx->hw_device_ctx);
    av_buffer_unref(&m_pAvCodecCtx->hw_frames_ctx);

    // AddRef the new objects before releasing the old ones, so the swap
    // stays safe even if the same pointers are passed in again.
    device->AddRef();
    context->AddRef();
    SAFE_RELEASE(this->device);
    SAFE_RELEASE(this->deviceContext);

    this->device = device;
    this->deviceContext = context;
    m_pAvCodecCtx->hw_device_ctx = av_buffer_ref(avHardwareContext);
}

brabebhin commented 3 years ago

Unfortunately that did not work.

deviceContext->CopySubresourceRegion(renderTexture, 0, 0, 0, 0, decodedTexture, (UINT)(unsigned long long)avFrame->data[1], NULL);

Exception thrown at 0x00007FFDAA4789C0 (nvwgf2umx.dll) in MediaPlayerCPP.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.

I checked the (ffmpeg) decodedTexture and the (ours) renderTexture: they have different device pointers, with the ffmpeg texture keeping its initial device pointer, so copying is not possible.

brabebhin commented 3 years ago

Other players

Movies & TV has a noticeable freeze when changing GPUs.

MPC-BE seems to be pretty smooth. VLC (windows store) is pretty smooth.

However, the smoothness may be masked by the fact that the window is also resizing at that point, since it goes from the 17 inch FHD laptop screen to the significantly larger 4K TV, so it may be that I just do not notice it.

brabebhin commented 3 years ago

Could we not reinitialize the pipeline and seek to the previous key frame instead? I think we are allowed a few jagged moments, people don't just switch GPUs all the time.

The worst part is that my own app, with the same Windows settings, always picks the Intel GPU unless I explicitly select the NVIDIA GPU. This doesn't happen with the sample app for some reason. The sample app always gets the correct GPU based on the initial display.

lukasf commented 3 years ago

I think I have a working solution, please check the latest changeset. I do a complete re-initialization of the AVCodecContext, then seek to the last position and drop samples until we are really there. It seems to work fine with only minimal playback interruption. I tested it locally by faking a device change every 100 frames.
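
The "drop samples until we are really there" part could look roughly like the sketch below. ResumeGate is a hypothetical helper written for illustration, not the code from the changeset; it assumes frames arrive in presentation order with integer timestamps.

```cpp
#include <cassert>
#include <cstdint>

// After the decoder has been re-initialized and the stream was seeked back,
// frames before the interrupted playback position are decoded but dropped.
class ResumeGate {
public:
    // Arm the gate with the timestamp at which playback should resume.
    void Arm(std::int64_t resumePts) {
        resumePts_ = resumePts;
        armed_ = true;
    }

    // Returns true if the frame should be delivered, false if it is dropped.
    bool ShouldDeliver(std::int64_t framePts) {
        if (!armed_)
            return true;            // normal playback, nothing to skip
        if (framePts < resumePts_)
            return false;           // still catching up after the seek
        armed_ = false;             // resume point reached
        return true;
    }

    bool IsArmed() const { return armed_; }

private:
    std::int64_t resumePts_ = 0;
    bool armed_ = false;
};
```

Usage would be to call Arm with the last delivered position right after the seek, then pass every decoded frame through ShouldDeliver.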

The code is not cleaned up, and error handling needs to be improved. I think a faster way to detect a device change is to keep a reference to the device manager and the device handle, then call IMFDXGIDeviceManager::TestDevice(handle) in SampleRequested. It should return "new device" when there was a change. Also, we could store the binary position of the last video key frame packet. This would allow a faster seek operation.

brabebhin commented 3 years ago

Awesome, it works!

I've made some changes to how we detect the device change: instead of using the cached ID from OnStarting, we now use the device pointer from the media sample provider, because before it was detecting a change on every sample.

I now also reset the texture pool when the device changes, so it gets to recreate the textures on the new device.
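
Roughly, the pool reset works like the sketch below. Device, Texture and TexturePool here are simplified, hypothetical stand-ins (the real pool holds D3D11 textures), just to show the idea of discarding everything that belongs to the old device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Device {};                          // stand-in for ID3D11Device
struct Texture { const Device* owner; };   // stand-in for a pooled texture

class TexturePool {
public:
    // Hands out a texture that belongs to `device`, reusing one if possible.
    Texture Acquire(const Device* device) {
        assert(device != nullptr);
        if (device != device_) {
            // Device changed: textures of the old device are unusable here.
            free_.clear();
            device_ = device;
        }
        if (!free_.empty()) {
            Texture t = free_.back();
            free_.pop_back();
            return t;
        }
        return Texture{device};  // "create" a new texture on the current device
    }

    // Takes a texture back; textures from an older device are discarded.
    void Release(const Texture& t) {
        if (t.owner == device_)
            free_.push_back(t);
    }

    std::size_t FreeCount() const { return free_.size(); }

private:
    const Device* device_ = nullptr;
    std::vector<Texture> free_;
};
```

Checking the owner in Release matters because frames that were in flight during the switch can still come back afterwards.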

I think we can directly compare the device pointers in order to figure out the device change. The "device manager" seems to always return different pointers for different devices, and identical pointers for identical devices.
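
A minimal sketch of that pointer comparison, with a hypothetical DeviceChangeDetector and a stand-in Device type. The important bit is that the remembered pointer is updated after a change; comparing against a value cached once at start would keep reporting a change on every sample. In real code a reference to the remembered device would have to be held, otherwise a freed device's address could be reused.

```cpp
#include <cassert>

struct Device {};  // stand-in for ID3D11Device

class DeviceChangeDetector {
public:
    // Call once per sample request with the device currently reported.
    // Returns true only on the first sample after the device has changed.
    bool Update(const Device* current) {
        assert(current != nullptr);
        if (current == last_)
            return false;
        const bool changed = (last_ != nullptr);  // first device is no "change"
        last_ = current;                          // remember the new device
        return changed;
    }

private:
    const Device* last_ = nullptr;
};
```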

It also works fine with my own app.

brabebhin commented 3 years ago

I've tried to implement binary seek to the first packet of a key frame, and it sometimes seems to fail, with GetNextSample returning a null sample after switching devices. I pushed the changes, you can take a look and see what I did wrong :(

lukasf commented 3 years ago

I read about binary seek in FFmpeg, and the bottom line is that you just shouldn't use it. Some formats don't support it at all, others support it but might have broken timestamps afterwards. It's no use. So let's stick with normal seek.

About the issue you have with your app selecting the wrong GPU, just guessing here: Could this be a timing issue? Like, if you create a new MediaPlayer instance with AutoPlay, then assign our MSS and then put it on the UI, the decoding might start while the MediaPlayer does not yet know where it will be shown. If that is the case, you might have to put the MediaPlayer on the UI first, before calling Play. Or set AutoPlay on the MediaPlayerElement instead of MediaPlayer.

brabebhin commented 3 years ago

Could be. Or it could be just another strange thing that media playback list does. But at this point it doesn't matter since the situation is now handled. Either way, I want to focus on getting the branch merged. I will do some test runs and see if we have any leaks when switching devices.

I will also need your help on the frame grabber issue.
