NVIDIA / VideoProcessingFramework

Set of Python bindings to C++ libraries which provide full HW acceleration for video decoding, encoding, and GPU-accelerated color space and pixel format conversions
Apache License 2.0

VPF can't seem to handle SVC for VP9 #538

Closed gustavo-lighttwist closed 1 year ago

gustavo-lighttwist commented 1 year ago

**Describe the bug**
VPF doesn't seem to support VP9 streams that use SVC (L2T3_KEY, L3T3_KEY, etc.).

**To Reproduce**
Steps to reproduce the behavior:

**Expected behavior**
The frame is decoded properly and covers the entire surface, without being cut off or having black borders. Sometimes the decoder doesn't seem to output frames at all.

**Screenshots**
Example of a failed decode: the decoded frame seems to be from one layer but doesn't match the surface dimensions.

[screenshot: failed_decode]

**Additional information**
I'd be happy to provide an example VP9 stream which isn't decoded properly.

RomanArzumanyan commented 1 year ago

Hi @gustavo-lighttwist

To the best of my knowledge, there's no HW acceleration for VP9 SVC. However, the SW fallback decoder PyFFmpegDecoder is available.
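For reference, a minimal sketch of that SW fallback path, modeled on VPF's software-decode sample (the input path is a placeholder, and the exact constructor options and method names should be checked against your VPF build):

```python
import numpy as np
import PyNvCodec as nvc

# Software FFmpeg-based decoder: input path plus a dict of FFmpeg options
# (empty here). No NVDEC involvement, so SVC VP9 streams that NVDEC
# rejects can still be decoded on the CPU.
ff_dec = nvc.PyFfmpegDecoder("input_l3t3.ivf", {})  # placeholder file name

frame = np.ndarray(shape=(0), dtype=np.uint8)  # resized by the decoder
num_frames = 0
while ff_dec.DecodeSingleFrame(frame):
    # 'frame' now holds one raw decoded frame as a flat byte array.
    num_frames += 1
print(f"Decoded {num_frames} frames in software")
```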

gustavo-lighttwist commented 1 year ago

Well, I couldn't find much information about whether NVDEC supports SVC (which kind of implies it doesn't), but I'll keep investigating whether the layers can be pre-processed into a non-SVC VP9 stream before being passed to the decoder.

I think VPF should support at least the SVC streams configured to use only 1 spatial layer (L1T1, L1T2, L1T3), where the only layer is the base layer, which is itself a valid VP9 stream. AFAIU this just means reading some metadata in the packet to know which resolution to output (see the sketch below).
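For illustration, here is a hedged sketch of what "reading some metadata in the packet" involves. It's a hypothetical helper (not a VPF API) based on my reading of the VP9 uncompressed-header layout, restricted to keyframes in profiles 0 and 2:

```python
def vp9_keyframe_size(packet: bytes):
    """Best-effort parse of the coded width/height from a VP9 keyframe's
    uncompressed header (profiles 0/2 only). Returns (w, h) or None."""
    bit_pos = 0

    def f(n):  # read n bits, MSB-first, as the VP9 spec does
        nonlocal bit_pos
        val = 0
        for _ in range(n):
            val = (val << 1) | ((packet[bit_pos >> 3] >> (7 - (bit_pos & 7))) & 1)
            bit_pos += 1
        return val

    if f(2) != 2:                    # frame_marker must be 0b10
        return None
    profile = f(1) | (f(1) << 1)     # profile_low_bit, then profile_high_bit
    if profile not in (0, 2):        # profiles 1/3 add subsampling bits
        return None
    if f(1):                         # show_existing_frame: no size present
        return None
    frame_type = f(1)                # 0 == KEY_FRAME
    f(1); f(1)                       # show_frame, error_resilient_mode
    if frame_type != 0:
        return None                  # inter frames need more decoder state
    if f(24) != 0x498342:            # frame_sync_code
        return None
    if profile == 2:
        f(1)                         # ten_or_twelve_bit
    color_space = f(3)
    if color_space == 7:             # CS_RGB is invalid in profiles 0/2
        return None
    f(1)                             # color_range
    width = f(16) + 1                # frame_width_minus_1
    height = f(16) + 1               # frame_height_minus_1
    return (width, height)
```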

An alternative is to expose an interface to reconfigure a decoder instance (cuvidReconfigureDecoder?) so a developer can parse the packets and change the resolution of a decoder without needing to create a new one (and lose the decoding context).

RomanArzumanyan commented 1 year ago

> I think VPF should support at least the SVC streams configured to use only 1 spatial layer

How do you think it's possible to implement this feature?

> developer can parse the packets and change the resolution of a decoder without needing to reinitialize a new decoder

The decoder is reinitialized automatically under the hood of Video Codec SDK once the elementary bitstream parser (which is built into Video Codec SDK) comes across a specific non-VCL NALu (e.g. SPS / PPS in the case of H.264 / H.265).

BTW you can decode compressed frames one by one using the DecodeSurfaceFromPacket method of the PyNvDecoder class. So any tool that outputs elementary bitstream NALus is fine, be it FFmpeg or any other demuxer of your choice.
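For example, a minimal demux-and-feed loop along the lines of VPF's decode sample might look like this (method names are per the VPF samples; the input path is a placeholder, and flushing behavior should be verified against your build):

```python
import numpy as np
import PyNvCodec as nvc

gpu_id = 0
dmx = nvc.PyFfmpegDemuxer("input.ivf")  # placeholder file name

# Decoder created from the demuxer's stream parameters.
nv_dec = nvc.PyNvDecoder(dmx.Width(), dmx.Height(),
                         dmx.Format(), dmx.Codec(), gpu_id)

packet = np.ndarray(shape=(0), dtype=np.uint8)
while dmx.DemuxSinglePacket(packet):
    # Feed one compressed packet; the decoder may buffer, so the
    # returned surface can be empty for the first few packets.
    surf = nv_dec.DecodeSurfaceFromPacket(packet)
    if not surf.Empty():
        pass  # process the decoded surface here

# Drain frames still buffered inside the decoder.
while True:
    surf = nv_dec.FlushSingleSurface()
    if surf.Empty():
        break
```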

gustavo-lighttwist commented 1 year ago

> How do you think it's possible to implement this feature?

AFAIU the base layer can be decoded as a non-SVC VP9 stream. If there is only 1 spatial layer, then that layer is the base layer and can be handled as a non-SVC VP9 stream. The only issue is that VPF outputs the wrong resolution, as it doesn't seem to handle the headers indicating the layer resolution properly. TL;DR: decoding already works; it's just not getting the correct resolution from the packet headers.

> Decoder is reinitialized automatically under the hood

I've hit issues with this (streams that are resized mid-stream cause PyNvDecoder to crash/fail or output the wrong resolution), but I'll open a separate issue for that. It would be great if the developer could resize the decoder directly, without relying on VPF to parse and handle every possible configuration correctly.

> you can decode compressed frames one by one using DecodeSurfaceFromPacket

Yes, that's what I've been using, but I've had many issues with streams that change resolution mid-stream; I'll follow up on that in a separate issue.

RomanArzumanyan commented 1 year ago

@gustavo-lighttwist

> It would be great if the developer could resize the decoder directly, without relying on VPF to parse and handle every possible configuration correctly.

On the contrary, it would be too much of a code support burden to carry across multiple HW generations, codecs, OSes, HW platforms, etc. I'd rather rely on the unified decoder behavior provided by VC SDK, since it's developed and supported by the VC SDK team, who take care of all the circumstances I've listed above.

> I've hit issues with this (streams that are resized mid-stream cause PyNvDecoder to crash/fail or output the wrong resolution), but I'll open a separate issue for that.

Please go ahead; the bug shall be fixed. There's a unit test for that: https://github.com/NVIDIA/VideoProcessingFramework/blob/82b51e7c29cb1c8259721170f39e95f3e95b4ad4/tests/test_PyNvDecoder.py#L289. You can just modify it to fit your input video.
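A hedged sketch of what such a modification might look like (the input file name and expected resolutions are placeholders, not values from the existing test):

```python
import PyNvCodec as nvc

def test_decode_resolution_change():
    gpu_id = 0
    # Placeholder input: a VP9 stream that resizes mid-stream.
    nv_dec = nvc.PyNvDecoder("vp9_resize_mid_stream.ivf", gpu_id)

    sizes = []
    while True:
        surf = nv_dec.DecodeSingleSurface()
        if surf.Empty():
            break
        sizes.append((surf.Width(), surf.Height()))

    # Expect both the pre- and post-resize resolutions to appear
    # in the decoded output (placeholder values).
    assert (1280, 720) in sizes
    assert (640, 360) in sizes
```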

But please make sure that the input is supported according to the Nvdec support matrix.

By design, VPF supports HW acceleration for everything that VC SDK does, and for the rest there's a SW fallback in the form of PyFFmpegDecoder.

gustavo-lighttwist commented 1 year ago

> On the contrary, it would be too much of a code support burden to carry across multiple HW generations, codecs, OSes, HW platforms, etc. I'd rather rely on the unified decoder behavior provided by VC SDK, since it's developed and supported by the VC SDK team, who take care of all the circumstances I've listed above.

I agree with that. That's why I opened this issue: so VPF itself could handle the headers and output the decoded frames at the correct resolution.

Exposing an interface to let the developer reconfigure the decoder is just an alternative, in case there is no interest in adding partial SVC support to VPF.

gustavo-lighttwist commented 1 year ago

After more research, I think the SVC extension is handled at a different layer, so VPF should not need to handle SVC in any way. The only issue seems to be handling the resolution fields from the VP9 bitstream properly; I'll follow up on that in a separate issue with a concrete example.

Thank you for your time and sorry about the noise.