NVIDIA / VideoProcessingFramework

Set of Python bindings to C++ libraries which provides full HW acceleration for video decoding, encoding and GPU-accelerated color space and pixel format conversions
Apache License 2.0

H265 decoding: corrupted frames after CUDA_ERROR_INVALID_VALUE. #125

Closed aviyaChe closed 3 years ago

aviyaChe commented 3 years ago

I run VPF on some test videos. Every now and then a decoded frame is corrupted: there are visible encoding artifacts, and frames go back in time by ~1-2 sec.

In addition, at the same time an error appears in stderr:

CUDA error: CUDA_ERROR_INVALID_VALUE
invalid argument

However, no exception is raised from VPF at any point.

Python code I run:

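    import time

    import cv2
    import numpy as np
    import PyNvCodec as nvc

    # encFilePath (input video path) and gpuID (GPU ordinal) are set by the caller.
    nvDec = nvc.PyNvDecoder(encFilePath, gpuID)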

    to_rgb = nvc.PySurfaceConverter(int(nvDec.Width()), int(nvDec.Height()), nvc.PixelFormat.NV12, nvc.PixelFormat.BGR, gpuID)
    nvDwn = nvc.PySurfaceDownloader(int(nvDec.Width()), int(nvDec.Height()), to_rgb.Format(), gpuID)

    while True:
        try:
            rawSurface = nvDec.DecodeSingleSurface()
            if (rawSurface.Empty()):
                print('No more video frames')
                break

            # convert to rgb
            cvtSurface = to_rgb.Execute(rawSurface)
            if (cvtSurface.Empty()):
                print('Failed to do color conversion')
                break

            bgr = np.ndarray(shape=(cvtSurface.HostSize()), dtype=np.uint8)
            success = nvDwn.DownloadSingleSurface(cvtSurface, bgr)

            if not success:
                print('Failed to download surface')
                break

            bgr = bgr.reshape(nvDec.Height(), nvDec.Width(), 3)
            bgr = cv2.resize(bgr, dsize=None, fx=0.5, fy=0.5)
            cv2.imshow("", bgr)
            cv2.waitKey(1)
            time.sleep(0.1)
        except nvc.HwResetException:
            print('Continue after HW decoder was reset')
            continue

ffprobe result:

    Duration: 00:49:59.99, start: 0.000000, bitrate: 6490 kb/s
    Stream #0:0(und): Video: hevc (Main) (hev1 / 0x31766568), yuvj420p(pc, bt709), 3840x2160 [SAR 1:1 DAR 16:9], 6489 kb/s, 12 fps, 12 tbr, 1200k tbn, 12 tbc (default)
    Metadata:
      handler_name : VideoHandler
    [STREAM]
    index=0
    codec_name=hevc
    codec_long_name=H.265 / HEVC (High Efficiency Video Coding)
    profile=Main
    codec_type=video
    codec_time_base=1/12
    codec_tag_string=hev1
    codec_tag=0x31766568
    width=3840
    height=2160
    coded_width=3840
    coded_height=2160
    has_b_frames=0
    sample_aspect_ratio=1:1
    display_aspect_ratio=16:9
    pix_fmt=yuvj420p
    level=150
    color_range=pc
    color_space=bt709
    color_transfer=bt709
    color_primaries=bt709
    chroma_location=unspecified
    field_order=unknown
    timecode=N/A
    refs=1
    id=N/A
    r_frame_rate=12/1
    avg_frame_rate=1000000/83333
    time_base=1/1200000
    start_pts=0
    start_time=0.000000
    duration_ts=3599985600
    duration=2999.988000
    bit_rate=6489193
    max_bit_rate=N/A
    bits_per_raw_sample=N/A
    nb_frames=36000
    nb_read_frames=N/A
    nb_read_packets=N/A
    DISPOSITION:default=1
    DISPOSITION:dub=0
    DISPOSITION:original=0
    DISPOSITION:comment=0
    DISPOSITION:lyrics=0
    DISPOSITION:karaoke=0
    DISPOSITION:forced=0
    DISPOSITION:hearing_impaired=0
    DISPOSITION:visual_impaired=0
    DISPOSITION:clean_effects=0
    DISPOSITION:attached_pic=0
    DISPOSITION:timed_thumbnails=0
    TAG:language=und
    TAG:handler_name=VideoHandler
    [/STREAM]

rarzumanyan commented 3 years ago

Hi @aviyaChe

Can you reproduce the issue with a local video file? If so, please share the file and I'll take a closer look at it.

aviyaChe commented 3 years ago

Thanks @rarzumanyan for the fast response. Yes, the problem appears each time the local video is played. However, I prefer not to post it in public, so I sent the file to the email address attached to your profile.

rarzumanyan commented 3 years ago

Hi @aviyaChe

According to FFmpeg, your video file has YUVJ420P pixel format:

   AV_PIX_FMT_YUVJ420P,  ///< planar YUV 4:2:0, 12bpp, full scale (JPEG), deprecated in favor of AV_PIX_FMT_YUV420P and setting color_range

Which should be 12 bit per pixel. However, Sequence Parameter Set NAL unit has chroma_format_idc = 1 which means usual 8 bit per pixel YUV420P format. VPF uses FFmpeg for demuxing so it takes YUVJ420P format for granted. Mismatch between FFmpeg and VPF formats happens here: https://github.com/NVIDIA/VideoProcessingFramework/blob/b64e09db34149697a86c4aefe693e88b262aa22b/PyNvCodec/TC/src/Tasks.cpp#L622-L629
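As a side note on the `12bpp` figure in the FFmpeg comment above: it can be read as the average number of bits stored per pixel across all three planes, not as the bit depth of a single sample. A minimal sketch of that arithmetic follows; the helper name is hypothetical and not part of FFmpeg or VPF.

```python
def average_bits_per_pixel(bit_depth: int, chroma_w_div: int, chroma_h_div: int) -> float:
    """Average bits per pixel of a planar YUV layout.

    chroma_w_div and chroma_h_div are the horizontal and vertical chroma
    subsampling factors (2 and 2 for 4:2:0, 1 and 1 for 4:4:4).
    """
    luma = bit_depth  # one luma sample per pixel
    chroma = 2 * bit_depth / (chroma_w_div * chroma_h_div)  # U and V planes together
    return luma + chroma


print(average_bits_per_pixel(8, 2, 2))  # 4:2:0 with 8-bit samples
print(average_bits_per_pixel(8, 1, 1))  # 4:4:4 with 8-bit samples
```

Under this reading, 8-bit 4:2:0 content averages 12 bits per pixel, so `yuvj420p` and `chroma_format_idc = 1` describe the same sampling layout; `yuvj420p` additionally signals full (JPEG) range.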

That causes the decoder to fail during initialization. I'm not sure this can be fixed within VPF; it looks like an FFmpeg bug. It determines a pixel format which doesn't correspond to the Sequence Parameter Set NAL unit.

A quick and dirty fix to this problem would be to modify the mentioned piece of code as follows:

    switch (pImpl->demuxer.GetPixelFormat()) {
    case AV_PIX_FMT_YUVJ420P: // added: treat full-range YUVJ420P like YUV420P
    case AV_PIX_FMT_YUV420P:
    case AV_PIX_FMT_NV12:
      params.videoContext.format = NV12;
      break;
    case AV_PIX_FMT_YUV444P:
      params.videoContext.format = YUV444;
      break;

But I wouldn't push that to master.
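The effect of the patch can be sketched in Python as a lookup table. This is only an illustration: the string names, the `map_pixel_format` helper and the `ValueError` for unlisted formats are assumptions, not VPF API. The real logic is the C++ switch above.

```python
# Hypothetical mirror of the patched switch:
# FFmpeg pixel format name -> decoder surface format.
FFMPEG_TO_DECODER_FORMAT = {
    "yuvj420p": "NV12",  # the added case: full-range 4:2:0 handled like yuv420p
    "yuv420p": "NV12",
    "nv12": "NV12",
    "yuv444p": "YUV444",
}


def map_pixel_format(ffmpeg_format: str) -> str:
    """Return the decoder format for an FFmpeg pixel format name."""
    try:
        return FFMPEG_TO_DECODER_FORMAT[ffmpeg_format]
    except KeyError:
        # Assumption: formats not listed in the switch are rejected.
        raise ValueError(f"Unsupported pixel format: {ffmpeg_format}") from None


print(map_pixel_format("yuvj420p"))
```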

aviyaChe commented 3 years ago

Hi @rarzumanyan thanks for digging in.

The error I described above was not about initialization. Initialization went well and I was able to see the video, but sometimes there were corrupted frames. That code was compiled when master HEAD was on commit da57ab21 (16.09.2020).

After I took the latest master and updated the file VideoProcessingFramework/PyNvCodec/TC/src/Tasks.cpp as you recommended, I get an error during initialization:

RuntimeError: Error initializing filter_units=pass_types=39-40 bitstream filter: Bitstream filter not found

ffplay plays the video well (with no H/W acceleration), so I'm not sure this is an FFmpeg bug.

UPDATE: to bypass the SEI problem, I changed filter_units=pass_types=39-40 to an empty string so that no filter is used. The decoder now initializes without errors, but the frames still seem to be corrupted.

rarzumanyan commented 3 years ago

Hi @aviyaChe

> initialization went well and I was able to see the video, but sometimes there were corrupted frames

Ok, let me update to the latest FFmpeg and check once again. On my machine the decoder fails to initialize due to the format mismatch.

rarzumanyan commented 3 years ago

@aviyaChe

> RuntimeError: Error initializing filter_units=pass_types=39-40 bitstream filter: Bitstream filter not found

Your FFmpeg build lacks the filter_units bitstream filter, which extracts SEI messages. Since this is optional (many people don't need to extract SEI at all) and there were complaints about it in the past, I've made the filter init lazy. It's now only loaded if the user actually passes a numpy array for the SEI message to the decode function. I've pushed the commit to the master branch, please check it out. As soon as I find something useful about the other problem (corrupted video frames), I'll update you.
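The lazy initialization described above can be sketched roughly as follows. This is not the actual VPF change: `LazyFilter`, `decode_packet` and `missing_filter` are hypothetical names used only to show that an optional component which fails to load causes no error until it is actually requested.

```python
from typing import Callable, Optional


class LazyFilter:
    """Builds an optional resource on first use only."""

    def __init__(self, factory: Callable[[], object]) -> None:
        self._factory = factory  # called at most once, on the first get()
        self._instance: Optional[object] = None

    def get(self) -> object:
        if self._instance is None:
            self._instance = self._factory()  # may raise if the filter is missing
        return self._instance


def decode_packet(packet: bytes, sei_filter: LazyFilter, want_sei: bool = False) -> bytes:
    """Stand-in for a decode call; touches the filter only if SEI is wanted."""
    if want_sei:
        sei_filter.get()
    return packet


def missing_filter() -> object:
    # Simulates an FFmpeg build without the filter_units bitstream filter.
    raise RuntimeError("Bitstream filter not found")


lazy = LazyFilter(missing_filter)
decode_packet(b"\x00\x00\x01", lazy)  # no SEI requested, so the factory never runs
```

With this shape, only callers that ask for SEI data see the "Bitstream filter not found" error.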

rarzumanyan commented 3 years ago

@aviyaChe

I've modified my local VPF sources to ignore the YUVJ420P format issue, and I was able to reproduce the bug. A couple of times the decoder fails to decode a picture and prints a message to stderr:

C:\GitHub\VideoProcessingFramework\install\bin>python SampleDecode.py 0 defunc.mp4 defunc.nv12
This sample decodes input video to raw NV12 file on given GPU.
Usage: SampleDecode.py $gpu_id $input_file $output_file.
Decoding on GPU 0
C:\GitHub\VideoProcessingFramework\PyNvCodec\TC\src\NvDecoder.cpp:450
CUDA error: CUDA_ERROR_INVALID_VALUE
invalid argument
C:\GitHub\VideoProcessingFramework\PyNvCodec\TC\src\NvDecoder.cpp:450
CUDA error: CUDA_ERROR_INVALID_VALUE
invalid argument
C:\GitHub\VideoProcessingFramework\PyNvCodec\TC\src\NvDecoder.cpp:450
CUDA error: CUDA_ERROR_INVALID_VALUE
invalid argument
...

To find the culprit, I've also decoded the same video with FFmpeg. It supports 2 NV-accelerated decoding paths. The first uses the stand-alone cuvid decoder, which relies on the built-in Video Codec SDK Annex.B elementary stream parser (same as VPF). It also has issues with this video at exactly the same places:

c:\Install\ffmpeg-master-tot\bin>ffmpeg.exe -c:v hevc_cuvid -i c:\GitHub\VideoProcessingFramework\install\bin\defunc.mp4 -c:v rawvideo -pix_fmt nv12 c:\GitHub\VideoProcessingFramework\install\bin\out_ffmpeg.yuv
ffmpeg version N-99556-g5ae3da0209 Copyright (c) 2000-2020 the FFmpeg developers
  built with Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64
  configuration: --prefix=/c/git/ffmpeg/build_x64_release_shared --toolchain=msvc --arch=x86_64 --disable-static --disable-programs --disable-stripping --disable-network --disable-doc --enable-x86asm --enable-shared --enable-ffprobe --enable-ffmpeg --enable-nonfree --enable-nvenc --enable-nvdec --enable-cuda --enable-cuda-nvcc --enable-libnpp --extra-cflags='/arch:AVX2'
...
[hevc_cuvid @ 000001E7DBBE3540] ctx->cvdl->cuvidDecodePicture(ctx->cudecoder, picparams) failed -> CUDA_ERROR_INVALID_VALUE: invalid argument
[hevc_cuvid @ 000001E7DBBE3540] cuvid decode callback error
Error while decoding stream #0:0: Generic error in an external library
[hevc_cuvid @ 000001E7DBBE3540] ctx->cvdl->cuvidDecodePicture(ctx->cudecoder, picparams) failed -> CUDA_ERROR_INVALID_VALUE: invalid argument
[hevc_cuvid @ 000001E7DBBE3540] cuvid decode callback error
Error while decoding stream #0:0: Generic error in an external library
[hevc_cuvid @ 000001E7DBBE3540] ctx->cvdl->cuvidDecodePicture(ctx->cudecoder, picparams) failed -> CUDA_ERROR_INVALID_VALUE: invalid argument
[hevc_cuvid @ 000001E7DBBE3540] cuvid decode callback error
Error while decoding stream #0:0: Generic error in an external library
frame=  845 fps= 15 q=-0.0 Lsize=10266750kB time=00:01:10.41 bitrate=1194393.6kbits/s dup=7 drop=0 speed=1.24x
video:10266750kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%

The other NV-accelerated path uses the FFmpeg built-in parser instead of the one shipped with Video Codec SDK. It can decode the whole video without any issues:

c:\Install\ffmpeg-master-tot\bin>ffmpeg.exe -hwaccel cuda -i c:\GitHub\VideoProcessingFramework\install\bin\defunc.mp4 -c:v rawvideo -pix_fmt nv12 c:\GitHub\VideoProcessingFramework\install\bin\out_ffmpeg_hwaccel_cuda.yuv
...
frame=  841 fps=4.1 q=-0.0 Lsize=10218150kB time=00:01:10.08 bitrate=1194393.6kbits/s speed=0.34x
video:10218150kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%

So I assume this to be a bug in the Video Codec SDK built-in parser. If you don't mind, I'll use the video you've sent me to submit a bug in our in-house bug tracker and inform the VC SDK people about it. It will take some time to investigate, so meanwhile I'm afraid I can't provide you with any workaround. On my machine VPF decodes all the video frames; a couple of them are "going back in time" when the issue happens, but the remaining ~98% of frames are fine.
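The two FFmpeg invocations compared above differ only in how the decoder is selected. A small sketch that builds both command lines for a side-by-side check; the helper name and file names are hypothetical, while the flags are the ones used in the commands above.

```python
from typing import List


def build_ffmpeg_decode_cmd(input_path: str, output_path: str, use_cuvid_parser: bool) -> List[str]:
    """Build an ffmpeg command that decodes to raw NV12.

    use_cuvid_parser=True  selects the stand-alone hevc_cuvid decoder.
    use_cuvid_parser=False selects the -hwaccel cuda path.
    """
    cmd = ["ffmpeg"]
    if use_cuvid_parser:
        cmd += ["-c:v", "hevc_cuvid"]
    else:
        cmd += ["-hwaccel", "cuda"]
    cmd += ["-i", input_path, "-c:v", "rawvideo", "-pix_fmt", "nv12", output_path]
    return cmd


# The commands are only printed here; pass them to subprocess.run() to execute.
print(" ".join(build_ffmpeg_decode_cmd("defunc.mp4", "out_cuvid.yuv", True)))
print(" ".join(build_ffmpeg_decode_cmd("defunc.mp4", "out_hwaccel.yuv", False)))
```

Comparing stderr and the frame counts of the two runs shows whether a given file triggers the same difference.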

aviyaChe commented 3 years ago

Hi @rarzumanyan, thanks again for your quick response. VPF really helped me boost performance on my server and it would be a shame not to use it, so I'm going to ask some additional questions, hope it's OK :)

> The other NV-accelerated path uses the FFmpeg built-in parser instead of the one shipped with Video Codec SDK

What do you think it would take to make VPF work with the standard decoder? Is it something I can do in a straightforward way inside VPF? And if so, do you think the runtime performance would be the same?

> To find the culprit, I've also decoded the same video with FFmpeg. It supports 2 NV-accelerated decoding paths

I was always wondering what the difference is between the two ffmpeg options. Does the second option also run on the Nvdec device, or, as the name suggests, on CUDA cores?

> I'm afraid I can't provide you with any workaround

Maybe you can. Usually we read an IP camera stream via RTSP and feed it straight to VPF. However, I can change some encoding settings, for instance use H.264 instead of H.265, but also less dramatic settings like variable/constant bit rate, variable/constant I-frame interval and so on. Do you think I can avoid the bug by changing camera settings?

> If you don't mind, I'll use the video you've sent me to submit a bug in our in-house bug tracker and inform the VC SDK people about it.

Sure, but please do not publish it outside the organization.

rarzumanyan commented 3 years ago

@aviyaChe

> What do you think it would take to make VPF work with the standard decoder?

It's not a decoder problem, but a parser problem.

The parser is a SW component which parses the video stream and extracts parameters such as width, height, etc. It also extracts encoded video frames and sends them to the HW decoder. VPF uses the built-in parser which is part of the GPU driver.

In contrast, ffmpeg -hwaccel cuda uses a different parser, implemented in FFmpeg and not in the driver. It processes the video stream differently, but also extracts encoded video frames and sends them to Nvdec (the HW decoder).

Long story short, the SW parser is a kind of front-end and Nvdec is the HW back-end. Any front-end (parser) compatible with the Video Codec SDK API can use the back-end (Nvdec).
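To make the front-end's job more concrete, here is a toy Annex B splitter in Python. It is neither the driver's parser nor FFmpeg's: it only finds start codes and reads the HEVC NAL unit type from the first header byte, and it does not remove emulation-prevention bytes or parse any parameter sets. The function names are hypothetical.

```python
from typing import Iterator


def split_annexb(stream: bytes) -> Iterator[bytes]:
    """Yield NAL units from an Annex B byte stream (start codes removed)."""
    starts = []
    pos = 0
    while True:
        pos = stream.find(b"\x00\x00\x01", pos)
        if pos < 0:
            break
        starts.append(pos + 3)  # the NAL unit begins right after the start code
        pos += 3
    for n, begin in enumerate(starts):
        end = starts[n + 1] - 3 if n + 1 < len(starts) else len(stream)
        # Trailing zero bytes belong to the next (4-byte) start code or padding.
        yield stream[begin:end].rstrip(b"\x00")


def hevc_nal_type(nal: bytes) -> int:
    """HEVC nal_unit_type: bits 1..6 of the first header byte."""
    return (nal[0] >> 1) & 0x3F


# Toy stream: SPS (33), prefix SEI (39), IDR slice (19), with dummy payloads.
toy = (
    b"\x00\x00\x00\x01" + bytes([33 << 1, 1]) + b"\xAA"
    + b"\x00\x00\x01" + bytes([39 << 1, 1]) + b"\xBB"
    + b"\x00\x00\x01" + bytes([19 << 1, 1]) + b"\xCC"
)
print([hevc_nal_type(nal) for nal in split_annexb(toy)])
```

Types 39 and 40 are the prefix and suffix SEI NAL units, which is what `filter_units=pass_types=39-40` selects.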

It turns out that the FFmpeg parser doesn't have the bug which the built-in parser has. This is a rare situation, as the built-in parser is thoroughly tested, so I'm surprised; I was thinking the problem was VPF-specific. If you can change the codec to H.264 and that's appropriate for your project, please go ahead and do so. Changing the encoding settings may help as well.

Besides that, it would definitely help if you sent me a letter with a couple of words about your company, so that I can submit the bug as registered by a customer and not just by me in my hobby GitHub project.

aviyaChe commented 3 years ago

@rarzumanyan, thanks for the explanations, and sorry for mixing up the terminology. Maybe I wasn't clear enough: I was asking because I'm thinking of implementing it myself, i.e. adding an option in VPF to use the standard FFmpeg parser.
I would be happy to get your opinion on whether this is a good idea and what obstacles I may come across. Basically, the FFmpeg code should be a good reference.

> Besides that, it would definitely help if you sent me a letter with a couple of words about your company, so that I can submit the bug as registered by a customer and not just by me in my hobby GitHub project.

Yes, I sent you an email. Thanks!

rarzumanyan commented 3 years ago

Hi @aviyaChe

> I would be happy to get your opinion on whether this is a good idea and what obstacles I may come across.

It's fairly complicated, and the complexity isn't on the SW engineering side but rather on the legal / org. side.

The Annex.B parser is an essential part of the HEVC decoder within FFmpeg, and to the best of my knowledge there's no clean API for Annex.B parsing (the parser is embedded into the decoder's codebase as a bunch of functions and such).

Refactoring the FFmpeg source code is IMO not an option due to licensing policies. To keep it under LGPL all the changes must be published, so it is a huge effort to extract the parser from the codec, test the patches, brush them up and persuade the community that these changes are useful.

P. S. RTSP video issues are ~90% of the overall problems VPF users are facing. It's often difficult to judge whether the incoming bitstream is broken due to transmission issues or Nvdec fails to decode, so I'm working towards decoupling the networking & demuxing (extracting video from containers) from the actual HW decoding. There's a project called PyAV which provides Python bindings to the FFmpeg libraries, so ideally I'd like to do everything that happens before the Annex.B packet is sent to Nvdec with PyAV, and everything that happens later (decoding, processing, ML and such) with VPF and PyTorch.
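One way to picture the decoupling is a loop that receives packets from any demuxer and hands them to any decoder, keeping separate counters for each side. This is a structural sketch only: `run_pipeline` and its counters are hypothetical, no PyAV or VPF calls are made, and the demuxer and decoder are plain callables supplied by the caller.

```python
from typing import Callable, Dict, Iterable, Optional


def run_pipeline(
    packets: Iterable[bytes],
    decode: Callable[[bytes], Optional[object]],
) -> Dict[str, int]:
    """Feed demuxed packets to a decoder and count failures per stage."""
    stats = {"packets": 0, "empty_packets": 0, "decode_failures": 0, "frames": 0}
    for packet in packets:
        stats["packets"] += 1
        if not packet:
            stats["empty_packets"] += 1  # problem on the network/demux side
            continue
        try:
            frame = decode(packet)
        except RuntimeError:
            stats["decode_failures"] += 1  # problem on the decoder side
            continue
        if frame is not None:  # None means the decoder needs more data
            stats["frames"] += 1
    return stats


def fake_decode(packet: bytes) -> Optional[object]:
    # Stand-in decoder used only for this demonstration.
    if packet == b"bad":
        raise RuntimeError("decode failed")
    return None if packet == b"wait" else "frame"


print(run_pipeline([b"ok", b"", b"bad", b"wait", b"ok"], fake_decode))
```

In a real setup the `packets` iterable would come from the demuxing library and `decode` would wrap the HW decoder call.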

If you're willing to spend a couple of your cycles on VPF development, I'd be happy to provide you with the info you need for that, although my advice would be to stay away from FFmpeg guts and take a look at the PyAV project.

aviyaChe commented 3 years ago

Hi @rarzumanyan, thanks for these very important notes. For now we have found a workaround, but we will be happy to get the fix in the future.

Yes, I also think it is a good idea to separate networking and parsing from GPU decoding. For now I'm taking your advice and sticking with this workaround.

Thanks for helping and being engaged with this issue.

Aviya

rarzumanyan commented 3 years ago

Hi @aviyaChe

Our driver team has root-caused the issue and is now working on a fix. However, I honestly don't know which driver version the fix will be included in.

aviyaChe commented 3 years ago

@rarzumanyan It seems to work well on Driver Version: 465.19.01. This issue can be closed.
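To check whether a machine runs at least the driver version reported as working here, something like the following could be used. It is a sketch only: the thread does not say which driver first contains the fix, the helper names are hypothetical, and Linux-style version strings such as 465.19.01 are assumed.

```python
import subprocess
from typing import Tuple


def parse_driver_version(text: str) -> Tuple[int, ...]:
    """Turn a version string like '465.19.01' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def query_driver_version() -> Tuple[int, ...]:
    """Ask nvidia-smi for the driver version of the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_driver_version(out.splitlines()[0])


# Version reported as working in this thread; not necessarily the first fixed one.
REPORTED_WORKING = parse_driver_version("465.19.01")

# Example (requires nvidia-smi on PATH):
#   if query_driver_version() < REPORTED_WORKING:
#       print("Driver is older than the version reported as working")
```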