Gui-Yom / turbo-metrics

Hardware acceleration for your daily video tasks
GNU Affero General Public License v3.0

Fallback for undecodable frames #5

Open 0xb01u opened 1 day ago

0xb01u commented 1 day ago

I've been testing turbo-metrics for the past few days and I noticed an error with certain files.

The error log is the following:

Using device NVIDIA GeForce RTX 4060 with CUDA version 12060
[crates/turbo-metrics/src/lib.rs:254:14] demuxer_dis.clock_rate() = 1000000
Reference: H.264/AVC   , 1920x1080, CP: BT709, MC: BT709, TC: BT709, CR: Limited
Distorted: H.264/AVC     , 1920x1080, CP: BT709, MC: BT709, TC: BT709, CR: Limited
Initializing SSIMULACRA2
Initialized, now processing ...
thread 'main' panicked at crates/cudarse/cudarse-video/src/dec_simple.rs:177:17:
decode 2 : still in decode queue
stack backtrace:
   0: rust_begin_unwind
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/std/src/panicking.rs:652:5
   1: core::panicking::panic_fmt
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/panicking.rs:72:14
   2: cudarse_video::dec::decode_picture
             at /home/user/turbo-metrics/crates/cudarse/cudarse-video/src/dec_simple.rs:177:17
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <turbo_metrics::input_video::MkvDemuxer as turbo_metrics::input_video::Demuxer>::demux
             at /home/user/turbo-metrics/crates/cudarse/cudarse-video/src/dec.rs:160:18
   8: turbo_metrics::process_video_pair
             at /home/user/turbo-metrics/crates/turbo-metrics/src/input_video.rs:134:9
   9: turbo_metrics::main
             at /home/user/turbo-metrics/crates/turbo-metrics-cli/src/main.rs:160:38

This happens consistently for those specific video files. For each one of them, the frame reported as "still in decode queue" is always the same on every execution (the 2nd, 3rd, 4th or 5th frame, depending on the video).

Although certain software (e.g. mpv) seems to play those videos correctly using hardware acceleration, I have noticed that ffplay with hardware acceleration enabled keeps stuttering on those specific videos (sporadically repeating a previous frame instead of decoding the correct one). So this seems to be an issue with nvdec and specific videos. What is it about those videos that causes the issue, and how does mpv work around it? I do not know.

I have tested the files using 2 different GPUs, both with CUDA 12.6, and it happens consistently.

Since this seems to be an issue with the specific video files and nvdec, I do not know if there is something to fix in turbo-metrics. However, maybe a fallback could be implemented, to skip the specific failing frames, or to use CPU decoding, instead of panicking?
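
To make the request concrete, here is a minimal sketch of what "skip or fall back instead of panicking" could look like. All names (`DecodeError`, `Policy`, `Outcome`, `decode_with_fallback`) are hypothetical and not part of turbo-metrics; the hardware and CPU decoders are passed in as plain closures so the example stays self-contained.

```rust
/// Hypothetical error type standing in for the current panic.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    /// The target surface is still busy ("still in decode queue").
    SlotBusy { frame: usize },
}

/// What to do when hardware decoding of a frame fails (assumed options).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Policy {
    Skip,
    CpuFallback,
}

/// How a frame ended up being handled.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Hardware(usize),
    Software(usize),
    Skipped(usize),
}

/// Try the hardware decoder first; on failure, apply the chosen policy
/// instead of aborting the whole run.
pub fn decode_with_fallback(
    frame: usize,
    policy: Policy,
    hw_decode: impl Fn(usize) -> Result<(), DecodeError>,
    cpu_decode: impl Fn(usize) -> Result<(), DecodeError>,
) -> Result<Outcome, DecodeError> {
    match hw_decode(frame) {
        Ok(()) => Ok(Outcome::Hardware(frame)),
        Err(err) => match policy {
            Policy::Skip => {
                // Report the failure and move on to the next frame.
                eprintln!("skipping frame {frame}: {err:?}");
                Ok(Outcome::Skipped(frame))
            }
            // Only fails if the CPU path fails as well.
            Policy::CpuFallback => cpu_decode(frame).map(|()| Outcome::Software(frame)),
        },
    }
}
```

With `Policy::Skip` the metric would simply have a gap at the failing frame, while `Policy::CpuFallback` keeps the frame count intact at the cost of a slower path for that frame.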

Gui-Yom commented 1 day ago

A CPU decode option is indeed planned (things like AV1 hardware decode require very recent hardware, RTX 3000+). After seeing a problem like this, I'll start working on it asap.

The problem seems to come from the scheduling of frames to be decoded (a frame was scheduled to be decoded into an occupied dpb slot). This scheduling is managed by the Cuvid bitstream parser, as I don't currently parse the bitstream myself. Maybe I don't allocate a big enough dpb? (I allocate a bit more than necessary.) Maybe there is a bug in Cuvid (less probable) or in the way I handle things (more probable)? Heck, maybe there is a problem with the bitstream and mpv is able to fix it during playback.
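
To illustrate the failure mode, here is a toy model of dpb slot bookkeeping. It is only a sketch: `Dpb`, `DpbError`, `begin_decode` and `release` are made-up names, not the cudarse-video or Cuvid API, and the real decoder tracks more state than a busy flag per slot.

```rust
/// Hypothetical errors for the toy dpb below.
#[derive(Debug, PartialEq)]
pub enum DpbError {
    /// The parser asked for a slot index that was never allocated.
    OutOfRange { slot: usize },
    /// The slot still holds a picture that has not been consumed yet.
    Busy { slot: usize },
}

/// Toy decoded picture buffer: one busy flag per surface.
pub struct Dpb {
    busy: Vec<bool>,
}

impl Dpb {
    pub fn new(slots: usize) -> Self {
        Self { busy: vec![false; slots] }
    }

    /// Called when the parser schedules a picture into `slot`.
    /// Returns an error where the current code would panic.
    pub fn begin_decode(&mut self, slot: usize) -> Result<(), DpbError> {
        match self.busy.get_mut(slot) {
            None => Err(DpbError::OutOfRange { slot }),
            Some(flag) if *flag => Err(DpbError::Busy { slot }),
            Some(flag) => {
                *flag = true;
                Ok(())
            }
        }
    }

    /// Called once the consumer is done with the picture in `slot`.
    pub fn release(&mut self, slot: usize) {
        if let Some(flag) = self.busy.get_mut(slot) {
            *flag = false;
        }
    }
}
```

In this model, "decode 2 : still in decode queue" corresponds to `begin_decode(2)` being called before `release(2)`. That can happen if the dpb is too small, or if pictures are scheduled faster than they are consumed.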

Anyway, to debug this further it would be nice if you could send me bits of video that have this problem (my email is in this repo, or through Discord). If possible, cut only a small part (without reencoding). I can understand if you don't want to share the video though.

I don't know about your mpv configuration, so it might not even use nvdec to decode. About ffmpeg, I think there are 2 decoders using nvdec: one is based on the cuvid parser, like mine (h264_cuvid iirc), and the other parses the bitstream itself, fills in the necessary structures and calls nvdec directly (h264_nvdec iirc). That last one might not suffer from the same problems.

Gui-Yom commented 11 hours ago

The described bug is fixed by https://github.com/Gui-Yom/turbo-metrics/commit/b87f8cd5e9c70b18c7d025ece077b7948e97f0d4.

For reference, I no longer feed many NALUs at once to cuvid, because that could cause it to fire many decode callbacks at once, exhausting the dpb.
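
As a rough illustration of why feeding less at a time helps, here is a toy model. This is not the code from the commit and not the Cuvid API: `ToyDecoder`, `Exhausted` and `feed_in_chunks` are hypothetical, and the model assumes each NALU fires exactly one decode callback synchronously during the feed call.

```rust
/// Returned when a decode callback finds no free surface.
#[derive(Debug, PartialEq)]
pub struct Exhausted {
    /// Id of the NALU whose callback could not get a surface.
    pub nalu: u32,
}

/// Toy stand-in for parser + decoder: `capacity` surfaces, and surfaces
/// are only freed when the caller drains the decoded pictures.
pub struct ToyDecoder {
    capacity: usize,
    in_flight: usize,
}

impl ToyDecoder {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, in_flight: 0 }
    }

    /// Feed a batch of NALUs. Every NALU triggers a decode callback
    /// before control returns, so nothing can be drained in between.
    pub fn feed(&mut self, nalus: &[u32]) -> Result<(), Exhausted> {
        for &nalu in nalus {
            if self.in_flight == self.capacity {
                return Err(Exhausted { nalu });
            }
            self.in_flight += 1;
        }
        Ok(())
    }

    /// Consume all decoded pictures, freeing their surfaces.
    /// Returns how many pictures were consumed.
    pub fn drain(&mut self) -> usize {
        std::mem::take(&mut self.in_flight)
    }
}

/// Feed at most `chunk` NALUs at a time and drain after every feed.
pub fn feed_in_chunks(
    dec: &mut ToyDecoder,
    nalus: &[u32],
    chunk: usize,
) -> Result<usize, Exhausted> {
    let mut decoded = 0;
    // `chunks` panics on a size of 0, so clamp to at least 1.
    for part in nalus.chunks(chunk.max(1)) {
        dec.feed(part)?;
        decoded += dec.drain();
    }
    Ok(decoded)
}
```

In this model, feeding ten NALUs at once to a decoder with four surfaces fails on the fifth, while feeding them one at a time never holds more than one surface.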

I'll keep this issue open so I can remember to add a libavcodec fallback.