OpenVisualCloud / SVT-HEVC

SVT HEVC encoder. Scalable Video Technology (SVT) is a software-based video coding technology that is highly optimized for Intel® Xeon® processors. Using the open source SVT-HEVC encoder, it is possible to spread video encoding processing across multiple Intel® Xeon® processors to achieve a real advantage of processing efficiency.

Encoder random hang #497

Closed tianjunwork closed 4 years ago

tianjunwork commented 4 years ago

./SvtHevcEncApp -i ../../../../yuv/bbb_1920x1080_420p.yuv -w 1920 -h 1080 -encMode 0 -intra-period 100 -scd 0 -rc 0 -q 22 -n 1000 -b 0.265 hangs randomly. I wonder whether this PR can fix it too.

Thanks for capturing this!

First, this is a different hang (fixed, not random). This PR doesn't address it, and it isn't a regression caused by this PR either.

And here is the hang pattern due to encMode 0 and intra period:

Anyway, it's another hang issue we need to look at.

Originally posted by @Austin-Hu in https://github.com/OpenVisualCloud/SVT-HEVC/pull/475#issuecomment-597735296

Austin-Hu commented 4 years ago

The hang is 100% reproducible with the command:

./SvtHevcEncApp -i bbb_1920x1080_420p.yuv -w 1920 -h 1080 -encMode 0 -intra-period 100 -scd 0 -rc 0 -q 22 -n 1000 -b 0.265

with some commits (such as the commit) before v1.3.0, after encoding 296 or 298 frames.

tianjunwork commented 4 years ago

On the current master tip (Disable OUT_ALLOC), the encoder hangs randomly (somewhere between the 1st and 15th run), and only with -encMode 0, not with any other encMode.
./SvtHevcEncApp -i ../../../yuv/bbb_1920x1080_420p.yuv -w 1920 -h 1080 -encMode 0 -n 1000 hangs at the 305th frame, somewhere between the 1st and 15th run.
./SvtHevcEncApp -i ../../../../yuv/bbb_1920x1080_420p.yuv -w 1920 -h 1080 -encMode 0 -intra-period 100 -n 1000 hangs at the 295th frame, somewhere between the 1st and 8th run. With -intra-period 100 it always hangs at the 295th frame.
I can't reproduce the hang with ./SvtHevcEncApp -i bbb_1920x1080_420p.yuv -w 1920 -h 1080 -intra-period 100 -scd 0 -rc 0 -q 22 -n 1000 -b 0.265.
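Since the hang only shows up in some runs, a small harness that repeats the command under a timeout can count how many runs it takes. The following is a minimal Python sketch, not something from this thread: the function name first_hanging_run and the default 600-second timeout are assumptions, and a slow but healthy -encMode 0 encode may need a much larger timeout before it is fair to call it a hang.

```python
import subprocess


def first_hanging_run(cmd, max_runs=15, timeout_s=600.0):
    """Run cmd up to max_runs times.

    Returns the 1-based index of the first run that exceeds timeout_s
    (treated here as a hang), or None if every run finished in time.
    The timeout is an assumed stand-in for real hang detection.
    """
    for run in range(1, max_runs + 1):
        try:
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child process before re-raising.
            return run
    return None


# Command line copied from the comment above; adjust the input path locally.
ENCODE_CMD = [
    "./SvtHevcEncApp", "-i", "bbb_1920x1080_420p.yuv",
    "-w", "1920", "-h", "1080",
    "-encMode", "0", "-intra-period", "100", "-n", "1000",
]
# Example call (not executed here): first_hanging_run(ENCODE_CMD)
```

A return value of None means no run exceeded the timeout within max_runs attempts, which does not prove the hang is gone.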

tianjunwork commented 4 years ago

I ran a regression test on v1.3 with ./SvtHevcEncApp -i ../../../yuv/bbb_1920x1080_420p.yuv -w 1920 -h 1080 -encMode 0 -n 1000. The hang is easier to reproduce on v1.3, nearly 100% of the time. With 4K input, it didn't hang.

tianjunwork commented 4 years ago

The equivalent ffmpeg command also hangs: ./ffmpeg -stream_loop -1 -video_size 1920x1080 -r 60 -i ../../yuv/bbb_1920x1080_420p.yuv -c:v libsvt_hevc -preset 0 -y 0.mp4

Austin-Hu commented 4 years ago

Root cause of this issue: The single-threaded SvtHevcEncApp (and other single-threaded encoding frameworks such as ffmpeg) calls EbH265EncSendPicture, EbH265GetPacket and EbH265ReleaseOutBuffer sequentially. But EbH265EncSendPicture and EbH265GetPacket are asynchronous, so the application gets the 1st available encoded packet only after sending several frames. When EbH265EncSendPicture blocks because the Input FIFO is exhausted, the application can't release Output FIFO objects in time, and the pending frames in the encoding pipeline are in turn blocked in the single-threaded InitialRateControl kernel, where Output FIFO objects are dequeued. So the application and the encoder deadlock each other.
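The circular wait described above can be written down as a small wait-for graph. This is only an illustrative Python sketch: the node labels are shorthand for the parties named in this comment, and find_wait_cycle is a hypothetical helper, not part of SVT-HEVC.

```python
def find_wait_cycle(waits_for):
    """Return the list of parties forming a wait cycle, or None.

    waits_for maps each party to the single party it is blocked on.
    """
    for start in waits_for:
        path = []
        node = start
        # Follow the chain of "is blocked on" edges until it ends or repeats.
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            return path[path.index(node):]
    return None


# Who is blocked on what, per the root-cause description (labels are shorthand).
blocked_on = {
    # EbH265EncSendPicture needs a free Input FIFO object.
    "application": "Input FIFO object",
    # Input objects are held by pending frames stuck at InitialRateControl.
    "Input FIFO object": "InitialRateControl kernel",
    # InitialRateControl needs a free Output FIFO object.
    "InitialRateControl kernel": "Output FIFO object",
    # Output objects are only released by the application
    # (EbH265ReleaseOutBuffer), which is itself blocked.
    "Output FIFO object": "application",
}

cycle = find_wait_cycle(blocked_on)
print(" -> ".join(cycle + [cycle[0]]))
```

If Output FIFO objects were released by a party that is not itself blocked on sending, the last edge would no longer point back at the sender and the cycle would disappear.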

Root cause tracing details:

  1. The application repeatedly sends frames (until it hangs at the 519th frame);
  2. Because encoding a frame takes a random amount of time, the application gets the encoded packets at random times. By the time encoding hangs, the application has received 295 (or 296, 297 or 298) frames in total, and has released Output objects the same number of times;
  3. During encoding, the stages (EncDec -> Packetization -> RateControl, where Input objects are released) have completed 405 times. The application is now sending the 519th frame, which needs 519 - 405 = 114 Input objects. But the Input FIFO has only 113 objects, so the application hangs in EbH265EncSendPicture and can't go on to call EbH265ReleaseOutBuffer to release an Output object;
  4. Meanwhile, all 113 Output FIFO objects have been exhausted. When the single-threaded InitialRateControl kernel gets POC 412, it handles POC 395~412 (in groups of LookAheadDistance = 17, as a sliding window). But when it reaches POC 408 (or 409, 410 or 411), where 408 + 1 - 295 = 114, it can't dequeue any object from the Output FIFO, so the InitialRateControl kernel hangs.
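The counts in steps 3 and 4 can be checked with a few lines of Python. The variable names are made up for this sketch, and pairing 296/409, 297/410 and 298/411 for the alternative runs is an assumption based on the order in which the numbers are listed above.

```python
INPUT_FIFO_OBJECTS = 113
OUTPUT_FIFO_OBJECTS = 113

frame_being_sent = 519      # the application hangs while sending the 519th frame
pipeline_completions = 405  # EncDec -> Packetization -> RateControl completions
frames_received = 295       # frames the application got back (first variant)
blocking_poc = 408          # POC at which InitialRateControl cannot dequeue

# Step 3: Input objects still held by frames that have not finished the pipeline.
input_objects_needed = frame_being_sent - pipeline_completions
# Step 4: Output objects needed, using the formula quoted in the comment.
output_objects_needed = blocking_poc + 1 - frames_received

print(input_objects_needed, ">", INPUT_FIFO_OBJECTS)
print(output_objects_needed, ">", OUTPUT_FIFO_OBJECTS)

# Assumed pairing of the alternative counts mentioned in steps 2 and 4.
variants = list(zip(range(295, 299), range(408, 412)))
needed_per_variant = [poc + 1 - received for received, poc in variants]
```

In both cases the demand is 114 objects against a pool of 113, one more than either FIFO can supply.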

(BTW, moving the dequeuing of Output FIFO objects from InitialRateControl to Packetization doesn't resolve this issue. The encoder then hangs when EncDec feeds back POC 262 to PictureManager: the single-threaded PictureManager has already hung dequeuing the (18 in total) Child PCS FIFO objects, so it can't handle the POC 262 posted by EncDec.)

Options to address this issue:

  1. Decouple the EbH265EncSendPicture and EbH265GetPacket calls into different threads in encoding applications (and frameworks) to work around this issue, as PR #507 does;
  2. As PR #508 does, calculate the correct number of objects for each FIFO, whether it spans a few kernels or the whole encoding pipeline, based on the specific encoding FPS and the time consumed (at runtime) in each encoding stage, so that each FIFO always has enough objects available;
  3. Increase the number of objects in some FIFOs (such as Input, Output, etc.), with some tuning effort.
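To illustrate option 1, here is a toy Python model in which sending and receiving run on separate threads. It is a sketch under stated assumptions only: it is not the SVT-HEVC API and not the code of PR #507, the function encode_decoupled is hypothetical, and the whole encoding pipeline is reduced to a single thread that takes an output object and releases the input object.

```python
import queue
import threading


def encode_decoupled(n_frames, n_input_objects, n_output_objects):
    """Toy pipeline; returns how many packets the receiver thread collected."""
    input_pool = queue.Queue()   # free Input FIFO objects
    output_pool = queue.Queue()  # free Output FIFO objects
    for _ in range(n_input_objects):
        input_pool.put(object())
    for _ in range(n_output_objects):
        output_pool.put(object())
    submitted = queue.Queue()    # frames handed to the encoder
    ready = queue.Queue()        # encoded packets waiting for the application
    received = []

    def sender():    # stands in for the EbH265EncSendPicture loop
        for poc in range(n_frames):
            in_obj = input_pool.get()    # blocks while the pool is exhausted
            submitted.put((poc, in_obj))

    def encoder():   # stands in for the whole encoding pipeline
        for _ in range(n_frames):
            poc, in_obj = submitted.get()
            out_obj = output_pool.get()  # blocks until an output is released
            input_pool.put(in_obj)       # frame consumed: release the input
            ready.put((poc, out_obj))

    def receiver():  # stands in for EbH265GetPacket + EbH265ReleaseOutBuffer
        for _ in range(n_frames):
            poc, out_obj = ready.get()
            received.append(poc)
            output_pool.put(out_obj)     # release the output object

    threads = [threading.Thread(target=f, daemon=True)
               for f in (sender, encoder, receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)               # bounded wait instead of hanging
    return len(received)


print(encode_decoupled(n_frames=200, n_input_objects=2, n_output_objects=2))
```

Because the receiver never waits on the sender, output objects keep being released even while the sender is blocked on the input pool, so in this model even very small pools do not lock up.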

tianjunwork commented 4 years ago

Fixed by #523