Closed Austin-Hu closed 4 years ago
Great to see the comment in there..and good job finding this fix.
This comment pertains to a lot of the codebase, but it would be great to simplify some of the pointer dereferencing.. e.g. pictureControlSetPtr->ParentPcsPtr->refPaPcsArray[REF_LIST_0] is used in numerous places - might be simpler to read if a local variable was set and then used (e.g.) refList0 = pictureControlSetPtr->ParentPcsPtr->refPaPcsArray[REF_LIST_0]; ...
Thanks, @intelmark . I just followed the style in the kernel implementations, otherwise there'd be too many variables for such kind of dereferencing...
Completed several testing with the PR:
---------------------------------------------------------
Test Begin...
Total Number of Tests: 33750
Total Passed: 33689
Percentage Passed: 99.8192592593%
Time Elapsed: 3 day(s), 18 hour(s), 47 minute(s), 23 second(s)
---------------------------------------------------------
Full CI Failed Cases:
---------------------------------------------------------
Test Begin...
Total Number of Tests: 2433
Total Passed: 2431
Percentage Passed: 99.9177969585%
Time Elapsed: 13 hour(s), 3 minute(s), 33 second(s)
---------------------------------------------------------
Fast CI (with "-lp 1" option) Failed Cases:
(Executed its testing in 2 machines with ~6 days)
Fast CI (with "-lp 1 -n 1000" options) Failed Cases, which also fail without the PR applied:
Input | Command Line | Performance Argument | Without PR #475 | With PR #475 (at commit) |
---|---|---|---|---|
8K P420 8-bit | ./SvtHevcEncApp -i 8k2D_musician_7680x3840_8bit_30Hz_P420.yuv -w 7680 -h 3840 -b 0.265 -n 10000 | Average Speed (FPS) | 38.96 | 38.91 |
Total Encoding Time (ms) | 256673 | 256980 | ||
Total Execution Time (ms) | 264407 | 264908 | ||
352x288 P420 8-bit | ./SvtHevcEncApp -i akiyo_cif.y4m -b 0.265 -n 100000 | Average Speed (FPS) | 539.88 | 457.51 |
Total Encoding Time (ms) | 185228 | 218576 | ||
Total Execution Time (ms) | 185409 | 218760 |
As for some performance drop "side effect" with small resolution (such as 352x288), it's because only the PPCS and PA Ref objects which are not referenced by other picture(s) any more can be dequeued from their corresponding FIFOs, rather than that the ones which are still referenced by other picture(s) could be dequeued without this PR applied. So by enlarging the length of those FIFOs, the encoding performance can be increased to ~600 FPS:
diff --git a/Source/Lib/Codec/EbEncHandle.c b/Source/Lib/Codec/EbEncHandle.c
index 3ca232a..853f9d3 100644
--- a/Source/Lib/Codec/EbEncHandle.c
+++ b/Source/Lib/Codec/EbEncHandle.c
@@ -1960,10 +1960,10 @@ void LoadDefaultBufferConfigurationSettings(
sequenceControlSetPtr->tileGroupRowCountArray[5] = tileGroupRowCount;
//#====================== Data Structures and Picture Buffers ======================
- sequenceControlSetPtr->pictureControlSetPoolInitCount = inputPic;
+ sequenceControlSetPtr->pictureControlSetPoolInitCount = inputPic * 2;
sequenceControlSetPtr->pictureControlSetPoolInitCountChild = MAX(4, coreCount / 6);
sequenceControlSetPtr->referencePictureBufferInitCount = inputPic;//MAX((EB_U32)(sequenceControlSetPtr->inputOutputBufferFifoInitCount >> 1), (EB_U32)((1 << sequenceControlSetPtr->staticConfig.hierarchicalLevels) + 2));
- sequenceControlSetPtr->paReferencePictureBufferInitCount = inputPic;//MAX((EB_U32)(sequenceControlSetPtr->inputOutputBufferFifoInitCount >> 1), (EB_U32)((1 << sequenceControlSetPtr->staticConfig.hierarchicalLevels) + 2));
+ sequenceControlSetPtr->paReferencePictureBufferInitCount = inputPic * 2;//MAX((EB_U32)(sequenceControlSetPtr->inputOutputBufferFifoInitCount >> 1), (EB_U32)((1 << sequenceControlSetPtr->staticConfig.hierarchicalLevels) + 2));
sequenceControlSetPtr->reconBufferFifoInitCount = sequenceControlSetPtr->referencePictureBufferInitCount;
//#====================== Inter process Fifos ======================
As for some performance drop "side effect" with small resolution (such as 352x288), it's because only the PPCS and PA Ref objects which are not referenced by other picture(s) any more can be dequeued from their corresponding FIFOs, rather than that the ones which are still referenced by other picture(s) could be dequeued without this PR applied. So by enlarging the length of those FIFOs, the encoding performance can be increased to ~600 FPS:
diff --git a/Source/Lib/Codec/EbEncHandle.c b/Source/Lib/Codec/EbEncHandle.c index 3ca232a..853f9d3 100644 --- a/Source/Lib/Codec/EbEncHandle.c +++ b/Source/Lib/Codec/EbEncHandle.c @@ -1960,10 +1960,10 @@ void LoadDefaultBufferConfigurationSettings( sequenceControlSetPtr->tileGroupRowCountArray[5] = tileGroupRowCount; //#====================== Data Structures and Picture Buffers ====================== - sequenceControlSetPtr->pictureControlSetPoolInitCount = inputPic; + sequenceControlSetPtr->pictureControlSetPoolInitCount = inputPic * 2; sequenceControlSetPtr->pictureControlSetPoolInitCountChild = MAX(4, coreCount / 6); sequenceControlSetPtr->referencePictureBufferInitCount = inputPic;//MAX((EB_U32)(sequenceControlSetPtr->inputOutputBufferFifoInitCount >> 1), (EB_U32)((1 << sequenceControlSetPtr->staticConfig.hierarchicalLevels) + 2)); - sequenceControlSetPtr->paReferencePictureBufferInitCount = inputPic;//MAX((EB_U32)(sequenceControlSetPtr->inputOutputBufferFifoInitCount >> 1), (EB_U32)((1 << sequenceControlSetPtr->staticConfig.hierarchicalLevels) + 2)); + sequenceControlSetPtr->paReferencePictureBufferInitCount = inputPic * 2;//MAX((EB_U32)(sequenceControlSetPtr->inputOutputBufferFifoInitCount >> 1), (EB_U32)((1 << sequenceControlSetPtr->staticConfig.hierarchicalLevels) + 2)); sequenceControlSetPtr->reconBufferFifoInitCount = sequenceControlSetPtr->referencePictureBufferInitCount; //#====================== Inter process Fifos ======================
Addressed the "side effect" by commit.
Would it make more sense to add a null pointer check to EbReleaseObject?
It's supposed that those PPCS and PA Ref object's corresponding wrappers exist in the EncDec and RateControl processes. Thanks!
./SvtHevcEncApp -i ../../../../yuv/bbb_1920x1080_420p.yuv -w 1920 -h 1080 -encMode 0 -intra-period 100 -scd 0 -rc 0 -q 22 -n 1000 -b 0.265
random hang.
I wonder if it can be fixed by this PR too.
./SvtHevcEncApp -i ../../../../yuv/bbb_1920x1080_420p.yuv -w 1920 -h 1080 -encMode 0 -intra-period 100 -scd 0 -rc 0 -q 22 -n 1000 -b 0.265
random hang. I wonder if it can be fixed by this PR too.
Thanks for capturing this!
At first, this is another (fixed, not random) hang issue which hasn't been addressed by this PR, or isn't regression caused by this PR.
And here is the hang pattern due to encMode 0 and intra period:
Anyway, it's another hang issue we need to look at.
Hi @Austin-Hu , the performance data in above table is without the fixes of fifo size, right? With the fix of PPCS and PA fifo size, cif fps goes up to 600?
Hi @Austin-Hu , the performance data in above table is without the fixes of fifo size, right? With the fix of PPCS and PA fifo size, cif fps goes up to 600?
Yes, I got the dropped performance data before figuring out this commit. And with that commit applied as well, there isn't performance gap with the whole PR applied or not.
Memory: | 1080p | cif | |
---|---|---|---|
master allocated memory | 2.29G | 899.14 MB | |
This PR allocated memory | 2.83G | 940MB |
Speed: Total execution time is not affected much.
Both cif and 1080P fps gains at faster encmode.
This is an old issue since v1.2.0. Overall it's due to the incorrect life cycle management of the PPCS, PA RefPic, PCS objects in different kernels (through ResourceCoordination to EncDec, and PictureManger & RateControl with feedbacks from EncDec and Packetization separately).
When the random hang happens, the former process is:
So the solution is: (Note: we need to manage the life cycle of PPCS, PA RefPic and PCS RefPic objects separately, because they're operated in pictureDecisionPaReferenceQueue of PictureDecision and referencePictureQueue of PictureManager, besides of other kernels.)
With the changes, the live count for the PPCS and PA Ref objects in each kernel stage are (numbers in brackets means the current picture is referenced by other picture(s)):
Obj | RC | PD | EncDec | RateControl PPCS: | 2 | (+1) | 1 (- 1) | 0 PA Ref: | 2 | (+1) | 1 (- 1) | 0
compared with original implementation:
Obj | RC | IRC | PD | RateControl PPCS: | 3 | 2 | 1 (+ 1 - 1) | 0 PA Ref: | 2 | 1 | 0 (+ 1 - 1) |
Signed-off-by: Austin Hu austin.hu@intel.com
Fixes #474 .