The estimation on number of bits for I16x16 mode is incorrect?

rogerzhou-github commented 6 years ago

I think the way we estimate the number of bits for I16 mode can be improved. There are 4 prediction modes (Vertical, Horizontal, DC and Plan). For baseline profile, the type is encoded using EG coding. In current implementation, it is assumed the type value (0 ~ 3) is encoded directly into bitstream as “1”, “010”, “011”, and “00100”, which take 1, 3, 3, and 5 bits correspondingly . However, according to “7.4.5 Macroblock layer semantics” in H.264 recommendation, the prediction type is jointly encoded with two other values, the Luma CBP and Chroma CBP. In P Slices, an offset of 5 is added to the mb_type value of Intra to accommodate mb types of P Block, taking even more bits to encode the mb_type of I16x16. Furthermore, when a macroblock is encoded as Intra, chroma prediction mode is encoded, which takes 1~5 bits. The actual number of bits encoded into bitstream varies from 2 ~ 14 bits, while in current implementation it is estimated 1 ~ 4 bits. As a result, the encoder favors intra modes since the number of bits for encoding intra modes is under estimated. It causes problem at low bitrate where P_Skip mode is more efficient.

One of optional fix is provided. Our test shows that with the fix 3.0% bits can be saved and there is 4.0% speed up on arm phones.

The fix can be:

diff --git a/codec/encoder/core/src/svc_base_layer_md.cpp b/codec/encoder/core/src/svc_base_layer_md.cpp index 5acc2d92..8e3fa005 100644 --- a/codec/encoder/core/src/svc_base_layer_md.cpp +++ b/codec/encoder/core/src/svc_base_layer_md.cpp @@ -390,8 +390,15 @@ int32_t WelsMdI16x16 (SWelsFuncPtrList pFunc, SDqLayer pCurDqLayer, SMbCache* pFunc->pfGetLumaI16x16Pred[iBestMode] (pDst, pDec, iLineSizeDec); } iIdx = 1;

iBestCost += iLambda;
} else {
ifdef INTRA_16X16_BITS_EST_FIX
//The number of bits for encoding mb_type of I16x16 depends on pred mode, slice_type, CBP luma and CBP chroma
//When a macroblock is encoded as Intra, chroma prediction mode is encoded as well, which takes 1~3 bits in CAVLC
int32_t i16_pred_sad = iBestCost - (iBestMode == 3 ? 4 : iBestMode == 0 ? 0 : 2) * iLambda;
iBestCost = i16_pred_sad + (i16_pred_sad > iLambda ? 8 : 6) * iLambda; //assumes CBP != 0 when sad > 1 lambda
else
iBestCost += iLambda;
endif
} else { iBestMode = kpAvailMode[0]; for (i = 0; i < iAvailCount; ++ i) { iCurMode = kpAvailMode[i]; @@ -401,6 +408,15 @@ int32_t WelsMdI16x16 (SWelsFuncPtrList pFunc, SDqLayer pCurDqLayer, SMbCache pFunc->pfGetLumaI16x16Pred[iCurMode] (pDst, pDec, iLineSizeDec); iCurCost = pFunc->sSampleDealingFuncs.pfMdCost[BLOCK_16x16] (pDst, 16, pEnc, iLineSizeEnc); iCurCost += iLambda (BsSizeUE (g_kiMapModeI16x16[iCurMode]));
ifdef INTRA_16X16_BITS_EST_FIX
int mb_type_offset = 5;
if (iCurCost > iLambda ) {
mb_type_offset += 21;
}
iCurCost += iLambda * (g_kuiGolombUELength[g_kiMapModeI16x16[iCurMode] + mb_type_offset] + 3) ;
else
iCurCost += iLambda * (BsSizeUE (g_kiMapModeI16x16[iCurMode]));
endif

if (iCurCost < iBestCost) { iBestMode = iCurMode; iBestCost = iCurCost;

ruil2 commented 6 years ago

@rogerzhou-github thanks for your details analysis. we will probe this and check whether it is an issue.

rogerzhou-github commented 6 years ago

@ruil2 Thanks! Please let me know if you need any more information.

huili2 commented 6 years ago

Seems fine to me, with two questions:

your new code is a little complex with too many branches. Is there any optimization method with comparable RD?
How about the RD-complexity data for normal rate (like small QP case). Do you have such reports? Thanks again for so much detailed contribution on this~

rogerzhou-github commented 6 years ago

Agree that the fix is confusing. I tried to fix the "R" in D + lumbdaR, but seems the number of bits is hard coded in assembly code. To avoid modify assembly code, I removed the rlumbda part by

int32_t i16_pred_sad = iBestCost - (iBestMode == 3 ? 4 : iBestMode == 0 ? 0 : 2) * iLambda;

I think we can skip this step: just add some "r*lumbda" on top of current cost. There will be no big difference.

I will provide another fix and rerun the test. I will also cover larger bit rate range.

TriMoon commented 6 years ago

@rogerzhou-github Next time when you post diffs please use unified diffs diff -u and paste using syntax highlighted code on github like:

```diff
<your diff output>
```

to get output like:

--- a/codec/encoder/core/src/svc_base_layer_md.cpp
+++ b/codec/encoder/core/src/svc_base_layer_md.cpp
@@ -390,8 +390,15 @@ int32_t WelsMdI16x16 (SWelsFuncPtrList* pFunc, SDqLayer* pCurDqLayer, SMbCache*
pFunc->pfGetLumaI16x16Pred[iBestMode] (pDst, pDec, iLineSizeDec);
}
iIdx = 1;
+ bla bla
- blabla

It will make it much nicer to look at for all and use as patch for the devs :wink:

huili2 commented 1 year ago

any update on this? Like latest BD-rate or BD-PSNR with the modification?

cisco / openh264

The estimation on number of bits for I16x16 mode is incorrect? #3030

ifdef INTRA_16X16_BITS_EST_FIX

else

endif

ifdef INTRA_16X16_BITS_EST_FIX

else

endif