cisco / openh264

Open Source H.264 Codec
BSD 2-Clause "Simplified" License
5.57k stars 1.8k forks source link

The estimation on number of bits for I16x16 mode is incorrect? #3030

Open rogerzhou-github opened 6 years ago

rogerzhou-github commented 6 years ago

I think the way we estimate the number of bits for I16 mode can be improved. There are 4 prediction modes (Vertical, Horizontal, DC and Plan). For baseline profile, the type is encoded using EG coding. In current implementation, it is assumed the type value (0 ~ 3) is encoded directly into bitstream as “1”, “010”, “011”, and “00100”, which take 1, 3, 3, and 5 bits correspondingly . However, according to “7.4.5 Macroblock layer semantics” in H.264 recommendation, the prediction type is jointly encoded with two other values, the Luma CBP and Chroma CBP. In P Slices, an offset of 5 is added to the mb_type value of Intra to accommodate mb types of P Block, taking even more bits to encode the mb_type of I16x16. Furthermore, when a macroblock is encoded as Intra, chroma prediction mode is encoded, which takes 1~5 bits. The actual number of bits encoded into bitstream varies from 2 ~ 14 bits, while in current implementation it is estimated 1 ~ 4 bits. As a result, the encoder favors intra modes since the number of bits for encoding intra modes is under estimated. It causes problem at low bitrate where P_Skip mode is more efficient.

One of optional fix is provided. Our test shows that with the fix 3.0% bits can be saved and there is 4.0% speed up on arm phones.

image

The fix can be:

diff --git a/codec/encoder/core/src/svc_base_layer_md.cpp b/codec/encoder/core/src/svc_base_layer_md.cpp index 5acc2d92..8e3fa005 100644 --- a/codec/encoder/core/src/svc_base_layer_md.cpp +++ b/codec/encoder/core/src/svc_base_layer_md.cpp @@ -390,8 +390,15 @@ int32_t WelsMdI16x16 (SWelsFuncPtrList pFunc, SDqLayer pCurDqLayer, SMbCache* pFunc->pfGetLumaI16x16Pred[iBestMode] (pDst, pDec, iLineSizeDec); } iIdx = 1;

ruil2 commented 6 years ago

@rogerzhou-github thanks for your details analysis. we will probe this and check whether it is an issue.

rogerzhou-github commented 6 years ago

@ruil2 Thanks! Please let me know if you need any more information.

huili2 commented 6 years ago

Seems fine to me, with two questions:

  1. your new code is a little complex with too many branches. Is there any optimization method with comparable RD?
  2. How about the RD-complexity data for normal rate (like small QP case). Do you have such reports? Thanks again for so much detailed contribution on this~
rogerzhou-github commented 6 years ago

Agree that the fix is confusing. I tried to fix the "R" in D + lumbdaR, but seems the number of bits is hard coded in assembly code. To avoid modify assembly code, I removed the rlumbda part by

int32_t i16_pred_sad = iBestCost - (iBestMode == 3 ? 4 : iBestMode == 0 ? 0 : 2) * iLambda;

I think we can skip this step: just add some "r*lumbda" on top of current cost. There will be no big difference.

I will provide another fix and rerun the test. I will also cover larger bit rate range.

TriMoon commented 6 years ago

@rogerzhou-github Next time when you post diffs please use unified diffs diff -u and paste using syntax highlighted code on github like:

```diff
<your diff output>
```

to get output like:

--- a/codec/encoder/core/src/svc_base_layer_md.cpp
+++ b/codec/encoder/core/src/svc_base_layer_md.cpp
@@ -390,8 +390,15 @@ int32_t WelsMdI16x16 (SWelsFuncPtrList* pFunc, SDqLayer* pCurDqLayer, SMbCache*
pFunc->pfGetLumaI16x16Pred[iBestMode] (pDst, pDec, iLineSizeDec);
}
iIdx = 1;
+ bla bla
- blabla

It will make it much nicer to look at for all and use as patch for the devs :wink:

huili2 commented 1 year ago

any update on this? Like latest BD-rate or BD-PSNR with the modification?