Chen-tao / webm

Automatically exported from code.google.com/p/webm

Large encoder overshoots at low bitrates/framerates, for large ~static content #549

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
At low bitrates/framerates, for large, mostly static content, the encoder can exhibit 
large overshoots. Example input is here: 
/home/marpan/overshoot_issue/test_1980_1184.yuv
(the first ~200 frames of the clip are static, a small video plays from roughly frame 
200 to 350, and the clip is static again from ~350 to ~500)

Encoding was run with the settings:
./vpxenc --lag-in-frames=0 --error-resilient=0/1 --target-bitrate=100 
--kf-min-dist=3000 --kf-max-dist=3000 --cpu-used=-6 --fps=5000/1000 
--static-thresh=1 --token-parts=1 --end-usage=cbr --min-q=2 --max-q=56 
--undershoot-pct=100 --overshoot-pct=15 --buf-sz=1000 --buf-initial-sz=500 
--buf-optimal-sz=600 --max-intra-rate=900 --resize-allowed=0 --drop-frame=0 
--passes=1 --rt --noise-sensitivity=0 --tune=psnr -w 1980 -h 1184 
test_1980_1184.yuv -o out.webm

4 plots are attached, corresponding to --error-resilient=0/1 and an internal 
codec setting. The plots show the actual/encoded frame size vs the frame 
number, with the constant line equal to per-frame-bandwidth.

The plots correspond to:
1. er0.eps: error_resilient = 0.
2. er0_gfboost0.eps: error_resilient = 0 and the default gf_boosting is turned 
off for sequence (i.e., cpi->max_gf_interval = DEFAULT_GF_INTERVAL ~ 1000).
3. er1_refresh0.eps: error_resilient = 1, with cyclic refresh off.
4. er1_refresh1.eps: error_resilient = 1, with cyclic refresh set to 5%.

The 4th case is reasonably stable, but the first 3 cases show far too much 
overshoot. Most of these overshoot instances correspond to significant frame-to-frame 
changes in Q (and the RD multipliers). Bounding the delta/change of these parameters 
is one option; a rough sketch of that idea follows.
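
One illustrative way to bound the frame-to-frame Q change (the helper and the 
MAX_Q_DELTA constant are hypothetical, not existing libvpx code):

    /* Hypothetical sketch: clamp the rate control's chosen quantizer so it
     * cannot move more than MAX_Q_DELTA away from the previous frame's Q,
     * while staying inside the configured min-q/max-q range. */
    #define MAX_Q_DELTA 8   /* illustrative bound */

    static int clamp_frame_q(int proposed_q, int last_q, int min_q, int max_q) {
        int q = proposed_q;
        if (q > last_q + MAX_Q_DELTA) q = last_q + MAX_Q_DELTA;
        if (q < last_q - MAX_Q_DELTA) q = last_q - MAX_Q_DELTA;
        if (q < min_q) q = min_q;
        if (q > max_q) q = max_q;
        return q;
    }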

Original issue reported on code.google.com by marpan@google.com on 20 Feb 2013 at 12:46

Attachments:

GoogleCodeExporter commented 8 years ago

Original comment by marpan@google.com on 20 Feb 2013 at 12:51

Attachments:

GoogleCodeExporter commented 8 years ago

Original comment by marpan@google.com on 20 Feb 2013 at 5:34

GoogleCodeExporter commented 8 years ago
Sorry we didn't do this earlier. These are our graphs (x-axis = frame number) of us 
screen-capturing a scrolling webpage in Chrome. Can you please explain how you 
got the er1_refresh5 graph, since that one looks really good (way better than 
anything else)? Thanks!

The graphs attached are with two different quantizer ranges. John had 
recommended that we try ROI maps, but we found that even with the q63-q63 
(worst) option the spikes are still present.

Even a work-around would do for us at this point, since the overshoot makes the 
screen-sharing experience considerably worse than our x264-based counterpart products. 
Thanks a lot!

Original comment by fa...@screenhero.com on 22 Mar 2013 at 6:30

Attachments:

GoogleCodeExporter commented 8 years ago
To do some homework, I found the following in onyx_if.h, but I can't seem to find 
any documentation on it anywhere:

    /* enabled whenever error-resilient mode is on */
    cpi->cyclic_refresh_mode_enabled = cpi->oxcf.error_resilient_mode;
    /* refresh at most 1/5 (20%) of the frame's macroblocks each frame */
    cpi->cyclic_refresh_mode_max_mbs_perframe = (cpi->common.mb_rows * cpi->common.mb_cols) / 5;
    cpi->cyclic_refresh_mode_index = 0;  /* where the refresh scan resumes */
    cpi->cyclic_refresh_q = 32;          /* quantizer used for the refresh segment */

I'm assuming setting it to 5% involves one of those guys.

Thanks!

Original comment by fa...@screenhero.com on 22 Mar 2013 at 6:40

GoogleCodeExporter commented 8 years ago
Hi,
  The er1_refresh5 graph was obtained from settings:
  --error-resilient=1 (at command line), and the internal setting: 
  cpi->cyclic_refresh_mode_max_mbs_perframe = (cpi->common.mb_rows * cpi->common.mb_cols) / 20;

  (/ 10 will give you 10%, / 5 for 20%, etc)

This was an alternative to the default golden frame boosting (which is done when 
error-resilient=0), to bring the quality up from the initial key frame.
The cyclic refresh uses segmentation to change QP and/or loop filter for a set 
of macroblocks.

It will help to reduce severe overshoots that may arise even for stationary 
content, as the encoder settles to steady state.
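
For reference, a simplified sketch of the cyclic refresh idea described above (the 
names and structure are illustrative, not the actual VP8 implementation):

    /* Each frame, the next max_mbs_per_frame macroblocks (in raster order,
     * wrapping around) are assigned to a "refresh" segment whose QP is
     * adjusted via segmentation, so quality is raised gradually rather than
     * through a large golden-frame boost.  Sketch only, not libvpx code. */
    static void update_cyclic_refresh_map(unsigned char *seg_map, int mb_count,
                                          int *refresh_index,
                                          int max_mbs_per_frame) {
        int i;
        for (i = 0; i < mb_count; ++i)
            seg_map[i] = 0;                    /* segment 0: normal QP        */
        for (i = 0; i < max_mbs_per_frame; ++i) {
            seg_map[*refresh_index] = 1;       /* segment 1: boosted quality  */
            *refresh_index = (*refresh_index + 1) % mb_count;
        }
    }

    /* e.g. max_mbs_per_frame = (mb_rows * mb_cols) / 20 refreshes ~5% per frame */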

But for the case of scrolling or slide switching the encoder will still 
overshoot, even if we max out the QP (i.e., qp_min = qp_max = 63), as you mentioned. I 
haven't found a good solution to this issue so far.

Original comment by marpan@google.com on 22 Mar 2013 at 9:30

GoogleCodeExporter commented 8 years ago
Hi,
I guessed that part based on your original post and added it in - it actually 
helps a LOT! With error-resilient=1, the /20 makes the overshoots much more 
bearable. Actually, it doesn't overshoot much at all for me right now - a tiny 
bit of overshoot is fine, 2-3x spikes are not. Does it still spike for you? 
For me it seems to be behaving much better.

Also - my buf sizes are tiny - they're 30, 100, 200 (initial, optimal, normal) - I 
actually don't understand how those buf sizes work, but in my experience the tiny 
buf sizes make VP8 switch to lower quality much more quickly than 400, 600, 1000, 
which seems to be the WebRTC default.

Thank you

Original comment by fa...@screenhero.com on 22 Mar 2013 at 10:22

GoogleCodeExporter commented 8 years ago
Hi,
   Glad the setting helped you. I still get overshoots when moving slides/scrolling,
   but that is expected to some extent, and there may be other internal adjustments 
   that can help here. I expect to do some more tests next week.

   --buf-initial-sz controls the target size of the initial key frame 
    (target size = 1/4 buf_initial_sz * target_bandwidth).

   --buf-sz and --buf-optimal-sz control the target size of the subsequent frames 
    (via a leaky bucket model for the (decoder) buffer level). So smaller values for 
    these parameters will force the target frame size to deviate less from the 
    per-frame bandwidth, which helps to reduce rate fluctuation. Effectively, this 
    makes the codec respond more quickly to bitrate changes, for example.

    But note that these parameters only control the "target" size: the actual size 
    can still be quite different, as you know. The actual size depends on the QP 
    selected for that frame, so adjusting min-q/max-q helps here, as you have seen.
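
A rough numerical sketch of that leaky bucket behavior (illustrative only; the gain 
constant and structure are not the actual libvpx rate control code):

    /* Illustrative leaky bucket view of CBR targeting: each frame the buffer
     * fills by the per-frame bandwidth and drains by the bits actually spent;
     * the next target is nudged toward the per-frame bandwidth depending on
     * how far the buffer sits from its optimal level. */
    typedef struct {
        double buffer_level;   /* current buffer fullness, in bits     */
        double optimal_level;  /* buf-optimal-sz converted to bits     */
        double per_frame_bw;   /* target_bitrate / framerate, in bits  */
    } rc_state;

    static double next_frame_target(rc_state *rc, double last_frame_bits) {
        rc->buffer_level += rc->per_frame_bw - last_frame_bits;
        /* Below optimal => we have been overshooting, so shrink the target;
         * above optimal => there is room for a larger frame.  A smaller
         * buf-optimal-sz keeps the target close to the per-frame bandwidth. */
        double deviation = rc->buffer_level - rc->optimal_level;
        double target = rc->per_frame_bw + 0.25 * deviation;  /* 0.25 gain is illustrative */
        return target > 0.0 ? target : rc->per_frame_bw * 0.1;
    }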

Original comment by marpan@google.com on 22 Mar 2013 at 11:16

GoogleCodeExporter commented 8 years ago
You are right - the per-frame bandwidth still spikes. The good thing is that on 
average the per-second bitrate seems to behave very well. However, this may be 
more of a visual thing, but due to the spikes the decoded playback seems to 'slow 
down' and jitter - this is because we use WebRTC's transmission smoothing code, 
and if a spike does happen the transmission gets slowed down.

Is there any way we can manually force the quality to become terrible (and hence 
manually clamp down the next frame size)? We know when scrolling/sliding will 
happen; I just don't know where in VP8's code we can force this 'super bad 
quality' - any pointers on that? That would be a good 'duct tape' fix to our 
problem.

Thank you!

Original comment by fa...@screenhero.com on 23 Mar 2013 at 12:28

GoogleCodeExporter commented 8 years ago
Hi,

  If you know when scrolling/sliding happens and want to force lower quality there, 
  you can also try setting the static threshold to a very high value 
  (e.g., --static-thresh=100000) for that sequence of frames. 
  This will force the codec to skip the residual encoding for each macroblock, which 
  should reduce the bitrate overshooting, but at the cost of bad quality.
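
If the application drives libvpx directly (rather than through vpxenc), the threshold 
can be changed between frames with the standard encoder control call; a minimal 
sketch, with the threshold values taken from the suggestion above:

    #include <stdio.h>
    #include "vpx/vpx_encoder.h"
    #include "vpx/vp8cx.h"

    /* Sketch: raise the static threshold just before encoding frames where a
     * scroll/slide change is known to happen, and restore it afterwards.
     * Whether these particular values are appropriate depends on the content. */
    static void set_scroll_mode(vpx_codec_ctx_t *encoder, int scrolling) {
        unsigned int thresh = scrolling ? 100000 : 1;
        if (vpx_codec_control(encoder, VP8E_SET_STATIC_THRESHOLD, thresh))
            fprintf(stderr, "failed to set static threshold: %s\n",
                    vpx_codec_error(encoder));
    }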

Original comment by marpan@google.com on 25 Mar 2013 at 9:53

GoogleCodeExporter commented 8 years ago
That's great! I didn't know static-thresh would work that way, but now that I 
think about it, it makes sense! I'm assuming I can set that on the fly without 
causing a key frame. I'll try this and let you know!

Original comment by farazrk...@gmail.com on 25 Mar 2013 at 9:57

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Adding a method to increase the static threshold to 100k and taking the cyclic 
refresh down to 2.5% helps manage the spikes quite a bit! Thanks!

However - the bad news is that it STILL spikes randomly sometimes, though with 
your suggestions the probability of that is heavily reduced. Also, setting the 
static threshold high results in lots of artifacts, which seem buggy.

Is there a way to do clamping on the frame size somehow, as you mentioned, so VP8 
takes care of it without us having to do these hacks? I'm willing to spend a 
LOT of time on this if you can guide me / show me where exactly the problem 
lies.

Thanks.

Original comment by farazrk...@gmail.com on 30 Mar 2013 at 6:03

GoogleCodeExporter commented 8 years ago
Quick rehash of what I suggested: it should be possible to get per-screengrab 
window position information from the window manager. Once you know the window 
position (resizing is slightly harder, but can probably be mixed in with this), 
you should be able to generate a list of per-macroblock recommended motion 
vectors to be used as the start of the search. Once you have that, the motion 
search should get a lot better (I bet the bitrate spike you see here is because 
most motion vectors aren't very good, so there is a super-high residual, which is 
expensive especially at low Q) while still not being any slower (in fact, it might get faster).

For VP9, you could go one step further and also recommend a list of 
partitionings based on the window movement, with the border of each of these 
motion fields aligned to block sizes.

Now, none of this ("hinted motion vectors" and "hinted partitionings") is 
actually implemented in the source code right now, but it could probably be 
implemented without too much effort. What you'll need is an API interface 
in vpx/ to send the hints from the application to the interface proxy to the 
actual codec (the code that lives in vp8/ or vp9/), and then, once you have the 
"hint receiving code" in vp9/encoder/ and vp8/encoder/, use these hints as the 
starting point for the motion search (in rdopt.c or vp9_rdopt.c) and the 
partitioning as the basis for the partition search in vp9_encodeframe.c.
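
To make that concrete, a sketch of how per-macroblock motion vector hints might be 
derived from a known scroll delta (the hint structure and the API carrying it are 
hypothetical, since, as noted, none of this exists in libvpx today):

    /* Hypothetical hint structure: one recommended motion vector per
     * macroblock, in whole pixels for simplicity (the real encoder works in
     * sub-pel units).  The encoder would use these as the starting point of
     * its motion search instead of searching from scratch. */
    typedef struct { int row; int col; } mv_hint;

    static void build_scroll_hints(mv_hint *hints, int mb_rows, int mb_cols,
                                   int scroll_dx, int scroll_dy) {
        /* Content that moved by (scroll_dx, scroll_dy) is found in the
         * reference frame at the current position minus that delta, so the
         * hinted vector is the negated scroll offset for every macroblock. */
        for (int r = 0; r < mb_rows; ++r) {
            for (int c = 0; c < mb_cols; ++c) {
                hints[r * mb_cols + c].row = -scroll_dy;
                hints[r * mb_cols + c].col = -scroll_dx;
            }
        }
    }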

Hope that helps!

Original comment by rsbul...@gmail.com on 15 May 2013 at 11:17

GoogleCodeExporter commented 8 years ago
Related: I had an alternate CBR-mode rate control strategy that avoided these 
overshoots, but it needed more testing before it could land, and I got moved 
onto other things. Take a look at the sandbox/jkoleszar/new-rate-control branch of 
libvpx, or around line 3220 at this link, for the meat of the change:

https://code.google.com/p/webm/source/diff?format=side&repo=libvpx&name=sandbox/jkoleszar/new-rate-control&path=/vp8/encoder/onyx_if.c&r=90ba80db171e438ca636a921b8f166603e279ddf&spec=svn.libvpx.90ba80db171e438ca636a921b8f166603e279ddf

That change should be pretty easy to port to the tip of the tree.

Original comment by jkoles...@google.com on 17 May 2013 at 3:48

GoogleCodeExporter commented 8 years ago
John,
I think the problem is deeper than just the rate control algorithm - since even 
with the Q constrained to 63 these overshoots still happen. Turning down the 
cyclic refresh plus my duct-tape high-threshold fix somewhat takes care of this 
- but a better solution is still required.

I'll start digging into motion vectors shortly after Google I/O and see what I 
can come up with.

Thanks!

Original comment by fa...@screenhero.com on 17 May 2013 at 4:35

GoogleCodeExporter commented 8 years ago
Ronald,
Thanks a lot! I've been looking at onyx_if.c as the 'interface' to the actual 
encoder context. Let me examine these files and see if I can understand how they 
work. Is there any documentation (even theoretical) that you could point me to so 
that I can understand how this stuff works?

Original comment by fa...@screenhero.com on 17 May 2013 at 4:38