[ICML workshop, 2019] Real-world Video Adaptation with Reinforcement Learning

Problem: video stalls and rebuffers...

When the video bitrate exceeds the network capacity, the playback buffer drains, the video stalls, and the player enters a rebuffering period.
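The stall condition above can be sketched as a tiny buffer simulation. This is only an illustration under simplifying assumptions: fixed-length chunks, one average throughput value per chunk download, and hypothetical names (`simulate_rebuffering`, `chunk_seconds`, `initial_buffer_s`) that do not come from the paper.

```python
def simulate_rebuffering(bitrates_mbps, throughputs_mbps,
                         chunk_seconds=4.0, initial_buffer_s=4.0):
    """Return total stall time (seconds) over a sequence of chunk downloads.

    bitrates_mbps[i]    : bitrate chosen for chunk i
    throughputs_mbps[i] : average network throughput while chunk i downloads
    """
    buffer_s = initial_buffer_s   # seconds of video currently buffered
    rebuffer_s = 0.0              # accumulated stall time
    for bitrate, throughput in zip(bitrates_mbps, throughputs_mbps):
        # Time needed to fetch one chunk at this bitrate.
        download_s = bitrate * chunk_seconds / throughput
        # Playback drains the buffer during the download; any shortfall is a stall.
        rebuffer_s += max(0.0, download_s - buffer_s)
        # After the download the buffer gains one chunk of video.
        buffer_s = max(0.0, buffer_s - download_s) + chunk_seconds
    return rebuffer_s


# Bitrate below capacity: downloads finish faster than playback, no stall.
print(simulate_rebuffering([1.0, 1.0, 1.0], [2.0, 2.0, 2.0]))
# Bitrate above capacity: every download outlasts the buffer, so the video stalls.
print(simulate_rebuffering([4.0, 4.0, 4.0], [2.0, 2.0, 2.0]))
```

When the bitrate stays above capacity, each chunk takes longer to download than it takes to play, so the buffer can never recover and the stall time grows with every chunk.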

Choose among the available bitrates adaptively, depending on the network and playback buffer conditions.

The future is unknown...

Past throughput observations cannot reliably predict the upcoming network conditions...

User side:

Users don't want a bitrate higher than the network can sustain, because it depletes the playback buffer and eventually causes the video to stall.

So ABR needs some time to adapt to this noisy cloud of future possibilities.

So we need to plan for the future for a better user experience.

Input: observation {bandwidth, current bitrate, buffer}. Output: next bitrate.
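To make the input/output contract concrete, here is a minimal sketch of the interface. The `Observation` fields mirror the three inputs listed above; the type names, the bitrate ladder, and the hand-written rule inside `choose_next_bitrate` are hypothetical placeholders, not the learned policy from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Observation:
    bandwidth_mbps: float        # recently measured throughput
    current_bitrate_mbps: float  # bitrate of the last downloaded chunk
    buffer_seconds: float        # seconds of video currently buffered


def choose_next_bitrate(obs: Observation,
                        ladder_mbps: Sequence[float],
                        low_buffer_seconds: float = 5.0) -> float:
    """Placeholder rule standing in for a policy (learned or heuristic).

    Picks the highest bitrate that fits the measured bandwidth, and falls
    back to the lowest bitrate when the buffer is nearly empty.
    obs.current_bitrate_mbps is part of the input but unused by this simple
    rule; a real policy could use it, for example to avoid abrupt switches.
    """
    lowest = min(ladder_mbps)
    if obs.buffer_seconds < low_buffer_seconds:
        return lowest
    affordable = [b for b in ladder_mbps if b <= obs.bandwidth_mbps]
    return max(affordable) if affordable else lowest


ladder = [0.3, 0.75, 1.2, 2.85, 4.3]  # assumed bitrate ladder in Mbps
obs = Observation(bandwidth_mbps=3.0, current_bitrate_mbps=1.2, buffer_seconds=10.0)
print(choose_next_bitrate(obs, ladder))
```

An RL-based approach would replace the body of `choose_next_bitrate` with a trained model while keeping the same observation-in, bitrate-out shape.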

Contrib:

First network control system using modern DRL.
Delivers 12~25% higher QoE, with 10~30% less rebuffering than previous ABR algorithms.
Tailors ABR decisions for different network conditions in a data-driven approach.

Previous ABR algorithms are all based on fixed heuristics derived from their designers' insights.

=> Simplified, inaccurate models lead to suboptimal performance.

MPC: conservative throughput prediction
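One way a conservative throughput prediction can be built is a harmonic mean over recent samples, scaled down by a safety factor. The sketch below assumes that approach; the function name, the window of 5 samples, and the 0.8 discount are illustrative choices, not values taken from these notes.

```python
def conservative_throughput(samples_mbps, window=5, safety_factor=0.8):
    """Predict future throughput pessimistically from past samples.

    The harmonic mean is dominated by the slowest samples, so a single
    fast burst does not inflate the estimate. safety_factor (an assumed
    knob) discounts the estimate further.
    """
    recent = list(samples_mbps)[-window:]
    if not recent:
        raise ValueError("need at least one throughput sample")
    harmonic_mean = len(recent) / sum(1.0 / s for s in recent)
    return safety_factor * harmonic_mean


# The arithmetic mean of these samples is 2.5 Mbps, but the harmonic mean
# is 1.6 Mbps, and the discounted prediction is lower still.
print(conservative_throughput([1.0, 4.0]))
```

A pessimistic estimate lowers the risk of stalls, at the cost of picking lower bitrates than the network could actually sustain.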

When doing the evaluation, we can compare the algorithms with the offline optimal result.
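An offline optimal baseline is possible because, in evaluation, the whole throughput trace is known in advance. The sketch below brute-forces every bitrate plan for a short trace. It assumes a linear QoE (bitrate minus a rebuffering penalty minus a bitrate-change penalty), one throughput value per chunk, and hypothetical names and default constants throughout; the paper's exact QoE definition may differ.

```python
from itertools import product


def qoe(bitrates, throughputs, chunk_seconds=4.0,
        initial_buffer_s=4.0, rebuffer_penalty=4.3):
    """Assumed linear QoE: sum of bitrate - penalty * stall - |bitrate change|."""
    buffer_s = initial_buffer_s
    total = 0.0
    prev = None
    for bitrate, throughput in zip(bitrates, throughputs):
        download_s = bitrate * chunk_seconds / throughput
        stall_s = max(0.0, download_s - buffer_s)
        buffer_s = max(0.0, buffer_s - download_s) + chunk_seconds
        change = abs(bitrate - prev) if prev is not None else 0.0
        total += bitrate - rebuffer_penalty * stall_s - change
        prev = bitrate
    return total


def offline_optimal(ladder, throughputs, **qoe_kwargs):
    """Exhaustively search all plans; only feasible for short traces."""
    best = max(product(ladder, repeat=len(throughputs)),
               key=lambda plan: qoe(plan, throughputs, **qoe_kwargs))
    return list(best), qoe(best, throughputs, **qoe_kwargs)


trace = [4.0, 4.0, 4.0]                    # known future throughput, Mbps per chunk
plan, best_qoe = offline_optimal([1.0, 2.0], trace)
online_qoe = qoe([1.0, 1.0, 1.0], trace)   # e.g. an overly cautious online algorithm
print(plan, best_qoe, online_qoe)
```

The gap between an algorithm's QoE and the offline optimum on the same trace shows how much performance is lost by not knowing the future. Exhaustive search grows exponentially with the number of chunks, so longer traces would need dynamic programming or a similar method.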

Does xxx generalize?