algesten / str0m

A Sans I/O WebRTC implementation in Rust.
MIT License

BWE increases slowly when audio packets are sent before video packets #506

Closed giangndm closed 5 months ago

giangndm commented 5 months ago

This PR has some changes:

For clarity, I will explain the issues that led to this PR.

Problem

When an audio stream starts before a video stream, the bandwidth estimator (BWE) calculates the initial bitrate based on the audio packets alone. This results in a very low estimated bitrate, typically around 40 kbps. When the video stream starts, the BWE adapts slowly to the increased bandwidth requirement, so the bitrate ramps up slowly. This can cause poor video quality.

        let sample_estimate_bps = sample_estimate.as_f64();
        let estimate_bps = estimate.as_f64();
        // Define the sample uncertainty as a function of how far away it is from the
        // current estimate. With low values of uncertainty_symmetry_cap_ we add more
        // uncertainty to increases than to decreases. For higher values we approach
        // symmetry.
        let sample_uncertainty =
            scale * (estimate_bps - sample_estimate_bps).abs() / (estimate_bps.max(25_000.0));
        let sample_var = sample_uncertainty.powf(2.0);

        // Update a bayesian estimate of the rate, weighting it lower if the sample
        // uncertainty is large.
        // The bitrate estimate uncertainty is increased with each update to model
        // that the bitrate changes over time.
        let pred_bitrate_estimate_var = self.estimate_var + 5.0;
        let mut new_estimate = (sample_var * estimate_bps
            + pred_bitrate_estimate_var * sample_estimate_bps)
            / (sample_var + pred_bitrate_estimate_var);
        new_estimate = new_estimate.max(ESTIMATE_FLOOR.as_f64());
        self.estimate = Some(Bitrate::bps(new_estimate.ceil() as u64));
        self.estimate_var =
            (sample_var * pred_bitrate_estimate_var) / (sample_var + pred_bitrate_estimate_var);

https://github.com/algesten/str0m/blob/a3e0f13744ef8b0681bd34b97298dbe72085878a/src/packet/bwe/acked_bitrate_estimator.rs#L76-91

In my local testing, I observed that the sample_var value was around 2000 and the self.estimate_var value was around 20. As a result, the BWE increased very slowly. For instance, if the new sample estimate was 400 and the previous estimate was 40, the estimate would only increase to about 43.6, because the sample gets a weight of only about 1%. After my configured 2-second warm-up time ends, the video bitrate drops dramatically to around 80 kbps, causing video freezing and very poor quality.
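
The arithmetic can be checked with a small standalone sketch of the update quoted above. The function names are illustrative and not taken from str0m, and the floor and the rounding to whole bits per second are left out. With the `+ 5.0` term applied the result is about 44.4 rather than 43.6 (43.6 is what you get when 20 is used directly as the predicted variance); either way the sample gets a weight of only a little over 1%.

```rust
/// One step of the variance-weighted update quoted above, without the
/// estimate floor and without rounding. Illustrative helper, not str0m code.
/// Returns `(new_estimate, new_estimate_var)`.
fn bayesian_update(estimate: f64, estimate_var: f64, sample: f64, sample_var: f64) -> (f64, f64) {
    // The estimate variance grows by a constant on every update.
    let pred_var = estimate_var + 5.0;
    // Weighted mean: a large sample_var pulls the result toward the old estimate.
    let new_estimate = (sample_var * estimate + pred_var * sample) / (sample_var + pred_var);
    let new_var = (sample_var * pred_var) / (sample_var + pred_var);
    (new_estimate, new_var)
}

/// Fraction of the gap between the old estimate and the sample that a
/// single update closes.
fn sample_weight(estimate_var: f64, sample_var: f64) -> f64 {
    let pred_var = estimate_var + 5.0;
    pred_var / (sample_var + pred_var)
}
```

Calling `bayesian_update(40.0, 20.0, 400.0, 2000.0)` moves the estimate from 40 to roughly 44.4, and `sample_weight(20.0, 2000.0)` is roughly 0.012.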

Solution

This PR introduces a BWE reset API, which allows the BWE to be reset when the server starts sending video packets after an initial period of audio-only packets. This enables the BWE to quickly adapt to the changed bandwidth requirements, ensuring a smoother and more accurate bitrate estimation.
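
The effect of such a reset can be sketched with a toy estimator. `ToyEstimator`, its methods, and `INITIAL_VAR` are hypothetical names and values for illustration only, not str0m's types or the API added by this PR. The sketch also assumes that the first sample after a reset is adopted directly; only the update rule comes from the code quoted above.

```rust
/// Toy estimator used only to illustrate the effect of a reset.
/// Hypothetical type: this is not str0m's estimator or its API.
struct ToyEstimator {
    estimate: Option<f64>,
    estimate_var: f64,
}

impl ToyEstimator {
    /// Assumed starting variance; the value is illustrative.
    const INITIAL_VAR: f64 = 50.0;

    fn new() -> Self {
        ToyEstimator { estimate: None, estimate_var: Self::INITIAL_VAR }
    }

    /// Feed one rate sample together with a precomputed sample variance.
    fn update(&mut self, sample: f64, sample_var: f64) {
        match self.estimate {
            // Assumption: with no prior estimate, the sample is adopted as-is.
            None => self.estimate = Some(sample),
            Some(estimate) => {
                let pred_var = self.estimate_var + 5.0;
                self.estimate = Some(
                    (sample_var * estimate + pred_var * sample) / (sample_var + pred_var),
                );
                self.estimate_var = (sample_var * pred_var) / (sample_var + pred_var);
            }
        }
    }

    /// Forget the audio-only history so the next sample starts a fresh estimate.
    fn reset(&mut self) {
        self.estimate = None;
        self.estimate_var = Self::INITIAL_VAR;
    }
}
```

Without a reset, a sample of 400 after an estimate of 40 barely moves the toy estimate. After `reset()`, the same sample becomes the new estimate immediately.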

algesten commented 5 months ago

Oh. This looks really good! Thanks!

@k0nserv can you review since this is your domain?

giangndm commented 5 months ago

@k0nserv I tried setting the desired bitrate to a minimum of 800,000 bps and forcing the estimated bitrate to 800,000 bps during the warm-up state, but it didn't work. After the warm-up time, the stream bitrate would drop to a very low value, as you can see in the screen capture below. (Note that I tested with a normal stream, not simulcast, which requires controlling the sender side with REMB.)

image

I think disabling BWE while we only have audio data is the best approach, but I don't know how to do that with str0m. As far as I can see, str0m only has an API to enable or disable BWE at creation time, and to control current_bitrate and desired_bitrate after that. Can you show me how to do it with str0m?

k0nserv commented 5 months ago

> I tried setting the desired bitrate to a minimum of 800,000 bps and forcing the estimated bitrate to 800,000 bps during the warm-up state, but it didn't work. After the warm-up time, the stream bitrate would drop to a very low value, as you can see in the screen capture below. (Note that I tested with a normal stream, not simulcast, which requires controlling the sender side with REMB.)

That graph is showing bytes sent from the browser's perspective, right? If so, that's not related to str0m's BWE implementation, but to the browser's BWE implementation. I see that it drops after the initial ramp-up, but then it seems to recover fine after that. Can you explain in more detail the problem you are seeing?

> I think disabling BWE while we only have audio data is the best approach, but I don't know how to do that with str0m. As far as I can see, str0m only has an API to enable or disable BWE at creation time, and to control current_bitrate and desired_bitrate after that. Can you show me how to do it with str0m?

I went and looked at our code and I was slightly wrong. What we do is that when we start sending the very first video track, we pretend that the estimated bitrate is 1.5 Mbit/s, which means we immediately allocate a high-quality layer from our simulcast selection. If this proves to be wrong, the BWE system will adjust us down quickly and we downgrade to a medium or even low layer in response.
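
A minimal sketch of that idea, assuming three simulcast layers. The 1.5 Mbit/s starting value is from the comment above; the `Layer` enum, the function names, and the layer thresholds are hypothetical and are not taken from str0m or from the code being described.

```rust
/// Simulcast layers in the sketch (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Layer {
    Low,
    Medium,
    High,
}

/// Optimistic estimate used when the first video track starts.
const FIRST_VIDEO_ESTIMATE_BPS: u64 = 1_500_000;

// Hypothetical layer thresholds, chosen only for illustration.
const HIGH_MIN_BPS: u64 = 1_200_000;
const MEDIUM_MIN_BPS: u64 = 400_000;

/// Ignore the (audio-only) BWE output at the moment the first video track
/// starts, and use the BWE output as-is afterwards.
fn effective_estimate_bps(bwe_estimate_bps: u64, first_video_starting: bool) -> u64 {
    if first_video_starting {
        FIRST_VIDEO_ESTIMATE_BPS
    } else {
        bwe_estimate_bps
    }
}

/// Pick the highest layer whose threshold the estimate meets.
fn select_layer(estimate_bps: u64) -> Layer {
    if estimate_bps >= HIGH_MIN_BPS {
        Layer::High
    } else if estimate_bps >= MEDIUM_MIN_BPS {
        Layer::Medium
    } else {
        Layer::Low
    }
}
```

With an audio-only estimate of 40,000 bps, `select_layer(effective_estimate_bps(40_000, true))` picks the high layer when the first video track starts, while the same estimate without the override picks the low layer.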

giangndm commented 5 months ago

@k0nserv Sorry for not explaining earlier, but the graph is from the sender side, which is controlled directly by REMB using the str0m BWE output.

I agree that we need to start with a high bitrate during the warm-up state, and then use the BWE output afterwards. This is the same approach I take in my SFU. When video and audio start at the same time, it works well.

image

In the case where audio starts before video, the video initially looks good, but then the quality drops and the picture becomes very noisy. After that, the bitrate increases again, and around 9 seconds later it is back up to 1 Mbps.

image

In the case where audio starts before video, but I reset the BWE 1 second after the video starts (the warm-up time is still 2 seconds), the bitrate looks like it does in the normal case.

image

xnorpx commented 5 months ago

> I'd also be curious about how @xnorpx handles this.

We are actually still playing code golf, but we just recently started turning on BWE to observe its behavior.

Based on the limited testing we have done, this is probably true. The estimate will closely align with the content (a little above in happy cases), and since there is no probing, it's not going to increase much higher than the currently sent content. Once video is added, the pacer will make the video experience quite jerky until the estimate goes up.

I think this API makes sense and we would probably start using it directly. (We really need probing, but that's for another day.)

algesten commented 5 months ago

@giangndm not sure how REMB comes into the mix here. AFAIK it's the old BWE (receiver based rather than sender based), that is not in active use in libWebRTC.

giangndm commented 5 months ago

@algesten In this test I used both TWCC and REMB. The viewer stream bitrate is controlled by TWCC: the server decides how much data can be sent using the TWCC-based BWE, then asks the streamer client over REMB to limit the streamer's bandwidth. I also double-checked the Bwe(Twcc) bitrate output, and it matches the screenshot above.

k0nserv commented 5 months ago

> Sorry for not explaining earlier, but the graph is from the sender side, which is controlled directly by REMB using the str0m BWE output.

> In this test I used both TWCC and REMB. The viewer stream bitrate is controlled by TWCC: the server decides how much data can be sent using the TWCC-based BWE, then asks the streamer client over REMB to limit the streamer's bandwidth. I also double-checked the Bwe(Twcc) bitrate output, and it matches the screenshot above.

Right, so you aren't using simulcast and have a situation like A -> [SFU] –(BWE)-> B, where you want to limit the send bitrate of A based on str0m's BWE estimate towards B?

giangndm commented 5 months ago

@k0nserv Yes, in this test, it works like this.

I also have a client that works with simulcast, and it has the same issues. Initially, it starts with the highest layer, but after the warm-up period, it drops to the lowest layer, and then increases back to the highest layer after about 9-10 seconds (possibly because Chrome needs time to prepare the video codec, which starts about 0.5-1 seconds after the audio). The issue does not occur if audio and video are sent at the same time.

algesten commented 5 months ago

@giangndm thanks!

What would the 1 line summary be for the changelog?

Or 2 lines, if there are two separate things addressed here.

giangndm commented 5 months ago

@algesten Maybe it is: