EricssonResearch / scream

SCReAM - Mobile optimised congestion control algorithm
BSD 2-Clause "Simplified" License

Periodic rate drop with CWND control enabled. #45

Open mstunda opened 2 years ago

mstunda commented 2 years ago

Hello, I am having an issue with periodic excessive rate drop when trying to use scream with ScreamTx::openWindow = false

For verification, I have set up both the sender and receiver on two machines in a local subnet, to eliminate network bandwidth variations. I have several cameras with separate streams using the prioritization functionality.

Stable with ScreamTx::openWindow = true

The transmission is very stable when running with ScreamTx::openWindow = true

Below is the output of a little statistic that I am collecting on ScreamTx::Stream::updateTargetBitrate() to visualise the incidence of different updating scenarios. I print it every 25k calls of said function.

At the bottom of the image is a histogram of ScreamTx::getTotalTargetBitrate(). In this setup, over a span of 15M calls, the total target bitrate stays near the maximum, capped by the configured maximum bitrates. This was measured over more than 30 minutes.
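For anyone wanting to reproduce this kind of statistic, here is a minimal sketch of a fixed-width histogram for bitrate samples. The class name BitrateHistogram and its methods are hypothetical helpers, not part of SCReAM; the only assumption about the library is that a total target bitrate value is available to feed into add().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of SCReAM): fixed-width histogram of
// target-bitrate samples between minBps and maxBps.
class BitrateHistogram {
public:
    BitrateHistogram(double minBps, double maxBps, std::size_t bins)
        : minBps_(minBps), maxBps_(maxBps), counts_(bins, 0) {
        assert(bins > 0 && maxBps > minBps);
    }

    // Record one sample; out-of-range values are clamped to the edge bins.
    void add(double bps) {
        double frac = (bps - minBps_) / (maxBps_ - minBps_);
        frac = std::max(0.0, std::min(1.0, frac));
        std::size_t idx = static_cast<std::size_t>(frac * counts_.size());
        idx = std::min(idx, counts_.size() - 1);
        ++counts_[idx];
        ++total_;
    }

    // Fraction of all samples that fell into the given bin (0 if empty).
    double share(std::size_t bin) const {
        return total_ == 0 ? 0.0
                           : static_cast<double>(counts_[bin]) / total_;
    }

    std::size_t total() const { return total_; }

private:
    double minBps_;
    double maxBps_;
    std::vector<std::size_t> counts_;
    std::size_t total_ = 0;
};
```

Calling add() once per update and printing share() for each bin every N calls gives the same kind of "share of time per bitrate band" view as described above.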

Stats - window_open--true__l4s_enabled--false

Dipping with ScreamTx::openWindow = false

When CWND control is enabled, the target bitrate stays at the same stable maximum level most of the time, but dips very regularly. Sometimes the dip reaches a medium-low level and recovers after a short moment; sometimes it goes all the way down to the minimum bitrates and struggles to recover for tens of seconds. After half an hour or so, it gets stuck clearing the RtpQueue and never recovers at all.

Here is the same statistic with CWND control. The bitrate has been all over the place and stayed near the minimum 9% of the time. Many clearings of the RtpQueue have also been performed due to excessive delay (not Loss or Ecn events), which I guess is caused by cwnd dropping to the minimum (5000) so that packets do not get sent. Under normal operation, I observed the cwnd value to be around 130'000.

Stats - window_open--FALSE__l4s_enabed--false - DEEP DIP 1

Here is an example of the bitrates dropping a little bit for a short while and then going all the way down for a longer time.

NW_load - window_open--FALSE__l4s_enabed--false - DEEP DIP 1

Question

Are there any obvious things I should try changing or any other values I should monitor to get a clue?

These coefficients are currently set to: kLossBeta = 0.8, kEcnCeBeta = 0.9
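As a rough sanity check on these coefficients, the sketch below counts how many back-to-back multiplicative reductions it would take to bring cwnd from about 130000 down to the 5000 minimum. It assumes a simplified model in which each event scales cwnd by the coefficient and nothing grows in between; the function name reductionsToFloor is hypothetical, and the actual SCReAM update logic may differ.

```cpp
#include <cassert>

// Counts how many consecutive multiplicative reductions (cwnd *= beta)
// are needed to bring cwnd from `start` down to `floor` or below.
// Simplified model (assumption): no window growth between events.
int reductionsToFloor(double start, double floor, double beta) {
    assert(beta > 0.0 && beta < 1.0 && floor > 0.0);
    int n = 0;
    double cwnd = start;
    while (cwnd > floor) {
        cwnd *= beta;  // one loss/ECN-style reduction
        ++n;
    }
    return n;
}
```

Under this model, reductionsToFloor(130000, 5000, 0.8) is 15 and reductionsToFloor(130000, 5000, 0.9) is 31, so a collapse to the minimum would need many consecutive loss or ECN events, which fits the observation that the dips are not driven by Loss or Ecn events.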

IngJohEricsson commented 2 years ago

Hi, can you tell me which kind of application you are using?

Regards Ingemar


mstunda commented 2 years ago

Hello, I am not sure I fully understand the question, but the ScreamTx functions are called from a custom sender class in our software. It is based on the example sender but modified for multiple streams.

IngJohEricsson commented 2 years ago

Hi, it is difficult to say what may be wrong. Is the RTCP feedback received correctly? Is the CPU load reasonably low? I am asking because software encoders like x264 can give a high CPU load.

mstunda commented 2 years ago

The 8 cores are loaded rather evenly with an average load around 30% (Tx machine) and 20% (Rx machine) at full bitrate.

Regarding the feedback: it is sent through a reliable channel, and it can be seen (3rd and 4th line of the statistics) that almost no losses are reported over the long operation period. Maybe the bitrate drop could be triggered by some kind of positive feedback between cwnd and the RtpQueue delay?

The last image in the original question is very characteristic, and I have seen this load pattern many times. There seems to be a small dip that recovers with a slight overshoot (above the previous stable level), followed by a complete dip to the minimum bitrate.
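To put numbers on this pattern, a logged series of total target bitrate samples could be scanned for dips. The following is a minimal sketch; Dip and findDips are hypothetical names, and the threshold is an arbitrary choice made by whoever runs the analysis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous run of samples below the threshold.
struct Dip {
    std::size_t start;   // index of the first sample below the threshold
    std::size_t length;  // number of consecutive samples below it
    double minBps;       // lowest bitrate seen during the dip
};

// Hypothetical helper: returns every run of samples below thresholdBps.
std::vector<Dip> findDips(const std::vector<double>& bps,
                          double thresholdBps) {
    std::vector<Dip> dips;
    bool inDip = false;
    for (std::size_t i = 0; i < bps.size(); ++i) {
        if (bps[i] < thresholdBps) {
            if (!inDip) {
                dips.push_back(Dip{i, 0, bps[i]});  // a new dip begins
                inDip = true;
            }
            Dip& d = dips.back();
            ++d.length;
            if (bps[i] < d.minBps) d.minBps = bps[i];
        } else {
            inDip = false;  // back at or above the threshold
        }
    }
    return dips;
}
```

Logging cwnd and the RtpQueue delay alongside each bitrate sample and inspecting them at the reported dip indices might show which of the two moves first.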

I should mention that I am using 8 camera streams.