aws / amazon-chime-sdk-js

A JavaScript client library for integrating multi-party communications powered by the Amazon Chime service.
Apache License 2.0

Documented test cases for poor call quality events #2676

Closed everscending closed 1 year ago

everscending commented 1 year ago

Tell us about your request

What do you want us to build?

Documented instructions for simulating network conditions that will produce connectionDidSuggestStopVideo and connectionDidBecomePoor call quality events.

Which Amazon Chime SDK or feature area is this request for? amazon-chime-sdk-js

Tell us about the problem you are trying to solve and why is it hard?

We've been using macOS's Network Link Conditioner to simulate poor network conditions, but we're finding that the connectionDidSuggestStopVideo and connectionDidBecomePoor events fire pretty inconsistently, no matter what combination of bandwidth/packet loss/latency we simulate. Many times the call simply disconnects before we see any events. Our app implements functionality that responds to these events, such as showing a message to the user or automatically lowering the video bandwidth. The ask here is to document the instructions for simulating network conditions that will produce connectionDidSuggestStopVideo and connectionDidBecomePoor call quality events consistently and reliably. Without this documentation we cannot faithfully test this functionality.

How are you currently solving a problem?

We haven't solved this problem. Trying to test this is extremely frustrating because we cannot get these events to fire consistently.

Additional context

One last thought... surely the amazon-chime-sdk-js dev team have tools and documented test plans for observing these events. If that information could just be shared with the rest of the community that would be greatly appreciated.

simmkyu commented 1 year ago

Summary

The Chime SDK for JavaScript uses a single WebRTC metric, the number of received audio packets (packetsReceived from the browser's RTCPeerConnection.getStats API), to trigger the connectionDidBecomePoor and connectionDidSuggestStopVideo callbacks. By default, the SDK invokes the callbacks if the number of packets received is less than 50% of the total expected packets (750).
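To make the threshold concrete, here is a minimal sketch of the check described above. It is not the SDK's implementation. It assumes the 750 expected packets come from a window of 15 samples at 50 audio packets per sample; those two numbers, and the names shouldWarn, SAMPLES_TO_CONSIDER, PACKETS_EXPECTED_PER_SAMPLE, and WARN_BELOW_FRACTION, are illustrative assumptions rather than values confirmed in this thread.

```javascript
// Sketch only: approximates the "less than 50% of 750 expected packets" rule.
// ASSUMPTION: 750 = 15 samples x 50 packets per sample (not confirmed here).
const SAMPLES_TO_CONSIDER = 15;         // assumed window length, in samples
const PACKETS_EXPECTED_PER_SAMPLE = 50; // assumed audio packets per sample
const WARN_BELOW_FRACTION = 0.5;        // the 50% figure from the summary

// Hypothetical helper: `samples` holds the packets received in each interval,
// oldest first. Returns true when the most recent window is below threshold.
function shouldWarn(samples) {
  // Without a full window of history, do not warn.
  if (samples.length < SAMPLES_TO_CONSIDER) {
    return false;
  }
  const recent = samples.slice(-SAMPLES_TO_CONSIDER);
  const received = recent.reduce((sum, n) => sum + n, 0);
  const expected = SAMPLES_TO_CONSIDER * PACKETS_EXPECTED_PER_SAMPLE; // 750
  return received < expected * WARN_BELOW_FRACTION; // fewer than 375 packets
}

// Example: a healthy window followed by eight intervals with no packets.
const healthy = Array(15).fill(50);
console.log(shouldWarn(healthy.concat(Array(8).fill(0))));
```

Under these assumed values, seven empty intervals after a healthy window still leave 8 × 50 = 400 packets in the window, while eight leave 7 × 50 = 350, which is below 375. If samples were taken once per second, that would be in line with the 8 s reported for the 100% Loss profile in the Test Results.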

Can you create a custom Network Link Conditioner profile with the following values to test the connectionDidSuggestStopVideo and connectionDidBecomePoor callbacks? You can also experiment with the values in the Test Results section to simulate various network conditions.

Test Results

In the test environment described below, the Chime SDK for JavaScript consistently triggers the connectionDidBecomePoor and connectionDidSuggestStopVideo callbacks based on the given downlink and uplink values.

Environment

Results

Each Downlink and Uplink cell lists, in order: bandwidth, packets dropped, delay.

Downlink                | Uplink                  | Time to receive the callback | Simulated conditions
------------------------|-------------------------|------------------------------|---------------------
1 kbps, 99%, 1000 ms    | 40 Mbps, 0%, 1 ms       | 8 s                          | Bad download network
40 Mbps, 0%, 1 ms       | 1 kbps, 99%, 1000 ms    | 38 s                         | Bad upload network
40 Mbps, 50%, 1 ms      | 40 Mbps, 50%, 1 ms      | 17 s                         | 50% packet loss + good bandwidth
40 Mbps, 60%, 1 ms      | 40 Mbps, 60%, 1 ms      | 13 s                         | 60% packet loss + good bandwidth
40 Mbps, 99%, 1 ms      | 40 Mbps, 99%, 1 ms      | 10 s                         | 99% packet loss + good bandwidth
1 kbps, 0%, 1 ms        | 1 kbps, 0%, 1 ms        | 8 s                          | Very bad bandwidth + no packet loss
max, 100%, 0 ms         | max, 100%, 0 ms         | 8 s                          | Same as the 100% Loss profile in the Network Link Conditioner

Steps to test using the macOS Network Link Conditioner

  1. Download and install the Additional Tools for Xcode #, where # corresponds to your Xcode version. https://developer.apple.com/download/all/?q=additional%20tools%20for%20Xcode
  2. Open the Network Link Conditioner in System Preferences.
  3. In the bottom-right corner, select Manage Profiles and click the + button to create a new profile using the values in the Test Results section.
  4. Assume that you have implemented the connectionDidBecomePoor and connectionDidSuggestStopVideo methods, which log the "connection is poor" and "suggest turning the video off" messages, respectively. (Alternatively, you can use the Chime SDK Serverless Demo, which outputs these messages to the browser console by default.)
  5. Open your browser console and filter the console messages by the "connection is poor" message. (If you have enabled your local video, filter the console by "suggest turning the video off.")
  6. Activate your custom profile in the Network Link Conditioner, and confirm that your application shows the "connection is poor" message in the browser console.
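
For step 4, an observer along these lines would produce the two console messages. This is a sketch, not the Serverless Demo's code: createConnectionObserver and its log parameter are hypothetical names, and meetingSession in the trailing comment is assumed to be a meeting session your application has already created.

```javascript
// Builds an observer object with the two callbacks named in step 4.
// `log` is a hypothetical parameter so the output can be redirected in tests.
function createConnectionObserver(log = console.log) {
  return {
    connectionDidBecomePoor() {
      log('connection is poor');
    },
    connectionDidSuggestStopVideo() {
      log('suggest turning the video off');
    },
  };
}

// In an application, register it with the SDK, for example:
//   meetingSession.audioVideo.addObserver(createConnectionObserver());
// `meetingSession` is assumed to exist already; it is not defined here.
```

Filtering the browser console by either string then shows when each callback fired.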
GeauxDrum commented 1 year ago

What we're really looking for here is a way to test that the connectionDidBecomePoor and/or connectionDidSuggestStopVideo events fire in an environment that does NOT kill the Chime meeting as well. In all of the profiles listed above, having a poor connection event fire does nothing to prevent a meeting failure. It is imperative that we know the minimum thresholds for packet loss etc. so that we can test our mitigation steps (lowering video quality, shutting off video, etc.) before a call fails. Otherwise, subscribing to these events has no value.

everscending commented 1 year ago

@simmkyu Can you or anyone else from the Chime team respond to this? Our team has actually been requesting this from AWS for months now and I simply can't explain why we have not been given the information we are asking for. We want repeatable test plan steps for triggering the poor call quality events (connectionDidSuggestStopVideo and connectionDidBecomePoor) without completely destroying the Chime meeting, so that we can validate the mitigations we've put in place to save the connection.

simmkyu commented 1 year ago

As of the current version, 3.15.0, the Chime SDK for JavaScript triggers the connectionDidBecomePoor and connectionDidSuggestStopVideo callbacks when an attendee is on the verge of disconnecting. Specifically, the meeting remains active in the backend, but the attendee is on the brink of being disconnected from the meeting on the client side.

Here's the process:

  1. The SDK monitors the packetsReceived metric from the browser's RTCPeerConnection.getStats API.
  2. The SDK triggers the callback by default if the number of packets received is less than 50% of the total expected packets (750).
  3. The SDK also uses the packetsReceived metric to decide when to attempt reconnection by re-establishing the WebSocket and WebRTC connections.
  4. If this condition persists, an attendee will be dropped from the meeting. However, the attendee will be reconnected if the situation returns to normal. For example, if you turn off the macOS Network Link Conditioner during the test, the attendee will be reconnected to the meeting.
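
As an illustration of step 1, the sketch below reads the audio packetsReceived counter from a WebRTC stats report. It is not the SDK's code: audioPacketsReceived is a hypothetical helper, and pc in the trailing comment stands for an RTCPeerConnection that you control, which is an assumption because the SDK manages its own connection internally.

```javascript
// Sums packetsReceived across inbound audio RTP streams in a stats report.
// `report` is anything with a Map-style forEach, such as an RTCStatsReport.
function audioPacketsReceived(report) {
  let total = 0;
  report.forEach((stat) => {
    // `kind` is the current field name; older browsers used `mediaType`.
    const kind = stat.kind || stat.mediaType;
    if (stat.type === 'inbound-rtp' && kind === 'audio') {
      total += stat.packetsReceived || 0;
    }
  });
  return total;
}

// In a browser, with `pc` an RTCPeerConnection (assumed to exist):
//   const report = await pc.getStats();
//   console.log(audioPacketsReceived(report));
// packetsReceived is cumulative, so the count for one interval is the
// difference between two successive readings.
```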

> We want repeatable test plan steps for triggering the poor call quality events(connectionDidSuggestStopVideo and connectionDidBecomePoor) without completely destroying the Chime meeting, so that we can validate the mitigations we've put in place to save the connection.

According to the Amazon Chime SDK Developer Guide, a Chime SDK meeting automatically ends when there are no attendees connected for five continuous minutes.

If you're referring to a meeting being completely destroyed in the backend, you can keep another attendee active in the meeting so that it does not end.

everscending commented 1 year ago

@simmkyu What I meant by "completely destroys the meeting" is that it disconnects the attendee's browser from the meeting.

> Specifically, a meeting remains active in the backend, but an attendee is on the brink of being disconnected from the meeting on the client-side.

How can a Network Link Conditioner profile be configured to simulate this moment right here ^^ ? Just enough degradation to trigger connectionDidBecomePoor and/or connectionDidSuggestStopVideo but not enough to kill the connection?

simmkyu commented 1 year ago

With the default timeout configurations, an attendee will eventually be dropped from a Chime SDK meeting; therefore, there isn't a specific Network Link Conditioner profile to simulate this scenario.

In your testing environment, could you adjust the MeetingSessionConfiguration.reconnectTimeoutMs value to the maximum duration of your test?

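import { MeetingSessionConfiguration } from 'amazon-chime-sdk-js';

// meetingResponse and attendeeResponse are the CreateMeeting and CreateAttendee API responses.
const configuration = new MeetingSessionConfiguration(meetingResponse, attendeeResponse);

// The default value is 2 minutes (120 * 1000 ms). This raises it to 10 minutes for the test.
configuration.reconnectTimeoutMs = 600 * 1000;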

By default, the Chime SDK for JavaScript keeps attempting to reestablish the client-side connection for just over two minutes, with variable backoff delays between attempts. The attendee will be disconnected if the issue persists.

I need to talk to the Chime SDK backend team to confirm the duration allowed for service reconnection, but you should be able to write a test for this scenario.

  1. Join a Chime SDK meeting with reconnectTimeoutMs set to a 5-minute duration.
  2. Enable the 100% Loss profile setting in Network Link Conditioner.
  3. Confirm that your application has received either connectionDidBecomePoor or connectionDidSuggestStopVideo.
  4. Disable the 100% Loss profile setting in Network Link Conditioner.
  5. Confirm that your application has received connectionDidBecomeGood.
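
Steps 3 and 5 can be checked programmatically with a small recorder like the one sketched below. createEventRecorder and waitForAny are hypothetical test helpers, not part of the SDK. Toggling the Network Link Conditioner profile (steps 2 and 4) is still done by hand, and registering the observer assumes an existing meetingSession.

```javascript
// Hypothetical test helper: records observer callbacks and lets a test await them.
function createEventRecorder() {
  const seen = [];    // names of callbacks received so far, in order
  const waiters = []; // pending waitForAny calls

  function record(name) {
    seen.push(name);
    // Resolve every pending waiter that is interested in this callback.
    for (let i = waiters.length - 1; i >= 0; i--) {
      if (waiters[i].names.includes(name)) {
        clearTimeout(waiters[i].timer);
        waiters[i].resolve(name);
        waiters.splice(i, 1);
      }
    }
  }

  return {
    seen,
    observer: {
      connectionDidBecomePoor: () => record('connectionDidBecomePoor'),
      connectionDidSuggestStopVideo: () => record('connectionDidSuggestStopVideo'),
      connectionDidBecomeGood: () => record('connectionDidBecomeGood'),
    },
    // Resolves with the first of `names` fired after this call,
    // or rejects once timeoutMs has elapsed.
    waitForAny(names, timeoutMs) {
      return new Promise((resolve, reject) => {
        const waiter = { names, resolve, timer: null };
        waiter.timer = setTimeout(() => {
          waiters.splice(waiters.indexOf(waiter), 1);
          reject(new Error('Timed out waiting for ' + names.join(' or ')));
        }, timeoutMs);
        waiters.push(waiter);
      });
    },
  };
}

// Usage inside an async test (meetingSession is assumed to exist):
//   const recorder = createEventRecorder();
//   meetingSession.audioVideo.addObserver(recorder.observer);
//   // Enable the 100% Loss profile, then:
//   await recorder.waitForAny(
//     ['connectionDidBecomePoor', 'connectionDidSuggestStopVideo'], 60 * 1000);
//   // Disable the profile, then:
//   await recorder.waitForAny(['connectionDidBecomeGood'], 5 * 60 * 1000);
```

The timeouts in the usage comment are placeholders; pick values that fit the reconnectTimeoutMs you configured.
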
hensmi-amazon commented 1 year ago

There is no issue with delaying the reconnection. Every connection is 'fresh' from the perspective of the backend, and previous connections will be cleaned up on reconnection or 10 or so seconds of inactivity.

yochum commented 1 year ago

Closing this issue. Please follow up with AWS support if there are any further questions.