envoyproxy / envoy-mobile

Client HTTP and networking library based on the Envoy project for iOS, Android, and more.
https://envoymobile.io
Apache License 2.0

xDS integration tests hang for 15s before receiving DiscoveryResponse #2678

Closed by abeyad 1 year ago

abeyad commented 1 year ago

This happens when running //test/common/integration:rtds_integration_test and //test/common/integration:sds_integration_test.

The cause appears to be the initial config fetch timer expiring after 15s: https://github.com/envoyproxy/envoy/blob/3059a76977311b43dc993720f7fda49c8e4dfbee/source/common/config/grpc_subscription_impl.cc#L35. The timer's expiry seems to unblock the FakeUpstream so that it sends an xDS response. The gRPC stream then proceeds and the Envoy engine receives the response.

abeyad commented 1 year ago

We wait 15s and then get a gRPC initial fetch timeout because the default initial_fetch_timeout for a config source is 15s unless it is explicitly set. Until the initial fetch completes (by success, failure, or timeout), the PostInit function is not called, and until PostInit is called, the Envoy Mobile engine keeps waiting for a notification on the mutex. That is why the test hangs for 15s before proceeding.
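To make the sequence concrete, here is a small, self-contained model of it, assuming a plain mutex/condition-variable handshake. `InitGate`, `InitialFetch`, and `simulateStartup` are hypothetical names invented for this sketch. They are not Envoy or Envoy Mobile APIs, and the sketch is not how `GrpcSubscriptionImpl` is actually implemented. The "engine" blocks until PostInit runs, and the fetch calls PostInit either when a response arrives or when `initial_fetch_timeout` elapses. The timings use milliseconds rather than seconds so the model runs quickly.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the engine's wait: blocks until postInit() runs.
class InitGate {
 public:
  void postInit() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
    }
    cv_.notify_all();
  }
  void waitForInit() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return done_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool done_ = false;
};

enum class FetchOutcome { kResponse, kTimeout };

// Hypothetical stand-in for the initial xDS fetch: it finishes when a response
// is delivered or when initial_fetch_timeout elapses, then calls PostInit.
class InitialFetch {
 public:
  // Plays the role of the test's sendDiscoveryResponse call.
  void deliverResponse() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      got_response_ = true;
    }
    cv_.notify_all();
  }
  FetchOutcome run(std::chrono::milliseconds initial_fetch_timeout,
                   InitGate& gate) {
    std::unique_lock<std::mutex> lock(mu_);
    const bool got = cv_.wait_for(lock, initial_fetch_timeout,
                                  [this] { return got_response_; });
    lock.unlock();
    gate.postInit();  // Success and timeout both unblock the engine.
    return got ? FetchOutcome::kResponse : FetchOutcome::kTimeout;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool got_response_ = false;
};

struct WaitResult {
  FetchOutcome outcome;
  std::chrono::milliseconds engine_wait;
};

// Runs the fetch on a worker thread and measures how long the "engine" blocks.
// With send_response == false nobody calls deliverResponse(), mirroring a
// FakeUpstream that has not sent a DiscoveryResponse yet.
WaitResult simulateStartup(std::chrono::milliseconds initial_fetch_timeout,
                           bool send_response) {
  InitGate gate;
  InitialFetch fetch;
  FetchOutcome outcome = FetchOutcome::kTimeout;
  const auto start = std::chrono::steady_clock::now();
  std::thread worker([&] { outcome = fetch.run(initial_fetch_timeout, gate); });
  if (send_response) fetch.deliverResponse();
  gate.waitForInit();
  const auto waited = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start);
  worker.join();  // Joining makes the worker's write to `outcome` visible.
  return {outcome, waited};
}
```

Calling `simulateStartup(std::chrono::milliseconds(100), false)` blocks for at least 100ms and reports a timeout, while passing `true` returns as soon as the response is delivered. This is the same shape as the 15s hang, only scaled down.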

The integration tests do not receive the initial xDS response on their own, because the FakeUpstream only sends it when the test calls sendDiscoveryResponse. The fix for the integration tests is to set initial_fetch_timeout explicitly, as done in https://github.com/envoyproxy/envoy-mobile/pull/2679.

The general fix for Envoy Mobile is to have the xDS builder APIs add an initial_fetch_timeout to the configured xDS source and to set it to a reasonable value (e.g., 5s instead of 15s).
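One way the builder-side fix could look is sketched below, assuming a builder that renders an Envoy `ConfigSource` YAML fragment. `XdsBuilder`, `setInitialFetchTimeout`, and `buildConfigSourceYaml` are hypothetical names, not the real Envoy Mobile builder API. The YAML field names (`initial_fetch_timeout`, `resource_api_version`, `api_config_source`, `grpc_services`, `envoy_grpc.cluster_name`) follow Envoy's v3 `ConfigSource` schema. The idea is that the builder always writes `initial_fetch_timeout` explicitly, defaulting to 5s, so Envoy's implicit 15s default never applies.

```cpp
#include <cassert>
#include <chrono>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical xDS builder; not the actual Envoy Mobile builder API.
class XdsBuilder {
 public:
  explicit XdsBuilder(std::string cluster_name)
      : cluster_name_(std::move(cluster_name)) {}

  // Lets callers override the timeout; otherwise the 5s default is used.
  XdsBuilder& setInitialFetchTimeout(std::chrono::seconds timeout) {
    initial_fetch_timeout_ = timeout;
    return *this;
  }

  // Renders a ConfigSource fragment that always sets initial_fetch_timeout.
  std::string buildConfigSourceYaml() const {
    std::ostringstream out;
    out << "config_source:\n"
        << "  initial_fetch_timeout: " << initial_fetch_timeout_.count() << "s\n"
        << "  resource_api_version: V3\n"
        << "  api_config_source:\n"
        << "    api_type: GRPC\n"
        << "    transport_api_version: V3\n"
        << "    grpc_services:\n"
        << "      - envoy_grpc:\n"
        << "          cluster_name: " << cluster_name_ << "\n";
    return out.str();
  }

 private:
  std::string cluster_name_;
  std::chrono::seconds initial_fetch_timeout_{5};  // Shorter than Envoy's 15s.
};
```

Putting the default inside the builder, rather than relying on each caller to remember it, means every xDS source configured through the builder gets a bounded startup wait.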