envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
24.95k stars 4.8k forks source link

Refactoring ClusterManagerImpl to support dynamic config for composite filter when its in upstream filter chain #33218

Closed yanjunxiang-google closed 6 months ago

yanjunxiang-google commented 7 months ago

Title: Refactoring ClusterManagerImpl c'tor and to support dynamic config for composite filter when its in upstream filter chain

This issue is related to the PR: https://github.com/envoyproxy/envoy/pull/33013

Description: Currently, the cluster manager is initialized with the below sequence:

1) https://github.com/envoyproxy/envoy/blob/8ff28fe73fe0c704b416c790fe7b22622cd5d5fc/source/server/configuration_impl.cc#L131 2)

https://github.com/envoyproxy/envoy/blob/8ff28fe73fe0c704b416c790fe7b22622cd5d5fc/source/common/upstream/cluster_manager_impl.cc#L2227

i.e, it calls the cluster_manager_impl->init(bootstrap) before return unique_ptr, and assign to the clustermanager.

This is causing problems when functions called inside cluster_manager_impl->init(bootstrap) needs to access clustermanger, which is the case for composite filter in the upstream filter chain.

So, for composite filter case, if it is put in the upstream filter chain, when the dynamic_config is configured: https://github.com/envoyproxy/envoy/blob/8ff28fe73fe0c704b416c790fe7b22622cd5d5fc/api/envoy/extensions/filters/http/composite/v3/composite.proto#L59

the above cluster_manager_impl->init(bootstrap) will eventually call this function: https://github.com/envoyproxy/envoy/blob/8ff28fe73fe0c704b416c790fe7b22622cd5d5fc/source/extensions/filters/http/composite/action.cc#L43 which access above clustermanger through server_factory_context.clusterManager(). As clustermanager is nullptr, this crashed.

So, if we want to support composite filter with dynamic_config in the upstream filter chain, one of the solution is moving the cluster_manager_impl->init(bootstrap) out from ProdClusterManagerFactory::clusterManagerFromProto

And call it after, like this:

clustermanager = clusterManagerFromProto(); clustermanager->init(bootstrap);

This way, when the above init() function access clusterManager(), clustermanager is already setup and no issues. We need to add good comments to make sure no other code wrote in between these two lines though.

[optional Relevant Links:]

Any extra documentation required to understand the issue.

soulxu commented 7 months ago

cc @wbpcode @RyanTheOptimist

It will lead to a crash, so it should be a bug I think

wbpcode commented 7 months ago

This is duplicate with https://github.com/envoyproxy/envoy/issues/26653

I am glad to review/see a fix to close these two issue. :)

yanjunxiang-google commented 7 months ago

Thanks for sharing the info! I have a draft PR for this: https://github.com/envoyproxy/envoy/pull/33221

yanjunxiang-google commented 7 months ago

@soulxu As we currently does not support composite filter in upstream filter chain, there is no crash at this moment.

yanjunxiang-google commented 7 months ago

BTW, we need to make same change for nighthawk here: https://github.com/envoyproxy/nighthawk/blob/ac98827203dafabf447ff68e1403e0c2834ee726/source/client/process_impl.cc#L897

qqustc commented 6 months ago

nighthawk CI crashed due to https://github.com/envoyproxy/envoy/pull/33221 https://dev.azure.com/cncf/4684fb3d-0389-4e0b-8251-221942316e06/_apis/build/builds/168085/logs/28

fixed in https://github.com/envoyproxy/nighthawk/pull/1133