alibaba / nacos

An easy-to-use dynamic service discovery, configuration and service management platform for building cloud native applications.
https://nacos.io
Apache License 2.0

Support connection pool reuse in HTTP/2 proxies (nginx/envoy) #12329

Open · godhth opened this issue 2 months ago

godhth commented 2 months ago

Is your feature request related to a problem? Please describe.

When using an HTTP/2 proxy (nginx/envoy), the gateway manages multiplexed connections through a connection pool, but Nacos only supports a single connection per client. As a result, connections reused by the pool are treated as duplicates and cannot establish a bidirectional stream.
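
For illustration, a minimal grpc-go sketch of the multiplexing a pooling proxy performs (placeholder address; assumes a recent grpc-go with grpc.NewClient, and that /BiRequestStream/requestBiStream is the bidirectional method from the Nacos public proto): several logical clients can share one physical HTTP/2 connection, with each bidirectional stream getting its own stream ID.

package main

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// One physical HTTP/2 connection, like the one a pooling proxy keeps
	// open to the upstream Nacos server.
	conn, err := grpc.NewClient("nacos-hs.nacos:9848",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Two independent bidirectional streams over the same connection;
	// HTTP/2 gives each its own stream ID, so the frames never mix.
	desc := &grpc.StreamDesc{ClientStreams: true, ServerStreams: true}
	s1, err1 := conn.NewStream(context.Background(), desc, "/BiRequestStream/requestBiStream")
	s2, err2 := conn.NewStream(context.Background(), desc, "/BiRequestStream/requestBiStream")
	_, _, _, _ = s1, s2, err1, err2
}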

KomachiSion commented 2 months ago

Nacos connections are meaningful (they carry state); after being reused by a proxy, data from different clients could get mixed up. How is that solved?

KomachiSion commented 2 months ago

This is a large change and needs a complete design.

godhth commented 2 months ago

Data is distinguished by stream: the clients reuse one connection, but each has a different streamId and a different streamObserver, so data will not get mixed up. The current problem is that Nacos treats the reused connection as a duplicate and does not save it to the ConnectionManager, because the current connectionId can only represent the actual physical connection, while a reused connection is really a logical connection. So my first idea was to add the bidirectional stream's streamId as an attribute of the connectionId to model a logical connection; but unary calls cannot obtain the bidirectional stream's streamId, and the connectionId is also needed in unary calls, so I gave up on that idea and instead added an identifier to the client to achieve a similar effect.

With the logical connection in place, a reused connection is correctly registered in the ConnectionManager, and each logical connection registers a different streamObserver, so data will not get mixed up.
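
A minimal sketch of the logical-connection idea described above (hypothetical names, not the actual Nacos code): the server keys registrations by the physical connectionId plus a client-supplied identifier, so two clients multiplexed onto one proxied connection no longer collide.

package main

import "fmt"

// logicalConn is a hypothetical illustration, not a Nacos type: a logical
// connection is the physical connection plus a client-supplied identifier.
type logicalConn struct {
	physicalID  string // e.g. "1723427647554_10.1.2.2_43020"
	clientLabel string // identifier added by each client
}

func main() {
	// Two clients multiplexed by the proxy share one physical connection
	// but register as distinct logical connections, each with its own
	// streamObserver, so pushed data cannot get mixed up.
	conns := map[logicalConn]string{
		{"1723427647554_10.1.2.2_43020", "client-a"}: "streamObserver-a",
		{"1723427647554_10.1.2.2_43020", "client-b"}: "streamObserver-b",
	}
	fmt.Println(len(conns)) // 2: no duplicate-connection collision
}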

godhth commented 2 months ago

[attached image: design screenshot]

KomachiSion commented 2 months ago

@shiyiyue1102 please help review this design.

KomachiSion commented 2 months ago

This feature is a big change to the Nacos connection model between server and client.

It should be an experimental feature. If you want to do this, please make sure of the following:

  1. Abstract the code so that the added feature code is independent, and make sure that removing it will not break the current connection behavior.
  2. Add a switch for this feature, defaulting to false.
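
A minimal sketch of point 2 (the switch name is hypothetical, not an actual Nacos property): the experimental path sits behind a flag that defaults to false, leaving the current connection behavior untouched unless explicitly enabled.

package main

import (
	"fmt"
	"os"
)

// proxyReuseEnabled reads a hypothetical switch; it defaults to false
// unless explicitly enabled.
func proxyReuseEnabled() bool {
	return os.Getenv("NACOS_EXPERIMENTAL_PROXY_REUSE") == "true"
}

// connectionKey keeps the current behavior on the default path and only
// switches to the experimental logical-connection key when enabled.
func connectionKey(physicalID, clientLabel string) string {
	if proxyReuseEnabled() {
		return physicalID + "#" + clientLabel
	}
	return physicalID
}

func main() {
	fmt.Println(connectionKey("1723427647554_10.1.2.2_43020", "client-a"))
}
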
fnless commented 1 month ago

We have a pod with 2 containers: one is istio-proxy (envoy) and the other is our app. The app container listens on two ports, 8080 for http and 9090 for grpc. Each server creates a nacos client to register its own service. Here is the problem: both nacos clients occasionally reuse the same gRPC connection and get the same connectionID, so only one port gets registered successfully. When I set envoy concurrency=1, it reproduces 100% of the time.

So I tried the following methods as workarounds:

  • method 1: use BatchRegisterInstance to register the http and grpc services from a single client (see the sketch below).
  • method 2: use the traffic.sidecar.istio.io/excludeOutboundPorts annotation to bypass envoy for the Nacos gRPC port (9848).

If you guys have any better solutions, please tell me. Thanks.
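
For reference, a minimal sketch of method 1 with nacos-sdk-go v2 (server address, service name, and ports taken from the setup above; assumes the BatchRegisterInstance API available in recent v2 releases): one naming client batch-registers both ports, so only a single gRPC connection and connectionID is involved.

package main

import (
	"github.com/nacos-group/nacos-sdk-go/v2/clients"
	"github.com/nacos-group/nacos-sdk-go/v2/common/constant"
	"github.com/nacos-group/nacos-sdk-go/v2/vo"
)

func main() {
	// A single naming client means a single gRPC connection and connectionID.
	client, err := clients.NewNamingClient(vo.NacosClientParam{
		ClientConfig:  constant.NewClientConfig(),
		ServerConfigs: []constant.ServerConfig{*constant.NewServerConfig("nacos-hs.nacos", 8848)},
	})
	if err != nil {
		panic(err)
	}

	// Register both ports in one batch call instead of one client per port.
	// Batch registration requires ephemeral instances.
	ok, err := client.BatchRegisterInstance(vo.BatchRegisterInstanceParam{
		ServiceName: "test",
		Instances: []vo.RegisterInstanceParam{
			{Ip: "10.1.2.2", Port: 8080, Weight: 100, Enable: true, Healthy: true, Ephemeral: true, Metadata: map[string]string{"kind": "http"}},
			{Ip: "10.1.2.2", Port: 9090, Weight: 100, Enable: true, Healthy: true, Ephemeral: true, Metadata: map[string]string{"kind": "grpc"}},
		},
	})
	if err != nil || !ok {
		panic(err)
	}
}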

How I watch the gRPC connections:

kubectl exec -ti -n {app-ns} {pod-name} -c istio-proxy -- bash
# then
watch -n 1 "ss -pe | grep 9848 | grep envoy"

logs:

2024-08-12T01:54:07.535Z  ERROR   cache/disk_cache.go:75  read cacheDir:/tmp/nacos/cache/naming/public failed!err:open /tmp/nacos/cache/naming/public: no such file or directory
2024-08-12T01:54:07.536Z  INFO    naming_http/push_receiver.go:89 udp server start, port: 55430
2024-08-12T01:54:07.536Z  DEBUG   rpc/rpc_client.go:290   78d1bd40-783b-4ef1-a26a-f71deb410578 register server push request:ConnectResetRequest handler:ConnectResetRequestHandler
2024-08-12T01:54:07.536Z  DEBUG   rpc/rpc_client.go:290   78d1bd40-783b-4ef1-a26a-f71deb410578 register server push request:ClientDetectionRequest handler:ClientDetectionRequestHandler
2024-08-12T01:54:07.536Z  INFO    rpc/rpc_client.go:224   [RpcClient.Start] 78d1bd40-783b-4ef1-a26a-f71deb410578 try to connect to server on start up, server: {serverIp:nacos-hs.nacos serverPort:8848 serverGrpcPort:9848}
2024-08-12T01:54:07.538Z  INFO    util/common.go:96   Local IP:10.1.2.2
2024-08-12T01:54:07.660Z  INFO    rpc/rpc_client.go:234   78d1bd40-783b-4ef1-a26a-f71deb410578 success to connect to server {serverIp:nacos-hs.nacos serverPort:8848 serverGrpcPort:9848} on start up, connectionId=1723427647554_10.1.2.2_43020
2024-08-12T01:54:07.660Z  DEBUG   rpc/rpc_client.go:290   78d1bd40-783b-4ef1-a26a-f71deb410578 register server push request:NotifySubscriberRequest handler:NamingPushRequestHandler
2024-08-12T01:54:07.660Z  DEBUG   rpc/rpc_client.go:298   78d1bd40-783b-4ef1-a26a-f71deb410578 register connection listener [*naming_grpc.ConnectionEventListener] to current client
2024-08-12T01:54:07.660Z  ERROR   cache/disk_cache.go:75  read cacheDir:/tmp/nacos/cache/naming/public failed!err:open /tmp/nacos/cache/naming/public: no such file or directory
2024-08-12T01:54:07.660Z  INFO    naming_http/push_receiver.go:89 udp server start, port: 55325
2024-08-12T01:54:07.660Z  DEBUG   rpc/rpc_client.go:290   864f1d43-30da-42a9-a94a-14ce1edeb12d register server push request:ConnectResetRequest handler:ConnectResetRequestHandler
2024-08-12T01:54:07.660Z  DEBUG   rpc/rpc_client.go:290   864f1d43-30da-42a9-a94a-14ce1edeb12d register server push request:ClientDetectionRequest handler:ClientDetectionRequestHandler
2024-08-12T01:54:07.660Z  INFO    rpc/rpc_client.go:224   [RpcClient.Start] 864f1d43-30da-42a9-a94a-14ce1edeb12d try to connect to server on start up, server: {serverIp:nacos-hs.nacos serverPort:8848 serverGrpcPort:9848}
2024-08-12T01:54:07.660Z  INFO    rpc/rpc_client.go:382   78d1bd40-783b-4ef1-a26a-f71deb410578 notify connected event to listeners , connectionId=1723427647554_10.1.2.2_43020
2024-08-12T01:54:07.766Z  INFO    rpc/rpc_client.go:234   864f1d43-30da-42a9-a94a-14ce1edeb12d success to connect to server {serverIp:nacos-hs.nacos serverPort:8848 serverGrpcPort:9848} on start up, connectionId=1723427647554_10.1.2.2_43020
2024-08-12T01:54:07.766Z  DEBUG   rpc/rpc_client.go:290   864f1d43-30da-42a9-a94a-14ce1edeb12d register server push request:NotifySubscriberRequest handler:NamingPushRequestHandler
2024-08-12T01:54:07.766Z  DEBUG   rpc/rpc_client.go:298   864f1d43-30da-42a9-a94a-14ce1edeb12d register connection listener [*naming_grpc.ConnectionEventListener] to current client
2024-08-12T01:54:07.767Z  INFO    naming_grpc/naming_grpc_proxy.go:95 register instance namespaceId:<public>,serviceName:<test> with instance:<{"instanceId":"","ip":"10.1.2.2","port":9898,"weight":100,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"","serviceName":"","metadata":{"kind":"http"},"instanceHeartBeatInterval":0,"ipDeleteTimeout":0,"instanceHeartBeatTimeOut":0}>
2024-08-12T01:54:07.767Z  INFO    rpc/rpc_client.go:382   864f1d43-30da-42a9-a94a-14ce1edeb12d notify connected event to listeners , connectionId=1723427647554_10.1.2.2_43020
2024-08-12T01:54:07.780Z  INFO    naming_grpc/naming_grpc_proxy.go:95 register instance namespaceId:<public>,serviceName:<test> with instance:<{"instanceId":"","ip":"10.1.2.2","port":9999,"weight":100,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"","serviceName":"","metadata":{"kind":"grpc"},"instanceHeartBeatInterval":0,"ipDeleteTimeout":0,"instanceHeartBeatTimeOut":0}>
2024-08-12T01:54:07.789Z  INFO    naming_grpc/naming_grpc_proxy.go:95 register instance namespaceId:<public>,serviceName:<test> with instance:<{"instanceId":"","ip":"10.1.2.2","port":8080,"weight":100,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"","serviceName":"","metadata":{"kind":"http2"},"instanceHeartBeatInterval":0,"ipDeleteTimeout":0,"instanceHeartBeatTimeOut":0}>

ping @KomachiSion