godhth opened 2 months ago
Nacos connections carry state, so when a proxy reuses one connection for multiple clients, data could get crossed between them. How is that solved?
This is a big change and needs a complete design.
Data is separated by stream: the connection is reused, but each stream has a different streamId and a different streamObserver, so data does not get mixed. The current problem is that Nacos treats a reused connection as a duplicate and does not save it to the ConnectionManager, because the current connectionId can only represent the actual physical connection, while a reused connection is really a logical connection. So my first idea was to add the bi-directional stream's streamId to the connectionId to achieve the effect of a logical connection. But unary calls cannot obtain the bi-directional stream's streamId, and they also need the connectionId, so I abandoned that idea. Instead I added an identifier on the client side to achieve a similar goal.
With the logical-connection effect in place, a reused connection is also correctly registered in the ConnectionManager, and since each logical connection registers a different streamObserver, data does not get mixed.
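The idea above can be sketched as follows. This is a minimal illustration, not Nacos code: the field names (`physicalID`, `clientLabel`) and the method `LogicalConnectionID` are hypothetical, and it only shows why a client-supplied label, unlike a streamId, can key both unary and bi-directional calls onto distinct logical connections over one multiplexed physical connection.

```go
package main

import "fmt"

// connection models the proposal: the physical connectionId (derived from
// the TCP endpoint, so identical for every client behind one proxied HTTP/2
// connection) is extended with a per-client label sent in request metadata.
type connection struct {
	physicalID  string // e.g. "1723427647554_10.1.2.2_43020"
	clientLabel string // unique per RpcClient instance
}

// LogicalConnectionID is derivable from any request, unary or streaming,
// because the label travels with the request rather than with the stream.
func (c connection) LogicalConnectionID() string {
	return fmt.Sprintf("%s_%s", c.physicalID, c.clientLabel)
}

func main() {
	// Two clients sharing one physical connection, as in the logs above.
	a := connection{"1723427647554_10.1.2.2_43020", "78d1bd40"}
	b := connection{"1723427647554_10.1.2.2_43020", "864f1d43"}
	// Distinct logical keys, so both would be kept in the ConnectionManager.
	fmt.Println(a.LogicalConnectionID())
	fmt.Println(b.LogicalConnectionID())
}
```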
@shiyiyue1102 help to review this design.
This feature is a big change to the Nacos connection between server and client.
It should be an experimental feature. If we want to do this, please make sure of the following:
We have a pod with 2 containers: one is istio-proxy (envoy) and the other is our app. The app container listens on two ports, 8080 for http and 9090 for grpc. Each server creates a nacos client to register its own service. Here is the problem: both nacos clients occasionally reuse the same grpc connection and get the same connectionID, which causes only one port to be registered successfully. When I set envoy concurrency=1, it reproduces 100% of the time.
So I tried the following methods to work around it:
- method1: use BatchRegisterInstance to register the http and grpc services.
- method2: use traffic.sidecar.istio.io/excludeOutboundPorts to skip envoy.
If you guys have any better solutions, please tell me. Thanks.
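For method2, the annotation goes on the pod spec. A minimal sketch, assuming the default Nacos gRPC port 9848 (main port 8848 + 1000); adjust the port to your deployment:

```yaml
metadata:
  annotations:
    # Bypass the envoy sidecar for outbound traffic to the Nacos gRPC port,
    # so each nacos client gets its own physical connection instead of a
    # multiplexed one from envoy's connection pool.
    traffic.sidecar.istio.io/excludeOutboundPorts: "9848"
```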
How I watch the grpc connections:
kubectl exec -ti -n {app-ns} {pod-name} -c istio-proxy bash
# then
watch -n 1 "ss -pe | grep 9848 | grep envoy"
logs:
2024-08-12T01:54:07.535Z ERROR cache/disk_cache.go:75 read cacheDir:/tmp/nacos/cache/naming/public failed!err:open /tmp/nacos/cache/naming/public: no such file or directory
2024-08-12T01:54:07.536Z INFO naming_http/push_receiver.go:89 udp server start, port: 55430
2024-08-12T01:54:07.536Z DEBUG rpc/rpc_client.go:290 78d1bd40-783b-4ef1-a26a-f71deb410578 register server push request:ConnectResetRequest handler:ConnectResetRequestHandler
2024-08-12T01:54:07.536Z DEBUG rpc/rpc_client.go:290 78d1bd40-783b-4ef1-a26a-f71deb410578 register server push request:ClientDetectionRequest handler:ClientDetectionRequestHandler
2024-08-12T01:54:07.536Z INFO rpc/rpc_client.go:224 [RpcClient.Start] 78d1bd40-783b-4ef1-a26a-f71deb410578 try to connect to server on start up, server: {serverIp:nacos-hs.nacos serverPort:8848 serverGrpcPort:9848}
2024-08-12T01:54:07.538Z INFO util/common.go:96 Local IP:10.1.2.2
2024-08-12T01:54:07.660Z INFO rpc/rpc_client.go:234 78d1bd40-783b-4ef1-a26a-f71deb410578 success to connect to server {serverIp:nacos-hs.nacos serverPort:8848 serverGrpcPort:9848} on start up, connectionId=1723427647554_10.1.2.2_43020
2024-08-12T01:54:07.660Z DEBUG rpc/rpc_client.go:290 78d1bd40-783b-4ef1-a26a-f71deb410578 register server push request:NotifySubscriberRequest handler:NamingPushRequestHandler
2024-08-12T01:54:07.660Z DEBUG rpc/rpc_client.go:298 78d1bd40-783b-4ef1-a26a-f71deb410578 register connection listener [*naming_grpc.ConnectionEventListener] to current client
2024-08-12T01:54:07.660Z ERROR cache/disk_cache.go:75 read cacheDir:/tmp/nacos/cache/naming/public failed!err:open /tmp/nacos/cache/naming/public: no such file or directory
2024-08-12T01:54:07.660Z INFO naming_http/push_receiver.go:89 udp server start, port: 55325
2024-08-12T01:54:07.660Z DEBUG rpc/rpc_client.go:290 864f1d43-30da-42a9-a94a-14ce1edeb12d register server push request:ConnectResetRequest handler:ConnectResetRequestHandler
2024-08-12T01:54:07.660Z DEBUG rpc/rpc_client.go:290 864f1d43-30da-42a9-a94a-14ce1edeb12d register server push request:ClientDetectionRequest handler:ClientDetectionRequestHandler
2024-08-12T01:54:07.660Z INFO rpc/rpc_client.go:224 [RpcClient.Start] 864f1d43-30da-42a9-a94a-14ce1edeb12d try to connect to server on start up, server: {serverIp:nacos-hs.nacos serverPort:8848 serverGrpcPort:9848}
2024-08-12T01:54:07.660Z INFO rpc/rpc_client.go:382 78d1bd40-783b-4ef1-a26a-f71deb410578 notify connected event to listeners , connectionId=1723427647554_10.1.2.2_43020
2024-08-12T01:54:07.766Z INFO rpc/rpc_client.go:234 864f1d43-30da-42a9-a94a-14ce1edeb12d success to connect to server {serverIp:nacos-hs.nacos serverPort:8848 serverGrpcPort:9848} on start up, connectionId=1723427647554_10.1.2.2_43020
2024-08-12T01:54:07.766Z DEBUG rpc/rpc_client.go:290 864f1d43-30da-42a9-a94a-14ce1edeb12d register server push request:NotifySubscriberRequest handler:NamingPushRequestHandler
2024-08-12T01:54:07.766Z DEBUG rpc/rpc_client.go:298 864f1d43-30da-42a9-a94a-14ce1edeb12d register connection listener [*naming_grpc.ConnectionEventListener] to current client
2024-08-12T01:54:07.767Z INFO naming_grpc/naming_grpc_proxy.go:95 register instance namespaceId:<public>,serviceName:<test> with instance:<{"instanceId":"","ip":"10.1.2.2","port":9898,"weight":100,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"","serviceName":"","metadata":{"kind":"http"},"instanceHeartBeatInterval":0,"ipDeleteTimeout":0,"instanceHeartBeatTimeOut":0}>
2024-08-12T01:54:07.767Z INFO rpc/rpc_client.go:382 864f1d43-30da-42a9-a94a-14ce1edeb12d notify connected event to listeners , connectionId=1723427647554_10.1.2.2_43020
2024-08-12T01:54:07.780Z INFO naming_grpc/naming_grpc_proxy.go:95 register instance namespaceId:<public>,serviceName:<test> with instance:<{"instanceId":"","ip":"10.1.2.2","port":9999,"weight":100,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"","serviceName":"","metadata":{"kind":"grpc"},"instanceHeartBeatInterval":0,"ipDeleteTimeout":0,"instanceHeartBeatTimeOut":0}>
2024-08-12T01:54:07.789Z INFO naming_grpc/naming_grpc_proxy.go:95 register instance namespaceId:<public>,serviceName:<test> with instance:<{"instanceId":"","ip":"10.1.2.2","port":8080,"weight":100,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"","serviceName":"","metadata":{"kind":"http2"},"instanceHeartBeatInterval":0,"ipDeleteTimeout":0,"instanceHeartBeatTimeOut":0}>
ping @KomachiSion
Is your feature request related to a problem? Please describe.
When using an http2 proxy (nginx/envoy), the gateway uses a connection pool to manage multiplexed connections, but nacos only supports a single client per connection, so the duplicate connections cannot establish a bidirectional stream.