Closed majjfootman closed 4 years ago
Perhaps you can check your nacos properties of live times. It seems as if the old version provider url is still in nacos.
After check my nacos client and server properties,I am using the default properties configuration, and the client as blow:
dubbo:
scan:
base-packages: xxxxxxxx
consumer:
check: false
timeout: 30000
protocol:
name: dubbo
port: 20881
registry:
address: nacos://xxxxxxx:8848
cloud:
subscribed-services: xxxxxxxxx
spring:
application:
name: xxxxxxxx
profiles:
active: dev
cloud:
nacos:
config:
server-addr: xxxxxxxx:8848
discovery:
server-addr: xxxxxxx:8848
When the reconnect happens, I had tried to restart the nacos server, but the reconnect stills can not stop. So my dubbo consumer continues trying to reconnect the old provider ip for several hours, and the exception logs is continuing output. Is it normally
What I said nacos properties
is in you nacos server.
Can you reproduce this issue and check nacos on its web management (http://xxxxxx:8848/nacos/) ?
You can notice the Healthy Instance Count
in ServiceManagement - Service List
.
I had checked the nacos web pannel, the old provider was not show in the service list any more, and the Healthy Instance Count is '1' which is the new provider
What about the Subscribers
and Listening Query
sections are correct or not?
Just now I try to reproduce this issue in my computer, I found the module A will still try to reconnect the old version B before this new version's url notified by nacos and old version healthy checkout timeout. And this may takes about two minutes before stopping reconnecting to the old version provider. After that everything returns to normal. @majjfootman Did module A still try to reconnect old provider B after serveral minutes ?
Moudule A still try to reconnect old provider B after serveral minutes, even after serveral hours. May be my config is not correct, or the nacos server version is not compatibility? My nacos server version is 1.1.3, and I am using the spring-cloud-alibaba 2.2.0.RELEASE. Can you share your module's version or configuration, Thank you very much
I am just dirctly use dubbo without spring-cloud-alibaba.
<!-- dubbo -->
<dependency>
<groupId>org.apache.dubbo</groupId>
<artifactId>dubbo-spring-boot-starter</artifactId>
<version>${dubbo-version}</version>
</dependency>
<dependency>
<groupId>org.apache.dubbo</groupId>
<artifactId>dubbo-registry-nacos</artifactId>
<version>${dubbo-version}</version>
<exclusions>
<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
<exclusion>
<artifactId>dubbo-common</artifactId>
<groupId>org.apache.dubbo</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.alibaba.nacos</groupId>
<artifactId>nacos-client</artifactId>
<version>${nacos-version}</version>
</dependency>
properties is below
dubbo:
scan:
base-packages: com.xxxxxx
protocol:
name: dubbo
port: -1
registry:
address: nacos://127.0.0.1:8848
application:
name: xxxxxx
consumer:
check: false
Just now I notice your error message, the main error is client-side timeout
. Perhaps this is the main reason because in general if a server shutdown the client reconnect it will come out Connection refused
owing to closed port. But in k8s if a pod is totally closed, the ip cannot be connect no longer and the result may changed while the request is loopback in the k8s network or even turned to outside network. This maybe the issue of doClient
function in NettyClient
.
Thank you for your help. I run my application at my local environment and reproduce, the connection exception type changes but still continue reconnect. Then I use your dirctly maven dependency instand of spring cloud alibaba, it was not reconnecting any more. I will try this in k8s later. May be the continues reconnection is the issue of spring cloud alibaba.
的确是重启provider会这样,我在consumer上调用几次,发现就在不停的连接,注册中心返回的数据查了下,没有原来的ip数据。我的docker用的是swarm网络,容器重启后ip会变,计算机名不变。
Environment
Steps to reproduce this issue
Pls. provide [GitHub address] to reproduce this issue.
What do you expected from the above steps?
stop reconnect to the old provider
What actually happens?
Actual Result is cannot stop reconnect to the old provider
If there is an exception, please attach the exception trace: