alibaba / spring-cloud-alibaba

Spring Cloud Alibaba provides a one-stop solution for application development for the distributed solutions of Alibaba middleware.
https://sca.aliyun.com
Apache License 2.0
27.96k stars 8.34k forks source link

In a Spring Cloud Alibaba + Dubbo + Nacos environment, when the service provider is restarted, or the consumer is started before the provider, the consumer sometimes cannot find the service: problem and solutions #1805

Closed tianzeyong closed 3 years ago

tianzeyong commented 4 years ago

1. Direct symptom of the problem:

org.apache.dubbo.rpc.RpcException: No provider available from registry localhost:9090 for service com.hxy.boot.ticket.articles.api.ArticleService on consumer 192.168.137.1 use dubbo version 2.7.8, please check status of providers(disabled, not registered or in blacklist).
    at org.apache.dubbo.registry.integration.RegistryDirectory.doList(RegistryDirectory.java:599)
    at org.apache.dubbo.rpc.cluster.directory.AbstractDirectory.list(AbstractDirectory.java:74)
    at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.list(AbstractClusterInvoker.java:292)
    at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.invoke(AbstractClusterInvoker.java:257)
    at org.apache.dubbo.rpc.cluster.interceptor.ClusterInterceptor.intercept(ClusterInterceptor.java:47)
    at org.apache.dubbo.rpc.cluster.support.wrapper.AbstractCluster$InterceptorInvokerNode.invoke(AbstractCluster.java:92)
    at org.apache.dubbo.rpc.cluster.support.wrapper.MockClusterInvoker.invoke(MockClusterInvoker.java:88)
    at org.apache.dubbo.rpc.proxy.InvokerInvocationHandler.invoke(InvokerInvocationHandler.java:74)

2. Direct cause of the problem:

When the consumer calls the provider, the forbidden property of the consumer's Dubbo service directory, org.apache.dubbo.registry.integration.RegistryDirectory, is true, as shown in the following figure:

(Figure: direct cause 1)

3. Reproducing the problem:

The problem occurs only occasionally and is hard to catch. After analysis, I found a reliable way to trigger it: set a breakpoint on line 31 of the provider's org.apache.dubbo.config.spring.context.DubboBootstrapApplicationListener#onContextRefreshedEvent(ContextRefreshedEvent event), set the debugger's suspend mode to Thread, and restart the provider. The problem then reproduces every time, as shown in the following figure:

(Figure: reproducing the problem)

4. Root cause of the problem:

The root cause is that the Spring Cloud Alibaba framework starts Nacos automatic service registration earlier than Dubbo service registration. The former is triggered when WebServerInitializedEvent is received (org.springframework.cloud.client.serviceregistry.AbstractAutoServiceRegistration#bind(WebServerInitializedEvent event)); the latter is triggered when ContextRefreshedEvent is received (org.apache.dubbo.config.spring.context.DubboBootstrapApplicationListener#onContextRefreshedEvent(ContextRefreshedEvent event)).

In Spring Boot 2.2.x, ServletWebServerInitializedEvent is published after ContextRefreshedEvent, as shown in the following figure:

(Figure: Spring Boot 2.2.x)

In Spring Boot 2.3.x, however, it is published before ContextRefreshedEvent, as shown in the following figure:

(Figure: Spring Boot 2.3.x)

After the Nacos server processes the provider's registration request, it pushes an instance-change notification to subscribers. At that point the provider's own Dubbo service export may not have finished. The most direct sign is that the allExportedURLs property of the provider's com.alibaba.cloud.dubbo.metadata.repository.DubboServiceMetadataRepository does not yet contain the URL of the corresponding Dubbo service.

In the reproduction from item 3, when execution stops at the breakpoint, inspecting the runtime state with JProfiler shows that allExportedURLs does not contain the expected value.

In Spring Cloud Alibaba + Dubbo, Dubbo services are exposed only locally, in the allExportedURLs property of com.alibaba.cloud.dubbo.metadata.repository.DubboServiceMetadataRepository, and are not sent to the registry server. So once export finally completes, the Nacos server cannot tell whether the Dubbo services are ready and cannot notify subscribers. In this situation, when the consumer makes a generic call to the DubboMetadataService interface to obtain the services exposed by the provider, what it gets from allExportedURLs is an empty List<URL>. The consumer then concludes that there is no provider and sets the forbidden property to true in its local Dubbo service directory, RegistryDirectory.

At this point, when the consumer calls the provider, the problem in item 1 appears.
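
The sequence in item 4 can be modelled with a toy sketch in plain Java. None of these classes are real Dubbo, Nacos, or Spring Cloud Alibaba types: ProviderMetadata, ConsumerDirectory, and the example URL are hypothetical stand-ins, and the single notification is an assumption made to keep the model small.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the registration race; not real framework code. */
final class RegistrationRaceSketch {

    /** Stands in for the provider-side repository that holds exported Dubbo URLs. */
    static final class ProviderMetadata {
        private final List<String> allExportedUrls = new ArrayList<>();

        void export(String url) {
            allExportedUrls.add(url);
        }

        List<String> getAllExportedUrls() {
            return new ArrayList<>(allExportedUrls);
        }
    }

    /** Stands in for the consumer-side directory and its forbidden flag. */
    static final class ConsumerDirectory {
        boolean forbidden;

        void refresh(List<String> providerUrls) {
            // An empty URL list is read as "no provider", so calls are forbidden.
            forbidden = providerUrls.isEmpty();
        }
    }

    /**
     * @param registerBeforeExport true models instance registration (and the
     *                             resulting notification) happening before Dubbo export
     * @return the consumer's forbidden flag after the one and only notification
     */
    static boolean simulate(boolean registerBeforeExport) {
        ProviderMetadata provider = new ProviderMetadata();
        ConsumerDirectory consumer = new ConsumerDirectory();
        String url = "dubbo://192.0.2.10:20880/com.example.DemoService"; // hypothetical URL

        if (registerBeforeExport) {
            consumer.refresh(provider.getAllExportedUrls()); // notification arrives first
            provider.export(url); // export finishes later; nobody is notified again
        } else {
            provider.export(url);
            consumer.refresh(provider.getAllExportedUrls());
        }
        return consumer.forbidden;
    }

    public static void main(String[] args) {
        System.out.println("register first -> forbidden=" + simulate(true));
        System.out.println("export first   -> forbidden=" + simulate(false));
    }
}
```

In this model the flag stays true in the first case only because nothing triggers a second refresh, which is the gap the workaround in 5.1 tries to close by forcing another notification.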

5.1 Application-side solution:

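` // Imports inferred from the identifiers used below; verify them against your Spring Cloud Alibaba and Nacos versions.
// Closeable is assumed to be Nacos's lifecycle interface, which declares shutdown(), not java.io.Closeable.
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import javax.annotation.PostConstruct;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.stereotype.Component;
import com.alibaba.cloud.nacos.registry.NacosRegistration;
import com.alibaba.cloud.nacos.registry.NacosServiceRegistry;
import com.alibaba.nacos.api.exception.NacosException;
import com.alibaba.nacos.common.lifecycle.Closeable;
import com.alibaba.nacos.common.utils.ThreadUtils;

@Component
public class NacosServiceInstanceUpAndDownOperator implements ApplicationRunner, Closeable {
protected Logger logger = LoggerFactory.getLogger(this.getClass());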

/**
 * Bring the Nacos service instance online
 */
private static final String OPERATOR_UP = "UP";
/**
 * Take the Nacos service instance offline
 */
private static final String OPERATOR_DOWN = "DOWN";

@Autowired
NacosServiceRegistry nacosServiceRegistry;

@Autowired
NacosRegistration nacosRegistration;

private ScheduledExecutorService executorService;

@PostConstruct
public void init() {
    int poolSize = 1;
    this.executorService = new ScheduledThreadPoolExecutor(poolSize, new ThreadFactory() {
        @Override
        public Thread newThread(Runnable r) {
            Thread thread = new Thread(r);
            thread.setDaemon(true);
            thread.setName("NacosServiceInstanceUpAndDownOperator");
            return thread;
        }
    });
}

@Override
public void run(ApplicationArguments args) throws Exception {
    long delay_down = 5000L;  // delay before the offline task runs
    long delay_up = 10000L;   // delay before the online task runs
    this.executorService.schedule(new InstanceDownAndUpTask(nacosServiceRegistry, nacosRegistration, OPERATOR_DOWN), delay_down, TimeUnit.MILLISECONDS);
    this.executorService.schedule(new InstanceDownAndUpTask(nacosServiceRegistry, nacosRegistration, OPERATOR_UP), delay_up, TimeUnit.MILLISECONDS);
}

@Override
public void shutdown() throws NacosException {
    ThreadUtils.shutdownThreadPool(executorService, logger);
}

/**
 * Task that takes the service instance offline or brings it online
 */
class InstanceDownAndUpTask implements Runnable {
    private NacosServiceRegistry nacosServiceRegistry;
    private NacosRegistration nacosRegistration;
    // Target status for the service instance: UP or DOWN
    private String nacosServiceInstanceOperator;

    InstanceDownAndUpTask(NacosServiceRegistry nacosServiceRegistry, NacosRegistration nacosRegistration, String nacosServiceInstanceOperator) {
        this.nacosServiceRegistry = nacosServiceRegistry;
        this.nacosRegistration = nacosRegistration;
        this.nacosServiceInstanceOperator = nacosServiceInstanceOperator;
    }

    @Override
    public void run() {
        logger.info("===update Nacos service instance status to:{}===start=", nacosServiceInstanceOperator);
        this.nacosServiceRegistry.setStatus(nacosRegistration, nacosServiceInstanceOperator);
        logger.info("===update Nacos service instance status to:{}===end=", nacosServiceInstanceOperator);

        // Once the instance is back online, shut down the thread pool
        if (NacosServiceInstanceUpAndDownOperator.OPERATOR_UP.equals(nacosServiceInstanceOperator)) {
            ThreadUtils.shutdownThreadPool(NacosServiceInstanceUpAndDownOperator.this.executorService, NacosServiceInstanceUpAndDownOperator.this.logger);
        }
    }
}

} `

5.2 Some suggestions for a framework-side solution:

wghdir commented 4 years ago

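It looks like quite a few people have hit this problem. I wonder whether the fix is heading in the wrong direction: when this happens, it would be more appropriate to fetch the data from the registry again and update.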

strugglingbird commented 4 years ago

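The maintainers say it is fixed, but it is not fixed at all. I have run into this problem too, and it really is probabilistic; it does not happen every time.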

wghdir commented 4 years ago

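There may be many causes. I just managed to trigger it once, and after that every call produced this message no matter what I tried. In the end I took the service offline in Nacos and brought it back online, and then it was reachable. I hope the framework can re-pull the data from the registry when this message appears, since the service is clearly available.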
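
The suggestion above (re-pull from the registry when no provider is found) could look roughly like the following plain-Java sketch. It is not how Dubbo or Spring Cloud Alibaba implement lookup: RefreshOnEmptySketch, its registryLookup supplier, and the example URL are hypothetical names used only to illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Toy provider cache that re-queries the registry whenever it holds no providers. */
final class RefreshOnEmptySketch {
    // Hypothetical hook that pulls the current provider URLs from the registry.
    private final Supplier<List<String>> registryLookup;
    private List<String> cachedProviders = Collections.emptyList();

    RefreshOnEmptySketch(Supplier<List<String>> registryLookup) {
        this.registryLookup = registryLookup;
    }

    /** Returns cached providers, refreshing from the registry first if the cache is empty. */
    List<String> providers() {
        if (cachedProviders.isEmpty()) {
            cachedProviders = new ArrayList<>(registryLookup.get());
        }
        return cachedProviders;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // The first lookup returns nothing (provider not ready yet), later ones return one URL.
        RefreshOnEmptySketch directory = new RefreshOnEmptySketch(() ->
                ++calls[0] == 1
                        ? Collections.<String>emptyList()
                        : Arrays.asList("dubbo://192.0.2.10:20880/com.example.DemoService"));
        System.out.println("first call:  " + directory.providers());
        System.out.println("second call: " + directory.providers());
    }
}
```

A real implementation would need to rate-limit the refresh so that a genuinely absent provider does not turn every call into a registry query.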

zjwon commented 4 years ago

In a k8s environment this problem occurs every time. Thanks for the solution; I am going to try option b.

liu2811751 commented 4 years ago

I just tried method b and it had no effect. But taking the service offline in Nacos and then bringing it back online does work.

wghdir commented 4 years ago

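> I just tried method b and it had no effect. But taking the service offline in Nacos and then bringing it back online does work.

Did the code actually run? In my test, taking the provider offline and back online after startup seemed to work. But I put the code directly in main rather than using it as shown above.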

cqyisbug commented 4 years ago

I have a question: why not handle these problems during heartbeat processing?

Also, once an ApplicationEventMulticaster bean is registered, the Dubbo services are almost never exposed.

cqyisbug commented 4 years ago

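> I have a question: why not handle these problems during heartbeat processing?
>
> Also, once an ApplicationEventMulticaster bean is registered, the Dubbo services are almost never exposed.

Precisely because asynchrony is involved, the original poster's first solution does not seem very workable.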

zjwon commented 4 years ago

In a k8s environment, option b has no effect: consumption works normally, but the consumer still tries to connect to the old IP.

strugglingbird commented 4 years ago

@mercyblitz Is there no plan to fix this problem in any later version?

zjwon commented 4 years ago

In a k8s environment, with the following versions, the error log is printed only once, so the problem can be considered solved.

Key logs:

2020-11-13 17:14:28.369  INFO 6 --- [client.listener] o.a.d.remoting.transport.AbstractClient  :  [DUBBO] Successed connect to server /172.20.2.116:20880 from NettyClient 172.20.3.9 using dubbo version 2.7.8, channel is NettyChannel [channel=[id: 0x1c71191d, L:/172.20.3.9:38964 - R:/172.20.2.116:20880]], dubbo version: 2.7.8, current host: 172.20.3.9
2020-11-13 17:14:28.369  INFO 6 --- [lientWorker-1-1] o.a.d.r.t.netty4.NettyClientHandler      :  [DUBBO] The connection of /172.20.3.9:38964 -> /172.20.2.116:20880 is established., dubbo version: 2.7.8, current host: 172.20.3.9
2020-11-13 17:14:28.377  INFO 6 --- [client.listener] o.a.d.remoting.transport.AbstractClient  :  [DUBBO] Start NettyClient /172.20.3.9 connect to the server /172.20.2.116:20880, dubbo version: 2.7.8, current host: 172.20.3.9

......

2020-11-13 17:14:36.910  INFO 6 --- [client.listener] a.DubboServiceDiscoveryAutoConfiguration : The event of the service instances[name : com-sms-service , size : 2] change is about to be dispatched

......

2020-11-13 17:15:22.840  INFO 6 --- [lientWorker-1-2] o.a.d.r.t.netty4.NettyClientHandler      :  [DUBBO] The connection of /172.20.3.9:45860 -> /172.20.4.200:20880 is disconnected., dubbo version: 2.7.8, current host: 172.20.3.9

......

2020-11-13 17:15:40.420  INFO 6 --- [eCheck-thread-1] o.a.d.r.e.s.header.ReconnectTimerTask    :  [DUBBO] Initial connection to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [/172.20.3.9:45860 -> /172.20.4.200:20880]], dubbo version: 2.7.8, current host: 172.20.3.9
2020-11-13 17:15:40.427  INFO 6 --- [eCheck-thread-1] o.a.d.r.transport.netty4.NettyChannel    :  [DUBBO] Close netty channel [id: 0x8bc3520d, L:/172.20.3.9:45860 ! R:/172.20.4.200:20880], dubbo version: 2.7.8, current host: 172.20.3.9
2020-11-13 17:15:40.444 ERROR 6 --- [eCheck-thread-1] o.a.d.r.e.s.header.ReconnectTimerTask    :  [DUBBO] Fail to connect to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [/172.20.3.9:45860 -> /172.20.4.200:20880]], dubbo version: 2.7.8, current host: 172.20.3.9

org.apache.dubbo.remoting.RemotingException: client(url: dubbo://172.20.4.200:20880/com.alibaba.cloud.dubbo.service.DubboMetadataService?anyhost=true&application=com-message-service&bind.ip=172.20.4.200&bind.port=20880&check=false&codec=dubbo&deprecated=false&dubbo=2.0.2&dynamic=true&generic=true&group=com-sms-service&heartbeat=60000&interface=com.alibaba.cloud.dubbo.service.DubboMetadataService&metadata-type=remote&methods=getAllServiceKeys,getServiceRestMetadata,getExportedURLs,getAllExportedURLs&pid=6&qos.enable=false&register=true&register.ip=172.20.3.9&release=2.7.3&remote.application=com-sms-service&revision=2.1.1.RELEASE&side=consumer&sticky=false&timestamp=1604298942645&version=1.0.0) failed to connect to server /172.20.4.200:20880, error message is:No route to host: /172.20.4.200:20880
    at org.apache.dubbo.remoting.transport.netty4.NettyClient.doConnect(NettyClient.java:169)
    at org.apache.dubbo.remoting.transport.AbstractClient.connect(AbstractClient.java:191)
    at org.apache.dubbo.remoting.transport.AbstractClient.reconnect(AbstractClient.java:247)
    at org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeClient.reconnect(HeaderExchangeClient.java:166)
    at org.apache.dubbo.remoting.exchange.support.header.ReconnectTimerTask.doTask(ReconnectTimerTask.java:49)
    at org.apache.dubbo.remoting.exchange.support.header.AbstractTimerTask.run(AbstractTimerTask.java:87)
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:648)
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:727)
    at org.apache.dubbo.common.timer.HashedWheelTimer$Worker.run(HashedWheelTimer.java:449)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: /172.20.4.200:20880
Caused by: java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

2020-11-13 17:15:46.939  INFO 6 --- [client.listener] a.DubboServiceDiscoveryAutoConfiguration : The event of the service instances[name : com-sms-service , size : 1] change is about to be dispatched

......

2020-11-13 17:15:56.942  INFO 6 --- [client.listener] a.DubboServiceDiscoveryAutoConfiguration : The event of the service instances[name : com-sms-service , size : 1] change is about to be dispatched

......

2020-11-13 17:15:58.993  INFO 6 --- [client.listener] o.a.d.r.transport.netty4.NettyChannel    :  [DUBBO] Close netty channel [id: 0x8bc3520d, L:/172.20.3.9:45860 ! R:/172.20.4.200:20880], dubbo version: 2.7.8, current host: 172.20.3.9
zhaoziji commented 3 years ago

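I found that if you manually take the provider offline in the Nacos console, the consumer's ReferenceCountExchangeClient#replaceWithLazyClient() method is triggered. After that, as long as the consumer is not restarted, the provider can be restarted any number of times without the problem occurring.

The key seems to be this: if the URL value of "dubbo.metadata-service.urls" contains "lazy=true", the problem does not occur. (To try it, start the provider first, manually edit this value for the provider in the Nacos console, then start the consumer, then restart the provider as often as you like.)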
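
For anyone repeating this experiment, the manual edit amounts to appending lazy=true to the query string of the metadata-service URL. A small plain-Java sketch of that string change follows. LazyUrlSketch and withLazy are hypothetical helpers, not part of Dubbo or Nacos, and whether the parameter actually avoids the problem is the observation above, not something this code verifies.

```java
/** Hypothetical helper that appends lazy=true to a Dubbo-style URL string. */
final class LazyUrlSketch {

    /** Returns the URL with lazy=true added, or unchanged if a lazy parameter is already present. */
    static String withLazy(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url + "?lazy=true";
        }
        String query = url.substring(q + 1);
        for (String pair : query.split("&")) {
            if (pair.equals("lazy") || pair.startsWith("lazy=")) {
                return url; // an existing lazy parameter is left as it is
            }
        }
        return url + (query.isEmpty() ? "" : "&") + "lazy=true";
    }

    public static void main(String[] args) {
        // Shortened form of the metadata-service URL that appears in the logs of this thread.
        String url = "dubbo://192.168.100.2:29011/com.alibaba.cloud.dubbo.service.DubboMetadataService?side=provider&version=1.0.0";
        System.out.println(withLazy(url));
    }
}
```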

HuangDayu commented 3 years ago

Hmm... and there are also cases where, for the same service, some Dubbo interfaces work and others do not.

pangshuqiang commented 3 years ago

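This problem has puzzled our development team for a long time. When two Spring Cloud microservices are each other's provider and consumer, there seems to be no solution! Our team changed the connection prefix from nacos:// to spring-cloud://, which held for a day or two, and then the problem came back. I hope a new version fixes this soon.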

HuangDayu commented 3 years ago

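> This problem has puzzled our development team for a long time. When two Spring Cloud microservices are each other's provider and consumer, there seems to be no solution! Our team changed the connection prefix from nacos:// to spring-cloud://, which held for a day or two, and then the problem came back. I hope a new version fixes this soon.

Honestly, my advice is to give up on this stack.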

pangshuqiang commented 3 years ago

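> This problem has puzzled our development team for a long time. When two Spring Cloud microservices are each other's provider and consumer, there seems to be no solution! Our team changed the connection prefix from nacos:// to spring-cloud://, which held for a day or two, and then the problem came back. I hope a new version fixes this soon.

> Honestly, my advice is to give up on this stack.

We have to keep the faith!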

Actually, when the provider finishes restarting, the consumer does receive the update. The consumer's console prints the following:

[15:35:18:083] [INFO] - com.alibaba.nacos.client.naming.core.PushReceiver.run(PushReceiver.java:86) - received push data: {"type":"dom","data":"{\"hosts\":[{\"ip\":\"192.168.100.2\",\"port\":9011,\"valid\":true,\"healthy\":true,\"marked\":false,\"instanceId\":\"192.168.100.2#9011#DEFAULT#DEFAULT_GROUP@@secret-server\",\"metadata\":{\"dubbo.metadata-service.urls\":\"[ \\\"dubbo://192.168.100.2:29011/com.alibaba.cloud.dubbo.service.DubboMetadataService?anyhost=false&application=secret-server&bind.ip=192.168.100.2&bind.port=29011&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&group=secret-server&interface=com.alibaba.cloud.dubbo.service.DubboMetadataService&methods=getAllServiceKeys,getServiceRestMetadata,getExportedURLs,getAllExportedURLs&pid=19812&qos.enable=false&release=2.7.8&revision=2.2.3.RELEASE&side=provider&timestamp=1608190515329&version=1.0.0\\\" ]\",\"dubbo.protocols.dubbo.port\":\"29011\",\"preserved.register.source\":\"SPRING_CLOUD\"},\"enabled\":true,\"weight\":1.0,\"clusterName\":\"DEFAULT\",\"serviceName\":\"DEFAULT_GROUP@@secret-server\",\"ephemeral\":true}],\"dom\":\"DEFAULT_GROUP@@secret-server\",\"name\":\"DEFAULT_GROUP@@secret-server\",\"cacheMillis\":10000,\"lastRefTime\":1608190518452,\"checksum\":\"1971e7cb61623924e7407e8206da46e5\",\"useSpecifiedURL\":false,\"clusters\":\"\",\"env\":\"\",\"metadata\":{}}","lastRefTime":104528484199741} from /192.168.10.4
[15:35:18:083] [INFO] - com.alibaba.nacos.client.naming.core.HostReactor.processServiceJson(HostReactor.java:191) - new ips(1) service: DEFAULT_GROUP@@secret-server -> [{"instanceId":"192.168.100.2#9011#DEFAULT#DEFAULT_GROUP@@secret-server","ip":"192.168.100.2","port":9011,"weight":1.0,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"DEFAULT_GROUP@@secret-server","metadata":{"dubbo.metadata-service.urls":"[ \"dubbo://192.168.100.2:29011/com.alibaba.cloud.dubbo.service.DubboMetadataService?anyhost=false&application=secret-server&bind.ip=192.168.100.2&bind.port=29011&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&group=secret-server&interface=com.alibaba.cloud.dubbo.service.DubboMetadataService&methods=getAllServiceKeys,getServiceRestMetadata,getExportedURLs,getAllExportedURLs&pid=19812&qos.enable=false&release=2.7.8&revision=2.2.3.RELEASE&side=provider&timestamp=1608190515329&version=1.0.0\" ]","dubbo.protocols.dubbo.port":"29011","preserved.register.source":"SPRING_CLOUD"},"ipDeleteTimeout":30000,"instanceHeartBeatTimeOut":15000,"instanceHeartBeatInterval":5000}]
[15:35:18:084] [INFO] - com.alibaba.cloud.dubbo.autoconfigure.DubboServiceDiscoveryAutoConfiguration.dispatchServiceInstancesChangedEvent(DubboServiceDiscoveryAutoConfiguration.java:171) - The event of the service instances[name : secret-server , size : 1] change is about to be dispatched
[15:35:18:087] [INFO] - com.alibaba.nacos.client.naming.core.HostReactor.processServiceJson(HostReactor.java:228) - current ips:(1) service: DEFAULT_GROUP@@secret-server -> [{"instanceId":"192.168.100.2#9011#DEFAULT#DEFAULT_GROUP@@secret-server","ip":"192.168.100.2","port":9011,"weight":1.0,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"DEFAULT_GROUP@@secret-server","metadata":{"dubbo.metadata-service.urls":"[ \"dubbo://192.168.100.2:29011/com.alibaba.cloud.dubbo.service.DubboMetadataService?anyhost=false&application=secret-server&bind.ip=192.168.100.2&bind.port=29011&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&group=secret-server&interface=com.alibaba.cloud.dubbo.service.DubboMetadataService&methods=getAllServiceKeys,getServiceRestMetadata,getExportedURLs,getAllExportedURLs&pid=19812&qos.enable=false&release=2.7.8&revision=2.2.3.RELEASE&side=provider&timestamp=1608190515329&version=1.0.0\" ]","dubbo.protocols.dubbo.port":"29011","preserved.register.source":"SPRING_CLOUD"},"ipDeleteTimeout":30000,"instanceHeartBeatTimeOut":15000,"instanceHeartBeatInterval":5000}]
[15:35:18:097] [INFO] - com.alibaba.cloud.dubbo.service.DubboMetadataServiceProxy.createProxy(DubboMetadataServiceProxy.java:187) - The metadata of Dubbo service[name : secret-server] is about to be initialized
[15:35:18:102] [INFO] - org.apache.dubbo.registry.support.AbstractRegistry.register(AbstractRegistry.java:288) - [DUBBO] Register: consumer://192.168.100.2/org.apache.dubbo.rpc.service.GenericService?application=example-server&category=consumers&check=false&dubbo=2.0.2&generic=true&group=secret-server&interface=com.alibaba.cloud.dubbo.service.DubboMetadataService&pid=25360&qos.enable=false&release=2.7.8&side=consumer&sticky=false&timestamp=1608190518100&version=1.0.0, dubbo version: 2.7.8, current host: 192.168.100.2
[15:35:18:102] [INFO] - org.apache.dubbo.registry.support.AbstractRegistry.subscribe(AbstractRegistry.java:313) - [DUBBO] Subscribe: consumer://192.168.100.2/org.apache.dubbo.rpc.service.GenericService?application=example-server&category=providers,configurators,routers&check=false&dubbo=2.0.2&generic=true&group=secret-server&interface=com.alibaba.cloud.dubbo.service.DubboMetadataService&pid=25360&qos.enable=false&release=2.7.8&side=consumer&sticky=false&timestamp=1608190518100&version=1.0.0, dubbo version: 2.7.8, current host: 192.168.100.2
[15:35:18:103] [INFO] - org.apache.dubbo.config.ReferenceConfig.createProxy(ReferenceConfig.java:392) - [DUBBO] Refer dubbo service org.apache.dubbo.rpc.service.GenericService from url spring-cloud://192.168.10.4:8848/org.apache.dubbo.registry.RegistryService?anyhost=false&application=example-server&bind.ip=192.168.100.2&bind.port=29011&check=false&deprecated=false&dubbo=2.0.2&dynamic=true&generic=true&group=secret-server&interface=com.alibaba.cloud.dubbo.service.DubboMetadataService&methods=getAllServiceKeys,getServiceRestMetadata,getExportedURLs,getAllExportedURLs&pid=25360&qos.enable=false&register.ip=192.168.100.2&release=2.7.8&remote.application=secret-server&revision=2.2.3.RELEASE&side=consumer&sticky=false&timestamp=1608190518100&version=1.0.0, dubbo version: 2.7.8, current host: 192.168.100.2
[15:35:18:112] [INFO] - org.apache.dubbo.remoting.transport.netty4.NettyClient.doConnect(NettyClient.java:145) - [DUBBO] Close old netty channel [id: 0x642e871e, L:/192.168.100.2:61113 ! R:/192.168.100.2:29011] on create new netty channel [id: 0x3c6ebc28, L:/192.168.100.2:62816 - R:/192.168.100.2:29011], dubbo version: 2.7.8, current host: 192.168.100.2
[15:35:18:112] [INFO] - org.apache.dubbo.remoting.transport.netty4.NettyClientHandler.channelActive(NettyClientHandler.java:62) - [DUBBO] The connection of /192.168.100.2:62816 -> /192.168.100.2:29011 is established., dubbo version: 2.7.8, current host: 192.168.100.2
**[15:35:18:112] [INFO] - org.apache.dubbo.remoting.transport.AbstractClient.connect(AbstractClient.java:200) - [DUBBO] Successed connect to server /192.168.100.2:29011 from NettyClient 192.168.100.2 using dubbo version 2.7.8, channel is NettyChannel [channel=[id: 0x3c6ebc28, L:/192.168.100.2:62816 - R:/192.168.100.2:29011]], dubbo version: 2.7.8, current host: 192.168.100.2**

The last line reads:
[15:35:18:112] [INFO] - org.apache.dubbo.remoting.transport.AbstractClient.connect(AbstractClient.java:200) - [DUBBO] Successed connect to server /192.168.100.2:29011 from NettyClient 192.168.100.2 using dubbo version 2.7.8, channel is NettyChannel [channel=[id: 0x3c6ebc28, L:/192.168.100.2:62816 - R:/192.168.100.2:29011]], dubbo version: 2.7.8, current host: 192.168.100.2

I just do not know why the consumer still fails to recognize that the provider is alive and, when calling, treats the service as nonexistent:
[15:36:01:912] [WARN] - org.apache.dubbo.rpc.cluster.support.wrapper.MockClusterInvoker.invoke(MockClusterInvoker.java:116) - [DUBBO] fail-mock: checkKey fail-mock enabled , url : spring-cloud://192.168.10.4:8848/org.apache.dubbo.registry.RegistryService?application=example-server&check=false&cluster=failfast&dubbo=2.0.2&group=WF&init=false&interface=api.IVaultApi&methods=savePwd,saveKey,checkKey,checkPwd&mock=true&pid=25360&qos.enable=false&register.ip=192.168.100.2&release=2.7.8&revision=1.1.0&side=consumer&sticky=false&timestamp=1608182275814&version=1.1.0, dubbo version: 2.7.8, current host: 192.168.100.2
org.apache.dubbo.rpc.RpcException: No provider available from registry 192.168.10.4:8848 for service WF/api.IVaultApi:1.1.0 on consumer 192.168.100.2 use dubbo version 2.7.8, please check status of providers(disabled, not registered or in blacklist).

Development has to go on and the project has to go on, so for now we can only manually take the provider offline and bring it back online to make the consumer re-handshake with the provider!

liukp0210 commented 3 years ago

I wanted to simulate this locally following the example above, but I cannot seem to reproduce it.

pangshuqiang commented 3 years ago

> but I cannot seem to reproduce it

Let me describe our team's environment:
1. Nacos is pulled and deployed in Docker on the VM 192.168.10.4 (CentOS 8.1), with port 8848 mapped for the team to use.
2. Team members develop the microservice provider project A and consumer project B on their own machines, on a Spring Cloud, Spring Cloud Alibaba, and Dubbo architecture.
3. For example, on my machine 192.168.100.2, I start provider A and then B from IDEA. At this point consumer B can call A's Dubbo interface normally.
4. Reproduction: restart provider A. B then prints the messages mentioned in https://github.com/alibaba/spring-cloud-alibaba/issues/1805#issuecomment-747277036. But even after A has finished starting and shows as online and enabled in Nacos, B still cannot call A's Dubbo interface and reports:

org.apache.dubbo.rpc.RpcException: No provider available from registry 192.168.10.4:8848 for service WF/express.api.IExpressApi:1.1.0 on consumer 192.168.100.2 use dubbo version 2.7.8, please check status of providers(disabled, not registered or in blacklist)., dubbo version: 2.7.8, current host: 192.168.100.2
org.apache.dubbo.rpc.RpcException: No provider available from registry 192.168.10.4:8848 for service WF/express.api.IExpressApi:1.1.0 on consumer 192.168.100.2 use dubbo version 2.7.8, please check status of providers(disabled, not registered or in blacklist).

Services: Nacos 1.4.0 (deployed in Docker), Sentinel 1.8.0 (deployed in Docker). Project: SpringBoot 2.3.4 + SpringCloud Hoxton.SR8 + spring-cloud-alibaba 2.2.2.RELEASE (bundled Dubbo version 2.7.8)

pangshuqiang commented 3 years ago

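> This problem has puzzled our development team for a long time. When two Spring Cloud microservices are each other's provider and consumer, there seems to be no solution! Our team changed the connection prefix from nacos:// to spring-cloud://, which held for a day or two, and then the problem came back. I hope a new version fixes this soon.

> Honestly, my advice is to give up on this stack.

Here I am flooding the thread again!

Our team has pinned down the problem. The project previously used: SpringBoot 2.3.0 + SpringCloud Hoxton.SR4 + spring-cloud-alibaba 2.2.1.RELEASE (bundled Dubbo version 2.7.6). After the provider restarts, the consumer receives: `` In other words, the consumer does not fail to find the restarted provider's Dubbo interface.

Then we upgraded to the newer versions: SpringBoot 2.3.4 + SpringCloud Hoxton.SR8 + spring-cloud-alibaba 2.2.2.RELEASE (bundled Dubbo version 2.7.8). That produced a warning that Service and Reference are deprecated and that DubboService and DubboReference should be used instead. Issue: It was after this upgrade that the provider could no longer be found after a restart. That is the end of the story, everyone!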

lgp547 commented 3 years ago

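In a k8s environment, using the latest graduated version recommended in the docs, with Nacos 1.3.2, this error still occurs and keeps being printed repeatedly (the cause is that I had already restarted the 172 service and taken it offline, but the 54 service was not updated in time).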

Spring Cloud Version | Spring Cloud Alibaba Version | Spring Boot Version
Spring Cloud Hoxton.SR8 | 2.2.3.RELEASE | 2.3.2.RELEASE

2020-12-28 17:44:06.811 [dubbo-client-idleCheck-thread-1] [] ERROR org.apache.dubbo.remoting.exchange.support.header.ReconnectTimerTask - [DUBBO] Fail to connect to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [/10.xx.xx.54:55378 -> /10.xx.xx.172:20880]], dubbo version: 2.7.8, current host: 10.xx.xx.54
org.apache.dubbo.remoting.RemotingException: client(url: dubbo://10.xx.xx.172:20880/com.alibaba.cloud.dubbo.service.DubboMetadataService?anyhost=true&application=question-service&bind.ip=10.xx.xx.172&bind.port=20880&check=false&codec=dubbo&deprecated=false&dubbo=2.0.2&dynamic=true&generic=true&group=privilege-service&heartbeat=60000&interface=com.alibaba.cloud.dubbo.service.DubboMetadataService&methods=getAllServiceKeys,getServiceRestMetadata,getExportedURLs,getAllExportedURLs&pid=376&qos.enable=false&register.ip=10.xx.xx.54&release=2.7.8&remote.application=privilege-service&revision=2.2.3.RELEASE&side=consumer&sticky=false&timeout=60000&timestamp=1608692297155&version=1.0.0) failed to connect to server /10.xx.xx.172:20880 client-side timeout 3000ms (elapsed: 3001ms) from netty client 10.xx.xx.54 using dubbo version 2.7.8
    at org.apache.dubbo.remoting.transport.netty4.NettyClient.doConnect(NettyClient.java:174)
    at org.apache.dubbo.remoting.transport.AbstractClient.connect(AbstractClient.java:191)
    at org.apache.dubbo.remoting.transport.AbstractClient.reconnect(AbstractClient.java:247)
    at org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeClient.reconnect(HeaderExchangeClient.java:166)
    at org.apache.dubbo.remoting.exchange.support.header.ReconnectTimerTask.doTask(ReconnectTimerTask.java:49)
    at org.apache.dubbo.remoting.exchange.support.header.AbstractTimerTask.run(AbstractTimerTask.java:87)
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:648)
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:727)
    at org.apache.dubbo.common.timer.HashedWheelTimer$Worker.run(HashedWheelTimer.java:449)
    at java.lang.Thread.run(Thread.java:748)

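Could we please get a release that fixes this?

There is another problem: after a deployment, once the services have started, service A cannot call service B and keeps reporting org.apache.dubbo.rpc.RpcException: Failed to invoke the method getUserNameById in the service com.xxx.privilege.application.PrivilegeApplication. Tried 3 times of the providers ....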

cxdhefei commented 3 years ago

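> This problem has puzzled our development team for a long time. When two Spring Cloud microservices are each other's provider and consumer, there seems to be no solution! Our team changed the connection prefix from nacos:// to spring-cloud://, which held for a day or two, and then the problem came back. I hope a new version fixes this soon.

> Honestly, my advice is to give up on this stack.

> Here I am flooding the thread again!

> Our team has pinned down the problem. The project previously used: SpringBoot 2.3.0 + SpringCloud Hoxton.SR4 + spring-cloud-alibaba 2.2.1.RELEASE (bundled Dubbo version 2.7.6). After the provider restarts, the consumer receives: `` In other words, the consumer does not fail to find the restarted provider's Dubbo interface.

> Then we upgraded to the newer versions: SpringBoot 2.3.4 + SpringCloud Hoxton.SR8 + spring-cloud-alibaba 2.2.2.RELEASE (bundled Dubbo version 2.7.8). That produced a warning that Service and Reference are deprecated and that DubboService and DubboReference should be used instead. Issue: It was after this upgrade that the provider could no longer be found after a restart. That is the end of the story, everyone!

I am using the following versions and the consumer still cannot find the provider! nacos-server: 1.3.1, spring-boot: 2.2.6.RELEASE, spring-cloud-alibaba: 2.2.1.RELEASE (dubbo: 2.7.6, nacos-client: 1.2.1), spring-cloud: Hoxton.SR4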

cxdhefei commented 3 years ago

@pangshuqiang
WeChat ID: coding4joy

mostcool commented 3 years ago

@yuhuangbin @theonefx Is there an official plan to fix this problem?

theonefx commented 3 years ago

We are dealing with this issue

theonefx commented 3 years ago

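Spring Cloud's notion of service registration differs somewhat from Dubbo's. The current idea is to adapt Dubbo and Spring Cloud registration to each other by registering twice: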
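
The details of the two-registration approach are not given in this thread, so the following is only one possible reading of it, written as a toy model in plain Java. ToyRegistry, the instance name, and the example URL are hypothetical; this is not the code that went into Spring Cloud Alibaba.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: a second registration after Dubbo export triggers a second notification. */
final class TwoPhaseRegistrationSketch {

    /** Hypothetical registry in which every register call notifies all subscribers. */
    static final class ToyRegistry {
        private final List<Runnable> subscribers = new ArrayList<>();
        private final List<String> registrations = new ArrayList<>();

        void subscribe(Runnable listener) {
            subscribers.add(listener);
        }

        void register(String instance) {
            registrations.add(instance);
            subscribers.forEach(Runnable::run);
        }
    }

    /**
     * @return the number of exported URLs the consumer saw at each notification
     */
    static List<Integer> run(boolean registerAgainAfterExport) {
        ToyRegistry registry = new ToyRegistry();
        List<String> exportedUrls = new ArrayList<>();
        List<Integer> seenByConsumer = new ArrayList<>();
        registry.subscribe(() -> seenByConsumer.add(exportedUrls.size()));

        registry.register("provider-1"); // first registration: web server is up, Dubbo is not
        exportedUrls.add("dubbo://192.0.2.10:20880/com.example.DemoService"); // hypothetical URL
        if (registerAgainAfterExport) {
            registry.register("provider-1"); // second registration: Dubbo export has finished
        }
        return seenByConsumer;
    }

    public static void main(String[] args) {
        System.out.println("single registration: " + run(false));
        System.out.println("two registrations:   " + run(true));
    }
}
```

With a single registration the consumer only ever observes zero exported URLs; with the second registration its last observation includes the exported service.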

theonefx commented 3 years ago

It has been fixed. Please take a look, everyone.

theonefx commented 3 years ago

already fixed

caichy commented 3 years ago

Has 2.2.4.RELEASE been released yet?

xyx007 commented 3 years ago

I have been following this for a long time; it is finally fixed.

caichy commented 3 years ago

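There are still problems; I tested with 2.2.4 today. Environment: dubbo 2.7.8, springboot 2.2.5, spring-cloud-dependencies Hoxton.SR8

@theonefx Problem 1: a newly added provider node cannot serve requests unless the consumer is restarted. Start provider node P1, then consumer node C1: access works. Start provider P2: it registers with Nacos successfully, but C1 cannot reach P2. DubboMetadataServiceInvocationHandler 53: Failed to invoke the method getExportedURLs in the service org.apache.dubbo.rpc.service.GenericService. Tried 3 times of the providers

Problem 2: after a provider node is shut down and goes offline, ReconnectTimerTask on the consumer keeps probing the node that is already offline.

Taking the service offline and online in Nacos has no effect.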

theonefx commented 3 years ago

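> There are still problems; I tested with 2.2.4 today. Environment: dubbo 2.7.8, springboot 2.2.5, spring-cloud-dependencies Hoxton.SR8
>
> @theonefx Problem 1: a newly added provider node cannot serve requests unless the consumer is restarted. Start provider node P1, then consumer node C1: access works. Start provider P2: it registers with Nacos successfully, but C1 cannot reach P2. DubboMetadataServiceInvocationHandler 53: Failed to invoke the method getExportedURLs in the service org.apache.dubbo.rpc.service.GenericService. Tried 3 times of the providers
>
> Problem 2: after a provider node is shut down and goes offline, ReconnectTimerTask on the consumer keeps probing the node that is already offline.
>
> Taking the service offline and online in Nacos has no effect.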

caichy commented 3 years ago

Starting a new instance of a provider with the same appname.

theonefx commented 3 years ago

> Starting a new instance of a provider with the same appname.

Got it. Give me a moment to confirm.

caichy commented 3 years ago

DubboMetadataServiceInvocationHandler 53: Failed to invoke
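The exception above can be ignored for now. It occurred sporadically a few times while I was debugging locally, and I am not sure whether it matters.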

theonefx commented 3 years ago

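I tried for a long time and could not reproduce it. However, when trying to reproduce on a single machine, make sure the Dubbo protocol ports are kept separate; otherwise they may conflict and corrupt the data on the server side.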

theonefx commented 3 years ago

@caichy If you are testing on a single machine, dubbo.protocol.port must be assigned in advance; do not use -1.

caichy commented 3 years ago

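> I tried for a long time and could not reproduce it. However, when trying to reproduce on a single machine, make sure the Dubbo protocol ports are kept separate; otherwise they may conflict and corrupt the data on the server side.

This was found on servers in the test environment. The port is indeed configured as -1. Three machines run providers and one runs the consumer. I will post some screenshots later.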

caichy commented 3 years ago

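> I tried for a long time and could not reproduce it. However, when trying to reproduce on a single machine, make sure the Dubbo protocol ports are kept separate; otherwise they may conflict and corrupt the data on the server side.

@theonefx The Dubbo configuration is as follows, the same on the provider and consumer sides: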

registry:
  address: spring-cloud://nacos.test.xxx.cn:8080
consumer:
  timeout: 10000
  check: true
protocol:
  name: dubbo
  port: -1

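There are two machines, with IPs 121 and 122. Each runs a provider, and the consumer runs on 122. I access http://122:8800/dubbo/get/1 to test whether the Dubbo interface is available. For convenience I refer to the instances as Provider-122, Provider-121, and Consumer-122.

Step 1: start Provider-122 and Consumer-122 first. Accessing /dubbo/get/1 works.

Step 2: then start Provider-121. Provider-122 and Consumer-122 both receive an InstancesChangeEvent, and registration with Nacos succeeds. Accessing /dubbo/get/1 works, but Dubbo requests only hit 122, which was started first (if both providers are started before the consumer, load balancing is normal).

Step 3: stop Provider-122 with the kill command. Provider-121 and Consumer-122 both receive an InstancesChangeEvent, and deregistration from Nacos succeeds. Accessing /dubbo/get/1 now fails with: Tried 3 times of the providers [172.24.30.122:20881] (1/1); ReconnectTimerTask keeps probing Provider-122, which is already offline.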

caichy commented 3 years ago

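I tested different combinations on 8 machines (3 different network segments, all VMs). On some machines the consumer picks up some of the newly started providers (and when a provider goes offline, ReconnectTimerTask does not report errors), but none of them behaves completely correctly.

What is certain is that restarting the consumer lets it consume from all providers.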
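
The behaviour one would expect in steps 2 and 3 (connect to newly listed providers, stop probing ones that were removed) comes down to a set difference between the addresses the consumer is connected to and the latest instance list. A minimal plain-Java sketch of that reconciliation follows. StaleConnectionSketch is a hypothetical name, the port used for 121 is assumed, and this is not how Dubbo manages its clients.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy reconciliation of connected provider addresses against the latest instance list. */
final class StaleConnectionSketch {

    /** Addresses still connected but no longer listed: candidates for closing. */
    static Set<String> toClose(Set<String> connected, Set<String> latestInstances) {
        Set<String> stale = new HashSet<>(connected);
        stale.removeAll(latestInstances);
        return stale;
    }

    /** Addresses newly listed but not yet connected: candidates for connecting. */
    static Set<String> toOpen(Set<String> connected, Set<String> latestInstances) {
        Set<String> fresh = new HashSet<>(latestInstances);
        fresh.removeAll(connected);
        return fresh;
    }

    public static void main(String[] args) {
        // State after step 3: the consumer is still connected to 122, while the registry lists only 121.
        Set<String> connected = new HashSet<>(Arrays.asList("172.24.30.122:20881"));
        Set<String> latest = new HashSet<>(Arrays.asList("172.24.30.121:20881")); // port assumed
        System.out.println("close: " + toClose(connected, latest));
        System.out.println("open:  " + toOpen(connected, latest));
    }
}
```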

tianxinjs commented 3 years ago

There is indeed still a problem: No provider available from registry localhost:9090 for service

fangjian0423 commented 3 years ago

2.2.5.RELEASE, 2.1.4.RELEASE, and 2.0.4.RELEASE have solved this problem.

Please open a new issue if you still have this problem.

winterallen commented 3 years ago

> 2.2.5.RELEASE, 2.1.4.RELEASE, and 2.0.4.RELEASE have solved this problem.
>
> Please open a new issue if you still have this problem.

The problem still exists in 2.2.5.RELEASE.

lizhuquan0769 commented 3 years ago

I am using spring cloud alibaba 2.2.5.RELEASE and keep hitting this problem too.

BuYi-Feng commented 3 years ago

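> I am using spring cloud alibaba 2.2.5.RELEASE and keep hitting this problem too.

You can take a look at 2079; this has become a bug that the maintainers cannot solve for now.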

lgh731 commented 3 years ago

I upgraded to the latest version, but the problem still exists. Is it still not fixed? spring cloud: 2020.0.0, spring cloud alibaba: 2021.1, Dubbo: 2.7.8, nacos: 1.4.1

theonefx commented 3 years ago

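> I upgraded to the latest version, but the problem still exists. Is it still not fixed? spring cloud: 2020.0.0, spring cloud alibaba: 2021.1, Dubbo: 2.7.8, nacos: 1.4.1

Sorry, the previous fix did have some problems and did not resolve the issue thoroughly. We have now adopted a new approach to solve it; please look forward to 2.2.6.RELEASE. If you want to try it early, you can use 2.2.6-bugfix5-SNAPSHOT.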

theonefx commented 3 years ago

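> I am using spring cloud alibaba 2.2.5.RELEASE and keep hitting this problem too.

> You can take a look at 2079; this has become a bug that the maintainers cannot solve for now.

Sorry, the previous fix did have some problems and did not resolve the issue thoroughly. We have now adopted a new approach to solve it; please look forward to 2.2.6.RELEASE. If you want to try it early, you can use 2.2.6-bugfix5-SNAPSHOT.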

theonefx commented 3 years ago

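> I am using spring cloud alibaba 2.2.5.RELEASE and keep hitting this problem too.

Sorry, the previous fix did have some problems and did not resolve the issue thoroughly. We have now adopted a new approach to solve it; please look forward to 2.2.6.RELEASE. If you want to try it early, you can use 2.2.6-bugfix5-SNAPSHOT.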

lgh731 commented 3 years ago

Roughly when will 2.2.6.RELEASE be released?

theonefx commented 3 years ago

> Roughly when will 2.2.6.RELEASE be released?

It will be released soon. The original plan was to release this month, but testing may cause a slight delay.