Closed supermareo closed 3 years ago
I can't reproduce your problem.
Please check nacos naming-server and naming-event logs to check whether any error or exception.
And you can use curl -X PUT 'ip:port/nacos/v1/ns/operator/logs?logName=naming-event&logLevel=DEBUG'
and check naming-event log to make sure no heartbeat from this service and instance.
I can't reproduce your problem.
Please check nacos naming-server and naming-event logs to check whether any error or exception.
And you can use
curl -X PUT 'ip:port/nacos/v1/ns/operator/logs?logName=naming-event&logLevel=DEBUG'
and check naming-event log to make sure no heartbeat from this service and instance.
找到一条相关的WARN级别日志,没有Error与Exception相关日志
2020-10-16 16:14:23,128 WARN cluster: DEFAULT not found, ip: {"app":"unknown","clusterName":"DEFAULT","enabled":true,"ephemeral":true,"healthy":true,"instanceHeartBeatInterval":5000,"instanceHeartBeatTimeOut":15000,"instanceId":"172.25.240.12#10010#DEFAULT#DEFAULT_GROUP@@konke-iot-cloud-api","instanceIdGenerator":"","ip":"172.25.240.12","ipDeleteTimeout":30000,"lastBeat":1602836063105,"marked":false,"metadata":{"appVersion":"v1.3.0.RELEASE","appDesc":"设备云平台-API服务","preserved.register.source":"SPRING_CLOUD","appStartAt":"2020-10-16 11:56:30.695"},"port":10010,"serviceName":"DEFAULT_GROUP@@konke-iot-cloud-api","tenant":"","weight":1.0}, will create new cluster with default configuration.
没有异常日志
接口报错,见下图
name-server与naming-event今天的完整日志见附件
curl -X PUT 'ip:port/nacos/v1/ns/operator/log?logName=naming-event&logLevel=DEBUG'
Sorry I add more s
in api.
Please use this api open debug log for naming-event and check whether receive heartbeat of this instance.
Other way to check whether receive heartbeat of this instance is use curl -X GET 'ip:port/nacos/v1/ns/catalog/instances?serviceName=&pageSize=10&pageNo=1&namespaceId=
and check the lastBeat changes?
@supermareo Any process?
@KomachiSion 暂时复现不出来了,等下次出现的时候我再排查下
那我先关闭issue了。 复现的时候记得打开日志或者用openAPI来确认心跳时间,最好能够知道如何复现问题。
@KomachiSion 你好,这个问题又出现了,还是172.25.240.12这个ip,这段时间NacosServer没有重启过。目前我电脑的ip已经变更了,联系运维查看当前172.25.240.12这个ip被分配给了一台手机,所以可以确定并没有启动Java程序。
我按照你上面的指导,修改了日志的级别,curl -X PUT 'http://172.25.240.37:8848/nacos/v1/ns/operator/log?logName=naming-event&logLevel=DEBUG'
,过滤了一下日志(tail -f naming-event.log | grep 172.25.240.12
),打印如下:
2020-10-27 15:46:44,722 DEBUG [CLIENT-BEAT] processing beat: {"load":0.0,"cpu":0.0,"rt":0.0,"qps":0.0,"mem":0.0,"port":10011,"ip":"172.25.240.12","cluster":"DEFAULT","weight":0.0,"ephemeral":true}
2020-10-27 15:46:44,722 DEBUG [CLIENT-BEAT] refresh beat: {"load":0.0,"cpu":0.0,"rt":0.0,"qps":0.0,"mem":0.0,"port":10011,"ip":"172.25.240.12","cluster":"DEFAULT","weight":0.0,"ephemeral":true}
2020-10-27 15:46:45,295 DEBUG [CLIENT-BEAT] processing beat: {"load":0.0,"cpu":0.0,"rt":0.0,"qps":0.0,"mem":0.0,"port":10010,"ip":"172.25.240.12","cluster":"DEFAULT","weight":0.0,"ephemeral":true}
2020-10-27 15:46:45,295 DEBUG [CLIENT-BEAT] refresh beat: {"load":0.0,"cpu":0.0,"rt":0.0,"qps":0.0,"mem":0.0,"port":10010,"ip":"172.25.240.12","cluster":"DEFAULT","weight":0.0,"ephemeral":true}
2020-10-27 15:46:49,734 DEBUG [CLIENT-BEAT] processing beat: {"load":0.0,"cpu":0.0,"rt":0.0,"qps":0.0,"mem":0.0,"port":10011,"ip":"172.25.240.12","cluster":"DEFAULT","weight":0.0,"ephemeral":true}
2020-10-27 15:46:49,734 DEBUG [CLIENT-BEAT] refresh beat: {"load":0.0,"cpu":0.0,"rt":0.0,"qps":0.0,"mem":0.0,"port":10011,"ip":"172.25.240.12","cluster":"DEFAULT","weight":0.0,"ephemeral":true}
2020-10-27 15:46:50,297 DEBUG [CLIENT-BEAT] processing beat: {"load":0.0,"cpu":0.0,"rt":0.0,"qps":0.0,"mem":0.0,"port":10010,"ip":"172.25.240.12","cluster":"DEFAULT","weight":0.0,"ephemeral":true}
2020-10-27 15:46:50,297 DEBUG [CLIENT-BEAT] refresh beat: {"load":0.0,"cpu":0.0,"rt":0.0,"qps":0.0,"mem":0.0,"port":10010,"ip":"172.25.240.12","cluster":"DEFAULT","weight":0.0,"ephemeral":true}
我不太清楚这段日志的含义,是否是指我在172.25.240.12
这个ip上启动的服务在定时上报心跳。所以我按照我的思路,进行了一些排查。
首先是再次确认运行NacosServer的服务器是否与172.25.240.12
这个ip有连接,确认并没有连接。
在NacosServer所在服务器上通过iptables禁用172.25.240.12
这个ip
再次查看日志,仍然和之前一样有心跳日志,并且界面上仍有这个服务。理论上这个时候两个服务之间的连接应该断开了。
不知道这些信息能不能帮助定位,如果有其它需要提供的内容,麻烦告诉我,谢谢。
1.2.1遇到同样的问题
focus
同样的问题 日志: failed to delete ip automatically, ip: {"instanceId":"10.32.0.188#8080#DEFAULT#DEFAULT_GROUP@@saas-center-identity","ip":"10.32.0.188","port":8080,"weight":1.0,"healthy":false,"enabled":true,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"DEFAULT_GROUP@@saas-center-identity","metadata":{"preserved.register.source":"SPRING_CLOUD"},"lastBeat":1706581703587,"marked":false,"app":"DEFAULT","instanceHeartBeatInterval":5000,"instanceHeartBeatTimeOut":15000,"ipDeleteTimeout":30000}, error: {}
nacos集群一共3个实例: nacos-server-thirdapp-nacos-server-ha-extdb-0 nacos-server-thirdapp-nacos-server-ha-extdb-1 nacos-server-thirdapp-nacos-server-ha-extdb-2
其中nacos-server-thirdapp-nacos-server-ha-extdb-2与另外两台实例断开了(但不是网络导致的,具体看下面)
nacos-server-thirdapp-nacos-server-ha-extdb-2的部分日志如下:
748511:2024-01-30 10:49:47,256 WARN [STATUS-SYNCHRONIZE] failed to request serviceStatus, remote server: nacos-server-thirdapp-nacos-server-ha-extdb-0.nacos-hs.uat.svc.cluster.local:8848
748534:2024-01-30 10:49:47,256 WARN [STATUS-SYNCHRONIZE] failed to request serviceStatus, remote server: nacos-server-thirdapp-nacos-server-ha-extdb-1.nacos-hs.uat.svc.cluster.local:8848
2024-01-30 10:49:48,846 ERROR [IP-DEAD] failed to delete ip automatically, ip: {"instanceId":"10.32.1.227#8080#DEFAULT#DEFAULT_GROUP@@saas-center-user","ip":"10.32.1.227","port":8080,"weight":1.0,"healthy":false,"enabled":false,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"DEFAULT_GROUP@@saas-center-user","metadata":{"preserved.register.source":"SPRING_CLOUD"},"lastBeat":1706495234710,"marked":false,"app":"DEFAULT","instanceHeartBeatInterval":5000,"instanceHeartBeatTimeOut":15000,"ipDeleteTimeout":30000}, error: {}
java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
at org.apache.http.util.Asserts.check(Asserts.java:46)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:75)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:108)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClient.execute(CloseableHttpAsyncClient.java:92)
at com.alibaba.nacos.common.http.client.request.DefaultAsyncHttpClientRequest.execute(DefaultAsyncHttpClientRequest.java:52)
at com.alibaba.nacos.common.http.client.NacosAsyncRestTemplate.execute(NacosAsyncRestTemplate.java:364)
at com.alibaba.nacos.common.http.client.NacosAsyncRestTemplate.delete(NacosAsyncRestTemplate.java:105)
at com.alibaba.nacos.naming.misc.HttpClient.asyncHttpRequest(HttpClient.java:195)
at com.alibaba.nacos.naming.misc.HttpClient.asyncHttpDelete(HttpClient.java:159)
at com.alibaba.nacos.naming.healthcheck.ClientBeatCheckTask.deleteIp(ClientBeatCheckTask.java:144)
at com.alibaba.nacos.naming.healthcheck.ClientBeatCheckTask.run(ClientBeatCheckTask.java:122)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
nacos-server-thirdapp-nacos-server-ha-extdb-2
调用其他实例nacos/v1/ns/service/status接口的时候报错了, 但是抓取参数,通过curl调用是成功的. 最终找到错误原因如下:
Request cannot be executed; I/O reactor status: STOPPED的错误原因 https://blog.csdn.net/zhaopeng_yu/article/details/118763153
@KomachiSion
我猜测,nacos页面上, 服务健康状态一会儿true,一会儿false,也是这个导致的. 因为发现实例挂掉的nacos节点,无法把实例状态同步给其他nacos实例.
Describe the bug 测试服务器使用了 单机版 NacosServer 1.2.1,本地电脑(Win10)上启动Java程序并注册到Nacos的服务列表中(见下图红框),测试完成后,停止该Java程序,后来发现Nacos控制台的服务详情中该服务仍然存在,且是上线状态。后升级NacosServer版本至最新 1.3.2, 问题仍存在。
首先怀疑是Java程序没有真正关掉,通过任务管理器与命令查看,确认本地电脑上并无任何Java进程在运行。为了确保本地Java程序确实被关闭,重启本地电脑。重启后查看NacosServer控制台,该服务仍然存在且是上线状态。
之后检查部署NacosServer的Centos服务器,过滤连接到NacosServer的IP信息。由下图可以看出,172.25.240.12(本地电脑的IP)并没有到NacosServer的连接。
Expected behavior Java程序停止后,服务详情中应该不再有该服务的信息
Acutally behavior Java程序停止后,该服务仍然在服务详情中,且无法删除。重启NacosServer后仍然存在。
How to Reproduce 该问题非必现,操作流程在上面的 Describe the bug 部分已描述。
Desktop (please complete the following information):
Additional context Add any other context about the problem here.