alibaba / nacos

an easy-to-use dynamic service discovery, configuration and service management platform for building cloud native applications.
https://nacos.io
Apache License 2.0

SpringCloud is unstable when a 1.4.2 client connects to a 2.0.1 server #6756

Closed WindyGao closed 3 years ago

WindyGao commented 3 years ago

A SpringCloud application using the nacos 1.4.2 client is unstable when connecting to a nacos 2.0.1 server deployed on k8s. It frequently reports the following errors:

[08:26:22:22:40:204] ERROR [com.alibaba.nacos.naming.beat.sender] [TID: N/A] [] [] @@naming@@ | request: /nacos/v1/ns/instance/beat failed, servers: [http://cdmp-nacos.youdao.com:80], code: 500, msg: server is DOWNnow, detailed error message: Optional[Distro protocol is not initialized]

[08:26:22:22:40:205] ERROR [com.alibaba.nacos.naming.beat.sender] [TID: N/A] [] [] @@naming@@ | [CLIENT-BEAT] failed to send beat: {"port":8800,"ip":"**.**.**.**","weight":1.0,"serviceName":"cdmp@@dohko-base","cluster":"DEFAULT","metadata":{"management.endpoints.web.base-path":"/monitor","preserved.register.source":"SPRING_CLOUD","gRPC_port":"8801"},"scheduled":false,"period":5000,"stopped":false}, code: 500, msg: failed to req API:/nacos/v1/ns/instance/beat after all servers([http://cdmp-nacos.youdao.com:80]) tried: ErrCode:503, ErrMsg:server is DOWNnow, detailed error message: Optional[Distro protocol is not initialized]

[08:26:22:22:45:209] ERROR [com.alibaba.nacos.naming.beat.sender] [TID: N/A] [] [] @@naming@@ | [NA] failed to request com.alibaba.nacos.api.exception.NacosException: server is DOWNnow, detailed error message: Optional[Distro protocol is not initialized]

WindyGao commented 3 years ago

Has anyone run into this before?

WindyGao commented 3 years ago

[CLIENT-BEAT] failed to send beat: {"port":8800,"ip":"**.**.**.**","weight":1.0,"serviceName":"cdmp@@dohko-base","cluster":"DEFAULT","metadata":{"management.endpoints.web.base-path":"/monitor","preserved.register.source":"SPRING_CLOUD","gRPC_port":"8801"},"scheduled":false,"period":5000,"stopped":false}, code: 500, msg: failed to req API:/nacos/v1/ns/instance/beat after all servers([http://cdmp-nacos.youdao.com:80]) tried: ErrCode:503, ErrMsg:server is DOWNnow, detailed error message: Optional[Distro protocol is not initialized]

MajorHe1 commented 3 years ago

Refer to #5920. If that does not work, refer to #6072.

WindyGao commented 3 years ago

6072

My protocol-distro.log shows these errors:

2021-08-26 22:21:14,713 ERROR Fail to refresh route configuration for group : naming_persistent_service_v2, status is : Status[UNKNOWN<-1>: handleRequest internal error]

2021-08-26 22:21:16,013 ERROR Fail to refresh leader for group : naming_persistent_service, status is : Status[UNKNOWN<-1>: Fail to init channel to nacos-0.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:7848]

2021-08-26 22:21:16,013 ERROR Fail to refresh route configuration for group : naming_persistent_service, status is : Status[UNKNOWN<-1>: Fail to init channel to nacos-0.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:7848]

2021-08-26 22:21:16,516 ERROR Fail to refresh leader for group : naming_instance_metadata, status is : Status[UNKNOWN<-1>: Fail to init channel to nacos-0.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:7848, Unknown leader]

2021-08-26 22:21:16,517 ERROR Fail to refresh route configuration for group : naming_instance_metadata, status is : Status[UNKNOWN<-1>: handleRequest internal error]

2021-08-26 22:21:17,897 ERROR Fail to refresh leader for group : naming_service_metadata, status is : Status[UNKNOWN<-1>: Fail to init channel to nacos-0.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:7848, Unknown leader]

2021-08-26 22:21:17,897 ERROR Fail to refresh route configuration for group : naming_service_metadata, status is : Status[UNKNOWN<-1>: handleRequest internal error]

2021-08-26 22:21:22,206 ERROR Fail to refresh leader for group : naming_persistent_service, status is : Status[UNKNOWN<-1>: Fail to init channel to nacos-0.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:7848]

2021-08-26 22:21:22,207 ERROR Fail to refresh route configuration for group : naming_persistent_service, status is : Status[UNKNOWN<-1>: Fail to init channel to nacos-0.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:7848]

2021-08-26 22:21:22,259 ERROR Fail to refresh leader for group : naming_persistent_service_v2, status is : Status[UNKNOWN<-1>: Fail to init channel to nacos-0.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:7848, Unknown leader]

2021-08-26 22:21:22,259 ERROR Fail to refresh route configuration for group : naming_persistent_service_v2, status is : Status[UNKNOWN<-1>: handleRequest internal error]
MajorHe1 commented 3 years ago

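It looks like access to the hostname on port 7848 failed. Port 7848 is the one used for raft communication. First check that the network path to that port is open, then confirm whether the Nginx configuration is at fault, following what issue #6072 describes.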
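The connectivity check suggested above could be sketched roughly as follows. This is only an illustration, not a Nacos tool: `can_connect` and `check_peers` are hypothetical helper names, and the script assumes it runs from inside one nacos pod (or another host on the same network) so that the peer hostnames resolve. A successful TCP connect shows only that the port is reachable, not that raft itself is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_peers(peers, port: int = 7848, timeout: float = 3.0) -> dict:
    """Map each peer hostname to whether the given port accepts TCP connections.

    `peers` is a list of hostnames, for example the entries of cluster.conf
    with the port stripped (an assumption about how you obtain the list).
    """
    return {host: can_connect(host, port, timeout) for host in peers}
```

Calling `check_peers([...])` with the three nacos pod hostnames from each pod in turn would show whether port 7848 is blocked in one direction only, which is harder to spot from the logs alone.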

WindyGao commented 3 years ago

I looked at the logs

nacos-0

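@MajorHe1 Hello, I took a look. These log entries are from the moment nacos-0 restarted. But why would it restart? And while it was restarting, the Spring service reported this error: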

[http://cdmp-nacos.youdao.com:80], code: 500, msg: server is DOWNnow, detailed error message: Optional[Distro protocol is not initialized]

Is there any way to avoid this?

WindyGao commented 3 years ago

@MajorHe1 I found that nacos-2 had already failed at 2021-08-26 22:19, and nacos-0 then came back up on its own afterward. [image] Can you see anything useful in this log screenshot?

WindyGao commented 3 years ago

protocol-distro.log.2021-08-26.0 log

This nacos node went down at 2021-08-26 22:19. This is its protocol-distro.log. Could you help work out the cause?


2021-08-26 22:21:55,460 INFO [DISTRO-INIT] waiting distro data storage register...

2021-08-26 22:21:56,461 INFO [DISTRO-INIT] waiting distro data storage register...

2021-08-26 22:21:57,461 INFO [DISTRO-INIT] waiting distro data storage register...

2021-08-26 22:21:58,462 INFO [DISTRO-INIT] load snapshot com.alibaba.nacos.naming.iplist. from nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848

2021-08-26 22:21:59,373 INFO [DISTRO-INIT] load snapshot com.alibaba.nacos.naming.iplist. from nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848 result: true

2021-08-26 22:21:59,453 INFO [DISTRO-INIT] load snapshot Nacos:Naming:v2:ClientData from nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848

2021-08-26 22:21:59,455 ERROR [DISTRO-INIT] load snapshot Nacos:Naming:v2:ClientData from nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848 failed.

com.alibaba.nacos.core.distributed.distro.exception.DistroException: [DISTRO-EXCEPTION][DISTRO-FAILED] Get distro snapshot failed! 
        at com.alibaba.nacos.naming.consistency.ephemeral.distro.v2.DistroClientTransportAgent.getDatumSnapshot(DistroClientTransportAgent.java:184)
        at com.alibaba.nacos.core.distributed.distro.task.load.DistroLoadDataTask.loadAllDataSnapshotFromRemote(DistroLoadDataTask.java:103)
        at com.alibaba.nacos.core.distributed.distro.task.load.DistroLoadDataTask.load(DistroLoadDataTask.java:87)
        at com.alibaba.nacos.core.distributed.distro.task.load.DistroLoadDataTask.run(DistroLoadDataTask.java:63)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.alibaba.nacos.api.exception.NacosException: No rpc client related to member: Member{ip='nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com', port=8848, state=UP, extendInfo={lastRefreshTime=1629987716877, naming={ip=nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848, heartbeatDueMs=4808, term=-1, leaderDueMs=3932, state=FOLLOWER}, raftPort=7848}}
        at com.alibaba.nacos.core.cluster.remote.ClusterRpcClientProxy.sendRequest(ClusterRpcClientProxy.java:176)
        at com.alibaba.nacos.core.cluster.remote.ClusterRpcClientProxy.sendRequest(ClusterRpcClientProxy.java:159)
        at com.alibaba.nacos.naming.consistency.ephemeral.distro.v2.DistroClientTransportAgent.getDatumSnapshot(DistroClientTransportAgent.java:175)
        ... 10 common frames omitted
2021-08-26 22:21:59,455 INFO [DISTRO-INIT] load snapshot Nacos:Naming:v2:ClientData from nacos-2.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848

2021-08-26 22:21:59,455 ERROR [DISTRO-INIT] load snapshot Nacos:Naming:v2:ClientData from nacos-2.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848 failed.

com.alibaba.nacos.core.distributed.distro.exception.DistroException: [DISTRO-EXCEPTION][DISTRO-FAILED] Get distro snapshot failed! 
        at com.alibaba.nacos.naming.consistency.ephemeral.distro.v2.DistroClientTransportAgent.getDatumSnapshot(DistroClientTransportAgent.java:184)
        at com.alibaba.nacos.core.distributed.distro.task.load.DistroLoadDataTask.loadAllDataSnapshotFromRemote(DistroLoadDataTask.java:103)
        at com.alibaba.nacos.core.distributed.distro.task.load.DistroLoadDataTask.load(DistroLoadDataTask.java:87)
        at com.alibaba.nacos.core.distributed.distro.task.load.DistroLoadDataTask.run(DistroLoadDataTask.java:63)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.alibaba.nacos.api.exception.NacosException: No rpc client related to member: Member{ip='nacos-2.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com', port=8848, state=UP, extendInfo={lastRefreshTime=1629987718956, naming={ip=nacos-2.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848, heartbeatDueMs=1500, term=-1, leaderDueMs=7835, state=FOLLOWER}, raftPort=7848}}
        at com.alibaba.nacos.core.cluster.remote.ClusterRpcClientProxy.sendRequest(ClusterRpcClientProxy.java:176)
        at com.alibaba.nacos.core.cluster.remote.ClusterRpcClientProxy.sendRequest(ClusterRpcClientProxy.java:159)
        at com.alibaba.nacos.naming.consistency.ephemeral.distro.v2.DistroClientTransportAgent.getDatumSnapshot(DistroClientTransportAgent.java:175)
        ... 10 common frames omitted
2021-08-26 22:22:00,462 WARN data storage DistroClientDataProcessor has not finished initial step, do not send verify data

2021-08-26 22:22:05,555 WARN data storage DistroClientDataProcessor has not finished initial step, do not send verify data

2021-08-26 22:22:05,955 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-common', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:22:05,956 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-user', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:22:06,156 INFO [DISTRO-START] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.chess##chess@@chess-gateway, com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-common]} to nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848

2021-08-26 22:22:06,454 INFO [DISTRO-END] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.chess##chess@@chess-gateway, com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-common]} to nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848 result: true

2021-08-26 22:22:06,454 INFO [DISTRO-START] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.chess##chess@@chess-gateway, com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-common]} to nacos-2.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848

2021-08-26 22:22:06,557 INFO [DISTRO-END] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.chess##chess@@chess-gateway, com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-common]} to nacos-2.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848 result: true

2021-08-26 22:22:06,657 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.xiaobanke##xiaobanke@@xbk-web-ares-pre', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:22:06,657 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-user', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:22:06,756 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-user', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:22:08,356 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-base', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:22:08,655 INFO [DISTRO-START] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.chess##chess@@chess-gateway]} to nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848

2021-08-26 22:22:08,659 INFO [DISTRO-END] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.chess##chess@@chess-gateway]} to nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848 result: true

2021-08-26 22:22:08,659 INFO [DISTRO-START] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.chess##chess@@chess-gateway]} to nacos-2.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848

2021-08-26 22:22:08,661 INFO [DISTRO-END] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.chess##chess@@chess-gateway]} to nacos-2.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848 result: true

2021-08-26 22:22:09,273 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-user', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:22:10,556 WARN data storage DistroClientDataProcessor has not finished initial step, do not send verify data

2021-08-26 22:22:15,557 WARN data storage DistroClientDataProcessor has not finished initial step, do not send verify data

2021-08-26 22:22:20,557 WARN data storage DistroClientDataProcessor has not finished initial step, do not send verify data

2021-08-26 22:22:25,558 WARN data storage DistroClientDataProcessor has not finished initial step, do not send verify data

2021-08-26 22:22:28,393 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-user', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:22:30,558 WARN data storage DistroClientDataProcessor has not finished initial step, do not send verify data

2021-08-26 22:22:35,559 WARN data storage DistroClientDataProcessor has not finished initial step, do not send verify data

2021-08-26 22:22:40,559 WARN data storage DistroClientDataProcessor has not finished initial step, do not send verify data

2021-08-26 22:22:45,560 WARN data storage DistroClientDataProcessor has not finished initial step, do not send verify data

2021-08-26 22:22:50,560 WARN data storage DistroClientDataProcessor has not finished initial step, do not send verify data

2021-08-26 22:22:55,117 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-user', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:22:55,561 WARN data storage DistroClientDataProcessor has not finished initial step, do not send verify data

2021-08-26 22:22:59,456 INFO [DISTRO-INIT] load snapshot Nacos:Naming:v2:ClientData from nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848

2021-08-26 22:22:59,466 INFO [DISTRO-INIT] load snapshot Nacos:Naming:v2:ClientData from nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848 result: true

2021-08-26 22:22:59,466 INFO [DISTRO-INIT] load snapshot data success

2021-08-26 22:29:48,013 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-base', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:30:01,324 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-base', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:30:26,555 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-base', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:30:33,562 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-base', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:30:57,786 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-base', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:30:59,388 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-base', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:32:50,890 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-teacher', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:32:56,005 INFO [DISTRO-START] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-common]} to nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848

2021-08-26 22:32:56,007 INFO [DISTRO-END] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-common]} to nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848 result: true

2021-08-26 22:32:56,007 INFO [DISTRO-START] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-common]} to nacos-2.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848

2021-08-26 22:32:56,009 INFO [DISTRO-END] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-common]} to nacos-2.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848 result: true

2021-08-26 22:33:00,141 INFO [DISTRO] Receive distro data type: null, key: DistroKey{resourceKey='com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-practice', resourceType='com.alibaba.nacos.naming.iplist.', targetServer='null'}

2021-08-26 22:33:00,809 INFO [DISTRO-START] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-common]} to nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848

2021-08-26 22:33:00,811 INFO [DISTRO-END] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-common]} to nacos-1.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848 result: true

2021-08-26 22:33:00,811 INFO [DISTRO-START] DistroSyncChangeTask for DistroHttpCombinedKey-0{actualResourceTypes=[com.alibaba.nacos.naming.iplist.ephemeral.cdmp##cdmp@@dohko-common]} to nacos-2.cdmp-nacos-prod.course-data-infra.svc.cluster6.nbj03.corp.yodao.com:8848
MajorHe1 commented 3 years ago

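> But why would it restart?

That I can't tell you. Nacos does not restart itself, so you should check whether the container was restarted, that is, whether the container was killed and k8s then rescheduled and restarted it.

> Can you see anything useful in this log screenshot?

It looks like a jraft communication failure that prevents leader election, but I personally think this is the consequence of a wrong network configuration, typically a network path that is simply unreachable. Have you put Nginx in the path? If so, the Nginx configuration is usually the problem. Take a close look at #6072. [image]

You first need to pin down the scenario in which the problem appears. Does the Optional[Distro protocol is not initialized] error occur when the cluster is first set up, while it is running, or because of the restart you mentioned? If it occurs when the cluster is first set up, try what #5920 describes first. If the cluster cannot be brought up at all, you can remove {nacos_home}/data/ and then restart.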

WindyGao commented 3 years ago

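> But why would it restart?
>
> That I can't tell you. Nacos does not restart itself, so you should check whether the container was restarted, that is, whether the container was killed and k8s then rescheduled and restarted it.
>
> Can you see anything useful in this log screenshot?
>
> It looks like a jraft communication failure that prevents leader election, but I personally think this is the consequence of a wrong network configuration, typically a network path that is simply unreachable. Have you put Nginx in the path? If so, the Nginx configuration is usually the problem. Take a close look at #6072. [image]
>
> You first need to pin down the scenario in which the problem appears. Does the Optional[Distro protocol is not initialized] error occur when the cluster is first set up, while it is running, or because of the restart you mentioned? If it occurs when the cluster is first set up, try what #5920 describes first. If the cluster cannot be brought up at all, you can remove {nacos_home}/data/ and then restart.

Right. My cluster was already set up and is proxied through nginx. It ran for a long time without problems, but every so often Spring can no longer reach nacos. Watching the logs when that happens, I saw that one node, nacos-0, had gone down.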

WindyGao commented 3 years ago

But the nginx proxy problem concerns client access, doesn't it? What I see now is the server side itself going down.

WindyGao commented 3 years ago

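Besides, in the Spring project I only access port 8848, don't I? I put an nginx HTTP domain proxy in front of the shared DNS IP of the three nacos pods in k8s, and then configured that as the service discovery address in Spring, as in the screenshots below. [image] [image] Even in this setup, do I need to proxy the other ports in nginx as well? I don't think I do.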

MajorHe1 commented 3 years ago

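> It ran for a long time without problems, but every so often Spring can no longer reach nacos. Watching the logs when that happens, I saw that one node, nacos-0, had gone down.

To confirm: Spring could not reach nacos only because one nacos server went down, correct? When the nacos cluster is healthy, the Spring service has no access problems? So the question has now become that the nacos-server node that went down fails to restart automatically, right?

> I put an nginx HTTP domain proxy in front of the shared DNS IP of the three nacos pods in k8s

Do the nacos-server nodes talk to each other by IP or by hostname? Communication between nacos-server nodes uses port 7848 (raft) and port 9849 (gRPC). See https://nacos.io/zh-cn/docs/2.0.0-compatibility.html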
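For reference, the ports mentioned here follow from the main server port by fixed offsets. The sketch below encodes those offsets as they are commonly described for Nacos 2.x (client gRPC at main port + 1000, server-to-server gRPC at main port + 1001, raft at main port - 1000). The function name `nacos2_ports` and the dictionary keys are hypothetical, and the offsets should be checked against the compatibility document linked above for the version in use.

```python
def nacos2_ports(main_port: int = 8848) -> dict:
    """Derive the ports a Nacos 2.x server is expected to use from its main port.

    The offsets are an assumption taken from the commonly documented defaults,
    not read from any running server.
    """
    return {
        "http": main_port,                # HTTP API and console
        "client_grpc": main_port + 1000,  # gRPC from 2.x clients
        "server_grpc": main_port + 1001,  # gRPC between servers
        "raft": main_port - 1000,         # jraft between servers
    }
```

With the default main port this gives 7848 for raft and 9849 for server-to-server gRPC, matching the two ports named in the comment. A 1.4.2 client talks HTTP only, so from the client's side only the main port has to pass through the proxy, while the servers still need the raft and gRPC ports open between themselves.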

WindyGao commented 3 years ago

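> It ran for a long time without problems, but every so often Spring can no longer reach nacos. Watching the logs when that happens, I saw that one node, nacos-0, had gone down.
>
> To confirm: Spring could not reach nacos only because one nacos server went down, correct? When the nacos cluster is healthy, the Spring service has no access problems? So the question has now become that the nacos-server node that went down fails to restart automatically, right?
>
> I put an nginx HTTP domain proxy in front of the shared DNS IP of the three nacos pods in k8s
>
> Do the nacos-server nodes talk to each other by IP or by hostname? Communication between nacos-server nodes uses port 7848 (raft) and port 9849 (gRPC). See https://nacos.io/zh-cn/docs/2.0.0-compatibility.html

This is what cluster-conf looks like: [image]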

WindyGao commented 3 years ago

I think it should be the hostname.

MajorHe1 commented 3 years ago

> I think it should be the hostname.

So the nacos-server nodes reach each other by domain name? Who does the name resolution? Nginx? Have ports 7848 and 9849 been opened?

KomachiSion commented 3 years ago

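From the discussion, the momentary connection failure was most likely caused by the nacos container being killed. nacos-client load-balances automatically and routes to the other live nodes, so this is not a major problem.

I suggest investigating further why the nacos container was killed, for example whether an unreasonable memory allocation caused it to be killed by the OOM killer.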
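One way to follow up on this suggestion is to look at the last termination reason that Kubernetes records for each container in the pod status. The sketch below is an illustration only: the helper names `last_termination_reasons` and `was_oom_killed` are hypothetical, and it assumes the pod description has already been obtained as JSON (for example from `kubectl get pod <pod-name> -o json`) and parsed into a dictionary.

```python
def last_termination_reasons(pod: dict) -> dict:
    """Map container name -> (restart count, reason of the last termination).

    Reads status.containerStatuses[].lastState.terminated.reason from a pod
    object. The reason is None if the container has not been terminated.
    """
    result = {}
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated") or {}
        result[status["name"]] = (
            status.get("restartCount", 0),
            terminated.get("reason"),
        )
    return result


def was_oom_killed(pod: dict) -> bool:
    """Return True if any container's last termination reason is OOMKilled."""
    return any(
        reason == "OOMKilled"
        for _, reason in last_termination_reasons(pod).values()
    )
```

If the reason turns out to be `OOMKilled`, the container memory limit and the JVM heap settings of the nacos server would be the next things to compare. Any other reason, or a restart count of zero, points away from the OOM explanation.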

Since the issue author has not replied for a long time, they may have already found and fixed the problem. Closing the issue for now.