alibaba / nacos

An easy-to-use dynamic service discovery, configuration and service management platform for building cloud native applications.
https://nacos.io
Apache License 2.0

In a cluster environment, after instance metadata is modified through the OpenAPI, with low probability the metadata becomes inconsistent across nodes and stays inconsistent. #11934

Closed LondonUnderground closed 4 months ago

LondonUnderground commented 5 months ago

Describe the bug

In a Nacos cluster, service a has two instances, 001 and 002. Instance 001 is registered normally on all three Nacos cluster nodes (as an ephemeral instance). The instance modifies its own metadata by calling the Nacos OpenAPI (the metadata stores work-order numbers for certain business states). After running for a while, with more and more data stored and frequent modifications, the metadata of instance 001 became inconsistent across the three Nacos nodes.
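For context, a metadata update of this kind can be expressed roughly as below. This is a minimal sketch assuming the v1 Open API endpoint `PUT /nacos/v1/ns/instance`; the server address, service name, instance IP/port and work-order metadata are placeholder values, not taken from the issue.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class UpdateInstanceMetadata {
    public static void main(String[] args) throws Exception {
        // Hypothetical values; replace with the real cluster address, service and instance.
        String server = "http://10.0.102.114:8849";
        String metadata = URLEncoder.encode("{\"workOrderNo\":\"WO-20240401-001\"}", StandardCharsets.UTF_8);
        String query = "serviceName=service-a&ip=10.0.102.201&port=8080&ephemeral=true&metadata=" + metadata;

        // PUT /nacos/v1/ns/instance updates an existing instance, including its metadata (v1 Open API).
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(server + "/nacos/v1/ns/instance?" + query))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```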

Expected behavior

The expectation is that in a cluster environment the service and the service's own data stay consistent across nodes.

Actual behavior

The official documentation does not state whether instance metadata is, or is not, guaranteed to stay consistent across cluster nodes.


KomachiSion commented 5 months ago

Metadata modifications made through the API are synchronized via the Raft protocol. If inconsistency occurs, check whether the problematic node has dropped out of the Raft group relative to the other nodes; see alipay-jraft.log.

KomachiSion commented 5 months ago

Also, your usage scenario may not be quite right. Metadata is generally meant to store static attributes of an instance, such as AZ, version, and labels; it should not hold dynamic business content. Although using it this way can satisfy some special needs, it may make changes too frequent and cause performance problems.
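To make the suggestion concrete, here is a minimal sketch of the recommended pattern using the Nacos Java client (assumes the `nacos-client` dependency; the server address, service name and metadata values are hypothetical): static attributes such as AZ, version and labels are set once at registration time rather than rewritten continuously at runtime.

```java
import java.util.HashMap;
import java.util.Map;

import com.alibaba.nacos.api.naming.NamingFactory;
import com.alibaba.nacos.api.naming.NamingService;
import com.alibaba.nacos.api.naming.pojo.Instance;

public class RegisterWithStaticMetadata {
    public static void main(String[] args) throws Exception {
        // Hypothetical Nacos server address.
        NamingService naming = NamingFactory.createNamingService("10.0.102.114:8849");

        Instance instance = new Instance();
        instance.setIp("10.0.102.201");
        instance.setPort(8080);
        instance.setEphemeral(true);

        // Static instance attributes: availability zone, version, label.
        Map<String, String> metadata = new HashMap<>();
        metadata.put("az", "zone-a");
        metadata.put("version", "1.4.2");
        metadata.put("label", "canary");
        instance.setMetadata(metadata);

        naming.registerInstance("service-a", instance);
    }
}
```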

LondonUnderground commented 5 months ago

> Metadata modifications made through the API are synchronized via the Raft protocol. If inconsistency occurs, check whether the problematic node has dropped out of the Raft group relative to the other nodes; see alipay-jraft.log.

The scenario was decided earlier and is hard to change. I'll paste the alipay-jraft.log below: one node is in the wrong state, and the other two nodes seem to be failing to unlock, as if deadlocked.

LondonUnderground commented 5 months ago

Two of the Nacos nodes are deployed on the same virtual machine; the remaining Nacos node is deployed on another machine in the same LAN. Excerpt from alipay-jraft.log on 114:8849:

2024-04-01 14:57:53,942 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:57:54,443 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:57:54,945 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:57:55,447 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:57:55,948 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:57:56,450 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:57:56,951 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:57:57,453 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:57:57,955 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:57:58,456 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:57:58,958 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:57:59,459 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:57:59,961 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:00,462 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:00,964 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:01,465 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:01,966 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:02,468 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:02,970 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:03,471 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:03,973 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:04,474 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:04,976 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:05,477 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:05,979 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:06,481 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:06,982 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:07,484 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:07,986 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:08,487 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:08,988 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

LondonUnderground commented 5 months ago

Excerpt from alipay-jraft.log on 114:8847:

2024-04-01 14:57:07,680 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-1999,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2038,5,main].

2024-04-01 14:57:07,928 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-1995,5,main].

2024-04-01 14:57:07,928 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-2037,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2030,5,main].

2024-04-01 14:57:07,928 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-2031,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2030,5,main].

2024-04-01 14:57:08,181 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-2040,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2038,5,main].

2024-04-01 14:57:08,429 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2041,5,main].

2024-04-01 14:57:08,429 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-2038,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2041,5,main].

2024-04-01 14:57:08,429 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2041,5,main].

2024-04-01 14:57:08,610 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2039,5,main].

2024-04-01 14:57:08,683 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2037,5,main].

2024-04-01 14:57:08,683 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2037,5,main].

2024-04-01 14:57:08,683 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2037,5,main].

2024-04-01 14:57:08,931 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-2031,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2030,5,main].

2024-04-01 14:57:08,931 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-2039,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2030,5,main].

2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].

2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].

2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-1995,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].

2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-1995,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].

2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-1995,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].

2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by Thread[JRaft-Rpc-Closure-Executor-1995,5,main] and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].

2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].

2024-04-01 14:57:09,185 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2028,5,main].

2024-04-01 14:57:09,188 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2030,5,main].

2024-04-01 14:57:09,686 WARN Fail to unlock with Replicator [state=Probe, statInfo=, peerId=10.0.102.115:7847, type=Follower], the lock is held by null and current thread is Thread[JRaft-Rpc-Closure-Executor-2031,5,main].

LondonUnderground commented 5 months ago

Excerpt from alipay-jraft.log on 115:8847:

2024-04-01 14:58:00,964 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:01,465 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:01,966 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:02,468 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:02,970 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:03,471 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:03,973 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:04,474 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:04,976 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:05,477 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:05,979 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:06,481 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:06,982 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:07,484 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:07,986 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:08,487 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:08,988 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:09,490 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:09,991 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:10,493 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:10,994 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:11,496 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:11,998 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:12,499 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

2024-04-01 14:58:13,001 WARN Node <naming_instance_metadata/10.0.102.115:7847> is not in active state, currTerm=1.

LondonUnderground commented 5 months ago

This is the alipay-jraft.log.2024-03-27.0 of one Nacos node in the cluster; at the same point in time, the other two Nacos nodes were normal.

2024-03-27 14:30:37,429 INFO Deleting snapshot /home/nacos3/data/protocol/raft/naming_instance_metadata/snapshot/snapshot_8812.

2024-03-27 14:30:37,438 INFO Renaming /home/nacos3/data/protocol/raft/naming_instance_metadata/snapshot/temp to /home/nacos3/data/protocol/raft/naming_instance_metadata/snapshot/snapshot_8812.

2024-03-27 14:30:37,438 INFO Deleting snapshot /home/nacos3/data/protocol/raft/naming_instance_metadata/snapshot/snapshot_8781.

2024-03-27 14:53:39,910 INFO Node <naming_persistent_service_v2/10.0.102.115:7847> term 1 start preVote.

2024-03-27 14:53:39,910 INFO onStopFollowing: LeaderChangeContext [leaderId=10.0.102.114:7847, term=1, status=Status[ERAFTTIMEDOUT<10001>: Lost connection from leader 10.0.102.114:7847.]].

2024-03-27 14:53:40,261 WARN Channel in TRANSIENT_FAILURE state: 10.0.102.114:7849.

2024-03-27 14:53:40,261 WARN Channel in SHUTDOWN state: 10.0.102.114:7849.

2024-03-27 14:53:40,262 INFO Peer 10.0.102.114:7849 is connected.

2024-03-27 14:53:40,493 WARN Channel in TRANSIENT_FAILURE state: 10.0.102.114:7847.

2024-03-27 14:53:40,493 WARN Channel in SHUTDOWN state: 10.0.102.114:7847.

2024-03-27 14:53:40,493 INFO Peer 10.0.102.114:7847 is connected.

2024-03-27 14:53:40,503 INFO Node <naming_persistent_service_v2/10.0.102.115:7847> received PreVoteResponse from 10.0.102.114:7849, term=1, granted=false.

2024-03-27 14:53:40,506 INFO Node <naming_persistent_service_v2/10.0.102.115:7847> received PreVoteResponse from 10.0.102.114:7847, term=1, granted=false.

2024-03-27 14:53:40,533 WARN [GRPC] failed to send response.

io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
    at io.grpc.Status.asRuntimeException(Status.java:524)
    at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:341)
    at com.alipay.sofa.jraft.rpc.impl.GrpcServer$1.sendResponse(GrpcServer.java:153)
    at com.alipay.sofa.jraft.rpc.impl.core.AppendEntriesRequestProcessor.processRequest0(AppendEntriesRequestProcessor.java:464)
    at com.alipay.sofa.jraft.rpc.impl.core.AppendEntriesRequestProcessor.processRequest0(AppendEntriesRequestProcessor.java:53)
    at com.alipay.sofa.jraft.rpc.impl.core.NodeRequestProcessor.processRequest(NodeRequestProcessor.java:60)
    at com.alipay.sofa.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:53)
    at com.alipay.sofa.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:35)
    at com.alipay.sofa.jraft.rpc.impl.GrpcServer.lambda$null$1(GrpcServer.java:194)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor$Worker.runTask(MpscSingleThreadExecutor.java:352)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor$Worker.run(MpscSingleThreadExecutor.java:336)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor.lambda$doStartWorker$3(MpscSingleThreadExecutor.java:263)
    at java.base/java.lang.Thread.run(Thread.java:842)

2024-03-27 14:53:40,533 WARN [GRPC] failed to send response.

io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
    at io.grpc.Status.asRuntimeException(Status.java:524)
    at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:341)
    at com.alipay.sofa.jraft.rpc.impl.GrpcServer$1.sendResponse(GrpcServer.java:153)
    at com.alipay.sofa.jraft.rpc.impl.core.AppendEntriesRequestProcessor.processRequest0(AppendEntriesRequestProcessor.java:464)
    at com.alipay.sofa.jraft.rpc.impl.core.AppendEntriesRequestProcessor.processRequest0(AppendEntriesRequestProcessor.java:53)
    at com.alipay.sofa.jraft.rpc.impl.core.NodeRequestProcessor.processRequest(NodeRequestProcessor.java:60)
    at com.alipay.sofa.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:53)
    at com.alipay.sofa.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:35)
    at com.alipay.sofa.jraft.rpc.impl.GrpcServer.lambda$null$1(GrpcServer.java:194)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor$Worker.runTask(MpscSingleThreadExecutor.java:352)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor$Worker.run(MpscSingleThreadExecutor.java:336)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor.lambda$doStartWorker$3(MpscSingleThreadExecutor.java:263)
    at java.base/java.lang.Thread.run(Thread.java:842)

2024-03-27 14:53:40,534 WARN [GRPC] failed to send response.

io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
    at io.grpc.Status.asRuntimeException(Status.java:524)
    at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:341)
    at com.alipay.sofa.jraft.rpc.impl.GrpcServer$1.sendResponse(GrpcServer.java:153)
    at com.alipay.sofa.jraft.rpc.impl.core.AppendEntriesRequestProcessor.processRequest0(AppendEntriesRequestProcessor.java:464)
    at com.alipay.sofa.jraft.rpc.impl.core.AppendEntriesRequestProcessor.processRequest0(AppendEntriesRequestProcessor.java:53)
    at com.alipay.sofa.jraft.rpc.impl.core.NodeRequestProcessor.processRequest(NodeRequestProcessor.java:60)
    at com.alipay.sofa.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:53)
    at com.alipay.sofa.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:35)
    at com.alipay.sofa.jraft.rpc.impl.GrpcServer.lambda$null$1(GrpcServer.java:194)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor$Worker.runTask(MpscSingleThreadExecutor.java:352)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor$Worker.run(MpscSingleThreadExecutor.java:336)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor.lambda$doStartWorker$3(MpscSingleThreadExecutor.java:263)
    at java.base/java.lang.Thread.run(Thread.java:842)

2024-03-27 14:53:40,534 WARN [GRPC] failed to send response.

io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
    at io.grpc.Status.asRuntimeException(Status.java:524)
    at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:341)
    at com.alipay.sofa.jraft.rpc.impl.GrpcServer$1.sendResponse(GrpcServer.java:153)
    at com.alipay.sofa.jraft.rpc.impl.core.AppendEntriesRequestProcessor.processRequest0(AppendEntriesRequestProcessor.java:464)
    at com.alipay.sofa.jraft.rpc.impl.core.AppendEntriesRequestProcessor.processRequest0(AppendEntriesRequestProcessor.java:53)
    at com.alipay.sofa.jraft.rpc.impl.core.NodeRequestProcessor.processRequest(NodeRequestProcessor.java:60)
    at com.alipay.sofa.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:53)
    at com.alipay.sofa.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:35)
    at com.alipay.sofa.jraft.rpc.impl.GrpcServer.lambda$null$1(GrpcServer.java:194)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor$Worker.runTask(MpscSingleThreadExecutor.java:352)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor$Worker.run(MpscSingleThreadExecutor.java:336)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor.lambda$doStartWorker$3(MpscSingleThreadExecutor.java:263)
    at java.base/java.lang.Thread.run(Thread.java:842)

2024-03-27 15:00:37,620 INFO Deleting snapshot /home/nacos3/data/protocol/raft/naming_instance_metadata/snapshot/snapshot_8840.

LondonUnderground commented 5 months ago

Judging from the previously closed issues, my jraft version is also 1.3.8 with the bolt package excluded; this looks similar to #952h #1029 #10259.

KomachiSion commented 5 months ago

If a jraft bug has put a node into the wrong state, the only option is probably to try upgrading the version.

LondonUnderground commented 4 months ago

Should I upgrade the jraft version in the Nacos source code, or upgrade the Nacos version directly?

KomachiSion commented 4 months ago

Upgrade the Nacos version; upgrading only jraft on its own may cause incompatibilities.

LondonUnderground commented 4 months ago

Got it.

KomachiSion commented 4 months ago

No further response from the author; it seems the new version has solved this problem.

LondonUnderground commented 4 months ago

Not yet; still pending verification.