apache / iotdb

Apache IoTDB
https://iotdb.apache.org/
Apache License 2.0
5.61k stars 1.02k forks source link

About refine the exception handling logic in method commitTo in RaftLogManager #3856

Open jixuan1989 opened 3 years ago

jixuan1989 commented 3 years ago

Discussed in https://github.com/apache/iotdb/discussions/3833

Originally posted by **chengjianyun** August 25, 2021 Hi, all The discussion started from the issue https://github.com/apache/iotdb/discussions/3784 and https://issues.apache.org/jira/browse/IOTDB-1583. Here we don't talk about how to fix the concrete `BufferOverFlowException`(a pr has been proposed, see https://github.com/apache/iotdb/pull/3832) but how such exception should be handled. We have had some discission offline and let me summary here to let more people join the discussion. ## Background According to the issue we mentioned above, right now, if some exception is thrown between the code in `RaftLogManager.java`, the raft group would run into a invalid state that can't accept write request anymore. Root cause is that `committedEntries`, `uncommittedEntries` and `commitIndex` are not updated together if some exception happens in the red rectangle which cause the inconsistency between the variables. ![1629887128(1)](https://user-images.githubusercontent.com/6150814/130774200-d709a962-8262-48b4-8c82-68154a103c39.jpg) ## Discussion I think here we have two different opinions: 1. Tolerant the error in some way and let the data group continue serve 2. Let the node stop serve the data group ### 1. Tolerant the error in some way Here we may set some NO_OP like action for error entries which could let raft log running forward and keep monotonically increase. And at same time tell the user that these operations are fail. The drawback is obvious, as the entry has been append to all other nodes in the data group, we can't guarantee all nodes in the group would suffer such runtime exception. So, we can't guarantee the consistency of the nodes in the data group. Of course, for the exceptions like [issue-1583](https://issues.apache.org/jira/browse/IOTDB-1583), all nodes should suffer the same exception and it should finally keep the consistent. For me, I won' t vote for the solution right now as it can't make sure the consistency. We can talk more about the solution here. ### 2. Let node stop serve the data group The idea here is to let the node is suffering the exception quit the data group properly so that other group members may continue work. The solution could tolerant the exception just happen on some of nodes and keep the server's availability. But for the [issue-1583](https://issues.apache.org/jira/browse/IOTDB-1583), finally all nodes will quit the group cause the service down. This is a conservative policy, which is much safe to apply. And in etcd, if the storage have error, it will stop serve write operation as show in comments in file [storage.go](https://github.com/etcd-io/etcd/blob/27fc7e2296f506182f58ce846e48f36b34fe6842/raft/storage.go#L40-L45). ``` // Storage is an interface that may be implemented by the application // to retrieve log entries from storage. // // If any Storage method returns an error, the raft instance will // become inoperable and refuse to participate in elections; the // application is responsible for cleanup and recovery in this case. type Storage interface { ... ``` Here, I like the solution right now because it won't cause very big change and doesn't cause side effect. Your ideas and comments are welcome. Thanks!
neuyilan commented 3 years ago

Thanks for point out the issue, I agree with solution 2, but I think the issue happens because we do not have one rollback mechanism, if some error happens during the commitTo, we rollback all the previous operations to make the committedEntriesunCommittedEntriesstableEntries consist with each other, the issue would be resoleved?

chengjianyun commented 3 years ago

Thanks for point out the issue, I agree with solution 2, but I think the issue happens because we do not have one rollback mechanism, if some error happens during the commitTo, we rollback all the previous operations to make the committedEntriesunCommittedEntriesstableEntries consist with each other, the issue would be resoleved?

Yes, if we can rollback the operation, then the issue is resolved. In etcd, Raft only takes charge of maintain the consistency between raft logs, the apply or log persistence logic is left for Raft client. The raft client need to consider how to handle these state machine errors when apply or persist log.