Open zhuxt2015 opened 2 years ago
Thank you for your feedback, we have received your issue, Please wait patiently for a reply.
#troubleshooting
Great feature. This has been discussed before; it needs a detailed design, such as how we store the log and how to solve split-brain. This is a good beginning.
@ruanwenjun OK, I'll provide a detailed design later.
@zhuxt2015 Great feature. Could you please give more detail about the leader master and the other masters?
Hi @zhuxt2015 I am also interested in implementing this issue. Maybe I can join the discussion of design and implementation
@leo-lao Great! Thank you for joining. I have completed most of the functions and will submit a PR this week; then let's discuss how to divide the subsequent development work.
Before submitting a PR, it's better to provide a detailed design. The current design is not enough: we need to consider how to persist data to disk, how to implement the lock, and how to maintain the data. Will we use a library, or implement raft ourselves?
I will use the sofa-jraft library; see its GitHub repository and user guide.
SOFAJRaft is a production-grade Java implementation of the Raft consensus algorithm. SOFAJRaft is licensed under the Apache License 2.0. SOFAJRaft relies on some third-party components, and their open source licenses are also Apache License 2.0.
The core components are StateMachine and RheaKV.
StateMachine is an implementation of users’ core logic. It calls the onApply(Iterator) method to apply log entries that are submitted with Node#apply(task) to the business state machine.
RheaKV is a lightweight, distributed, and embedded KV storage library, which is included in the JRaft project as a submodule.
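To illustrate the state machine idea, here is a minimal, dependency-free sketch. It does not use SOFAJRaft at all: `PutCommand` and `KvStateMachineSketch` are hypothetical names, and the method takes a plain `java.util.Iterator` rather than JRaft's own iterator type. It only shows the general principle that every replica applies the same committed entries in the same order and therefore ends up in the same state.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical command type for illustration; not part of SOFAJRaft.
final class PutCommand {
    final String key;
    final String value;

    PutCommand(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Hypothetical state machine sketch: applies committed entries in log order.
class KvStateMachineSketch {
    private final Map<String, String> state = new HashMap<>();
    private long lastAppliedIndex = 0;

    // Drains committed entries in order and applies each one to the in-memory state.
    void onApply(Iterator<PutCommand> committed) {
        while (committed.hasNext()) {
            PutCommand command = committed.next();
            state.put(command.key, command.value);
            lastAppliedIndex++;
        }
    }

    String get(String key) {
        return state.get(key);
    }

    long lastAppliedIndex() {
        return lastAppliedIndex;
    }
}
```

Two instances fed the same sequence of commands hold identical state afterwards, which is the property the registry would rely on.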
All node information is stored in the StateMachine's memory, and the StateMachine manages node registration and downtime. When a new node joins the cluster, it sends a heartbeat packet to the leader master; the node's last update time is recorded and synchronized to all masters. To detect a node that has gone down, the Ephemeral Node Refresh Thread scans the records in the StateMachine: when a node's last update time differs from the current time by more than a set threshold, the node is removed and the removal is synchronized to the other masters.
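The heartbeat and expiry logic described above could be sketched roughly as follows. This is only an illustration under assumptions: `EphemeralNodeTable` and its methods are hypothetical names, time is passed in explicitly so the logic is easy to test, and replication of the result to the other masters is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; these names do not come from DolphinScheduler or SOFAJRaft.
class EphemeralNodeTable {
    private final Map<String, Long> lastHeartbeatMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    EphemeralNodeTable(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called when the leader receives a heartbeat; registers the node on first contact.
    void heartbeat(String nodeId, long nowMillis) {
        lastHeartbeatMillis.put(nodeId, nowMillis);
    }

    // One pass of the refresh thread: removes nodes whose last heartbeat is older than
    // the timeout and returns them so the removal can be replicated to other masters.
    List<String> removeExpired(long nowMillis) {
        List<String> removed = new ArrayList<>();
        lastHeartbeatMillis.forEach((nodeId, last) -> {
            // remove(key, value) only succeeds if no newer heartbeat arrived meanwhile.
            if (nowMillis - last > timeoutMillis && lastHeartbeatMillis.remove(nodeId, last)) {
                removed.add(nodeId);
            }
        });
        return removed;
    }

    boolean isAlive(String nodeId) {
        return lastHeartbeatMillis.containsKey(nodeId);
    }
}
```

In a real registry, `removeExpired` would presumably run on a schedule on the leader only, with each removal submitted through the raft log rather than applied locally.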
The design of Subscribe/Notify is the same as for ephemeral nodes: when the leader master's StateMachine senses a data change in the server, it triggers the subscribed listeners.
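A rough sketch of the subscribe/notify idea, assuming listeners are keyed by the exact path they watch. `SubscriptionTable` and its methods are hypothetical names, and how listeners on remote nodes would actually be reached is not covered here.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical sketch; these names do not come from DolphinScheduler or SOFAJRaft.
class SubscriptionTable {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final Map<String, List<BiConsumer<String, String>>> listeners = new ConcurrentHashMap<>();

    // Registers a listener that receives (key, newValue) when the key changes.
    void subscribe(String key, BiConsumer<String, String> listener) {
        listeners.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(listener);
    }

    // Stores the value and notifies subscribers only if the value actually changed.
    void put(String key, String value) {
        String previous = data.put(key, value);
        if (!value.equals(previous)) {
            List<BiConsumer<String, String>> subscribed =
                    listeners.getOrDefault(key, Collections.emptyList());
            for (BiConsumer<String, String> listener : subscribed) {
                listener.accept(key, value);
            }
        }
    }
}
```

Writing the same value twice triggers only one notification, and keys without subscribers are updated silently.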
The design of the global lock is the same as for ephemeral nodes: there will be a KVStore in the StateMachine to store the lock info. RheaKVStore will store the master servers' locks and clear expired locks.
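One way to picture a lock record with expiry is a lease table like the one below. This is a sketch under assumptions, not the RheaKV lock API: `LeaseLockTable` and its methods are hypothetical names, the table is an in-memory map, and time is passed in explicitly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; this is not the RheaKV distributed lock API.
class LeaseLockTable {
    private static final class Lease {
        final String owner;
        final long expiresAtMillis;

        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Lease> locks = new HashMap<>();

    // Grants the lock if it is free, expired, or already held by the same owner (renewal).
    synchronized boolean tryAcquire(String lockKey, String owner, long nowMillis, long leaseMillis) {
        Lease current = locks.get(lockKey);
        boolean free = current == null || current.expiresAtMillis <= nowMillis;
        if (free || current.owner.equals(owner)) {
            locks.put(lockKey, new Lease(owner, nowMillis + leaseMillis));
            return true;
        }
        return false;
    }

    // Only the current owner may release the lock.
    synchronized boolean release(String lockKey, String owner) {
        Lease current = locks.get(lockKey);
        if (current != null && current.owner.equals(owner)) {
            locks.remove(lockKey);
            return true;
        }
        return false;
    }

    // Removes all expired leases and returns how many were cleared.
    synchronized int clearExpired(long nowMillis) {
        int before = locks.size();
        locks.values().removeIf(lease -> lease.expiresAtMillis <= nowMillis);
        return before - locks.size();
    }
}
```

The lease means a crashed master cannot hold the lock forever: once its lease expires, another master can acquire it without an explicit release.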
@zhuxt2015 Please follow the DSIP process (https://dolphinscheduler.apache.org/en-us/community/DSIP.html) to create a DSIP, thanks.
@zhuxt2015 I searched the available raft implementations and found that Apache Ratis may be a better choice. SOFAJRaft requires more dependencies, such as:
<dependency>
    <groupId>com.alipay.sofa</groupId>
    <artifactId>bolt</artifactId>
    <version>${bolt.version}</version>
</dependency>
<dependency>
    <groupId>com.alipay.sofa</groupId>
    <artifactId>hessian</artifactId>
    <version>${hessian.version}</version>
</dependency>
This is not very acceptable for an Apache project.
On the other hand, Apache Ratis is used by Apache Ozone and Alluxio, which adds to its credibility.
@leo-lao com.alipay.sofa.bolt and com.alipay.sofa.hessian use the Apache License 2.0, so I think they can be used. I also looked at Ratis; it does not implement a distributed lock, so I did not use it.
Is it a must to implement a distributed lock in dolphinscheduler? As far as I know, the distributed lock in the current version is used to make sure only one master handles requests or failover events. Is it OK if we just put that logic in the leader master?
This is not a good idea; you would need to introduce the leader role in the other registry plugins.
Actually no. With raft introduced, the system itself will have a leader and followers, so there is no need to rely on other registry plugins.
| method | details | pros | cons |
| --- | --- | --- | --- |
| raft with distributed lock | multiple masters handle tasks | no pressure on a single master | extra dependency/implementation of distributed lock |
| raft without distributed lock | only the raft leader handles tasks | simple, popular approach (like Apache Ozone, Alibaba RemoteShuffleService) | pressure on a single master |
I feel both are OK
We need to reach a consensus that we introduce raft only as a new registry plugin; this will not affect our existing plugins.
@leo-lao For the time being, backward compatibility should be guaranteed. The user can choose to use the raft registry or the zookeeper registry.
Gotcha, then your idea works
Description
The role of zookeeper
Problems caused by zookeeper
Advantages of removing zookeeper
Blueprint for removing zookeeper
Use case
No response
Related issues
6680