apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[Enhancement] use RaftKeeper instead of BDBJE to manage metadata log #17941

Open Henry2SS opened 1 year ago

Henry2SS commented 1 year ago

Description

This issue has also been posted in the discussion area.

Background

Doris currently uses BDBJE to manage its metadata log. Under highly concurrent writes, this causes performance problems.

BDBJE's write latency ranges from 1 ms to tens of ms, and it supports only one write at a time. With writes serialized at 1 ms each at best, the theoretical maximum is around 1000 TPS. This is a serious bottleneck in high-concurrency write scenarios.
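As a rough worked bound, assuming writes are strictly serialized with no batching and each write takes a latency $\ell$ (a symbol introduced here only for illustration), throughput cannot exceed one write per $\ell$:

```latex
\[
\mathrm{TPS}_{\max} = \frac{1}{\ell}, \qquad
\ell = 1\,\mathrm{ms} \;\Rightarrow\; \mathrm{TPS}_{\max} = 1000, \qquad
\ell = 10\,\mathrm{ms} \;\Rightarrow\; \mathrm{TPS}_{\max} = 100 .
\]
```

So the figure of around 1000 TPS corresponds to the best-case latency of 1 ms; at tens of ms the bound drops to 100 TPS or lower.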

Pressure test

We tested Doris's BDBJE with a single FE node using a create-database workload. With the lock in place, write TPS is 800+; with the lock removed, write TPS is 1500+.

Industry Mainstream

ZooKeeper (ZK) write TPS is 30000+, while RaftKeeper write TPS is 70000+ (source: RaftKeeper Benchmark).

Proposed scheme using Raft

Current situation of Doris FE:

[image]

The general idea of managing the log with Raft:

  1. Implement a RaftCore module in C++ that provides log storage for Doris FE and replaces the original BDBJE.
  2. In the Java layer, implement RaftJournal, which implements the Journal interface.

RaftCore Design:

  1. Interface module: the C++ RaftCore provides interfaces for adding, deleting, and querying log entries, for changing membership, and for initialization and shutdown. These are exposed only through JNI. The write and read JNI interfaces support concurrent calls.
  2. Log module, state machine module, and leader election.
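The interface module described above could look roughly like the following from the Java side. This is a minimal sketch, not the proposed implementation: `RaftCoreApi`, `InMemoryRaftCore`, and every method name are hypothetical, and an in-memory stand-in replaces the native C++ library so the example runs on its own. A real binding would declare these operations as `native` methods.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical Java-side view of the operations RaftCore would expose over JNI.
// All names are illustrative; none come from Doris, RaftKeeper, or NuRaft.
interface RaftCoreApi {
    void init(String nodeId, List<String> peers); // initialization
    long append(byte[] entry);                    // add: returns the assigned log id
    byte[] read(long logId);                      // query: null if the entry is absent
    void deleteUpTo(long logId);                  // delete: drop entries with id <= logId
    void changeMembers(List<String> members);     // membership change
    void close();                                 // shutdown
}

// In-memory stand-in (assumption) so the sketch runs without a native library.
class InMemoryRaftCore implements RaftCoreApi {
    private final ConcurrentSkipListMap<Long, byte[]> log = new ConcurrentSkipListMap<>();
    private final AtomicLong nextId = new AtomicLong(1);
    private final Set<String> members = ConcurrentHashMap.newKeySet();
    private volatile boolean open = false;

    public void init(String nodeId, List<String> peers) {
        members.clear();
        members.add(nodeId);
        members.addAll(peers);
        open = true;
    }

    public long append(byte[] entry) {
        ensureOpen();
        long id = nextId.getAndIncrement(); // unique id even with concurrent callers
        log.put(id, entry.clone());
        return id;
    }

    public byte[] read(long logId) {
        ensureOpen();
        byte[] entry = log.get(logId);
        return entry == null ? null : entry.clone();
    }

    public void deleteUpTo(long logId) {
        ensureOpen();
        log.headMap(logId, true).clear(); // removes all ids <= logId
    }

    public void changeMembers(List<String> newMembers) {
        ensureOpen();
        members.clear();
        members.addAll(newMembers);
    }

    public void close() {
        open = false;
    }

    int memberCount() {
        return members.size();
    }

    private void ensureOpen() {
        if (!open) {
            throw new IllegalStateException("RaftCore is not initialized");
        }
    }
}
```

The stand-in uses concurrent collections only to mirror the stated requirement that reads and writes may be called concurrently; replication, the state machine, and leader election are not modeled here.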

RaftJournal Design:

  1. When FE initializes RaftJournal, it calls the JNI interface to initialize the underlying C++ implementation, RaftCore.
  2. RaftJournal internally uses the JNI interface to store and read metadata.
  3. When FE shuts down, it calls the JNI interface to close the underlying C++ RaftCore.
  4. Membership changes also go through the JNI interface.
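The four steps above can be sketched as a small Java class that forwards each lifecycle step to a JNI-like bridge. Everything here is an assumption made for illustration: `SimpleJournal` is a simplified placeholder and not Doris FE's actual Journal interface, and `NativeBridge` with its `native*` method names is hypothetical. `FakeBridge` is a pure-Java fake so the example runs without a native library.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical journal contract (not Doris FE's real Journal interface).
interface SimpleJournal {
    void open();
    long write(byte[] payload);
    byte[] read(long journalId);
    void close();
}

// Stand-in for the JNI boundary. In the proposed design these would be
// native methods backed by the C++ RaftCore; the names are assumptions.
interface NativeBridge {
    void nativeInit();
    long nativeAppend(byte[] payload);
    byte[] nativeRead(long id);
    void nativeChangeMembers(String[] peers);
    void nativeClose();
}

// Pure-Java fake of the bridge, used only to make the sketch runnable.
class FakeBridge implements NativeBridge {
    private final Map<Long, byte[]> entries = new HashMap<>();
    private long nextId = 1;
    boolean initialized = false;
    String[] peers = new String[0];

    public synchronized void nativeInit() { initialized = true; }

    public synchronized long nativeAppend(byte[] payload) {
        entries.put(nextId, payload.clone());
        return nextId++;
    }

    public synchronized byte[] nativeRead(long id) { return entries.get(id); }

    public synchronized void nativeChangeMembers(String[] newPeers) { peers = newPeers.clone(); }

    public synchronized void nativeClose() { initialized = false; }
}

// RaftJournal forwards each lifecycle step to the bridge.
class RaftJournal implements SimpleJournal {
    private final NativeBridge bridge;
    private boolean opened = false;

    RaftJournal(NativeBridge bridge) { this.bridge = bridge; }

    // Step 1: initializing the journal initializes the underlying core.
    public void open() {
        bridge.nativeInit();
        opened = true;
    }

    // Step 2: metadata is stored and read through the bridge.
    public long write(byte[] payload) {
        ensureOpen();
        return bridge.nativeAppend(payload);
    }

    public byte[] read(long journalId) {
        ensureOpen();
        return bridge.nativeRead(journalId);
    }

    // Step 3: closing the journal closes the underlying core.
    public void close() {
        if (opened) {
            bridge.nativeClose();
            opened = false;
        }
    }

    // Step 4: membership changes are forwarded as well.
    public void changeMembers(String[] peers) {
        ensureOpen();
        bridge.nativeChangeMembers(peers);
    }

    private void ensureOpen() {
        if (!opened) {
            throw new IllegalStateException("journal is not open");
        }
    }
}
```

Passing the bridge into the constructor is a design choice of this sketch: it lets the Java layer be tested against a fake, while a production build would supply an implementation backed by the native library.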

Solution

Use RaftKeeper instead of BDBJE.

dataroaring commented 1 year ago

Is there any data on the reliability of RaftKeeper? Is braft a good candidate?

nicelulu commented 1 year ago

@dataroaring Thank you for your reply. RaftKeeper is used in 60+ production deployments at our company. It has been running stably for more than a year and was only recently open-sourced on GitHub.

braft is large and complex. RaftKeeper uses NuRaft, a smaller and more concise library with fewer dependencies. Building on RaftKeeper would be much more convenient, since performance optimization and other work would not need to be repeated.

JackyWoo commented 1 year ago

@dataroaring Do you mean consistency tests such as Jepsen?