ClickHouse / ClickHouse

ClickHouse® is a real-time analytics DBMS
https://clickhouse.com
Apache License 2.0

Replicated database supports some replicas as observer role to get metadata and apply log #50824

Open LiuYangkuan opened 1 year ago

LiuYangkuan commented 1 year ago

Use case

The SQL to create a Replicated database is:

CREATE DATABASE test_db ENGINE = Replicated('zoo_path', '{shard_name}', '{replica_name}') [SETTINGS ...]

In cluster mode, we have to use ON CLUSTER default so that every server creates the database test_db. But when a new server joins the cluster, it does not have the test_db database. And in some cases, when a server leaves the cluster, all DDL queries in the Replicated database are left unfinished.

Describe the solution you'd like

I suggest providing a server setting default_replicated_database_path='/clickhouse/databases'. It is a ZooKeeper path, and the Replicated engine would use it to build its default arguments: Replicated('default_replicated_database_path/{db_name}', '{shard_name}', '{replica_name}').
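To make the proposed default concrete, here is a minimal Python sketch of how the engine arguments could be derived from the database name. Everything in it is hypothetical: the setting default_replicated_database_path is only a proposal, and the helpers `default_replicated_args` and `create_statement` are names made up for illustration, not ClickHouse APIs.

```python
# Hypothetical sketch: the setting is only proposed in this issue,
# and these helpers do not exist in ClickHouse.
DEFAULT_REPLICATED_DATABASE_PATH = "/clickhouse/databases"  # proposed server setting


def default_replicated_args(db_name, base_path=DEFAULT_REPLICATED_DATABASE_PATH):
    """Return the (zoo_path, shard, replica) arguments implied by the proposal."""
    zoo_path = f"{base_path.rstrip('/')}/{db_name}"
    # The macros stay unexpanded; each server would substitute its own values.
    return (zoo_path, "{shard_name}", "{replica_name}")


def create_statement(db_name):
    """Build the CREATE DATABASE statement with the default arguments filled in."""
    zoo_path, shard, replica = default_replicated_args(db_name)
    return (
        f"CREATE DATABASE {db_name} "
        f"ENGINE = Replicated('{zoo_path}', '{shard}', '{replica}')"
    )


print(create_statement("test_db"))
```

With such a default, a user would only need to name the database; the ZooKeeper path would follow from the server setting.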

If a server runs as an observer, enabled by the server setting observe_replicated_database=true, it will create all databases found under the path /clickhouse/databases and, in every Replicated database, only fetch metadata and apply the log, just like a learner in Raft.
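The observer's bootstrap step could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `observer_bootstrap` is a hypothetical function, `remote_databases` stands in for the child nodes listed under the ZooKeeper path (no real Keeper client is used here), and `local_databases` stands in for the databases the server already has.

```python
def observer_bootstrap(remote_databases, local_databases,
                       base_path="/clickhouse/databases"):
    """Return CREATE DATABASE statements for databases registered under
    base_path that this (hypothetical) observer does not have yet."""
    # Databases known in ZooKeeper but absent locally, in a stable order.
    missing = sorted(set(remote_databases) - set(local_databases))
    return [
        # Doubled braces keep the macros literal in the generated SQL.
        f"CREATE DATABASE {name} ENGINE = Replicated("
        f"'{base_path}/{name}', '{{shard_name}}', '{{replica_name}}')"
        for name in missing
    ]


# Example: the cluster knows two databases, the new server has only one.
for stmt in observer_bootstrap(["test_db", "logs"], ["logs"]):
    print(stmt)
```

A newly joined server would run this once at startup and then keep following each database's log, without taking part in DDL execution the way a regular replica does.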

Additional context

After this feature is implemented, a cluster of compute groups will be more elastic.

evillique commented 10 months ago

It seems like what you are looking for was recently implemented in https://github.com/ClickHouse/ClickHouse/pull/55641 as Replica Groups (see the docs).