Altinity / clickhouse-sink-connector

Replicate data from MySQL, Postgres and MongoDB to ClickHouse®
https://www.altinity.com
Apache License 2.0

Distributed and Cluster Support #437

Status: Open. AjinkyaTaranekar opened this issue 9 months ago.

AjinkyaTaranekar commented 9 months ago

Hi Team,

Has the team considered adding support for the ClickHouse Distributed table engine? Could we have a config flag, for example cluster.mode.enabled, that makes the connector run DDL and insert queries against a ClickHouse cluster?

aadant commented 9 months ago

@AjinkyaTaranekar, it should be possible to use RRMT (ReplicatedReplacingMergeTree) as the default engine.
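For reference, a table using RRMT could be created on every node of a cluster with DDL along these lines. This is a minimal sketch that only builds the SQL string; it does not show how the connector itself would be configured to emit it. The cluster name `my_cluster`, the `shop.orders` table, its columns, and the `_version` column are hypothetical, and the `{shard}` and `{replica}` macros must be defined in each server's configuration.

```python
def rrmt_create_table(cluster: str, database: str, table: str,
                      columns: dict[str, str], order_by: str,
                      version_col: str = "_version") -> str:
    """Build a CREATE TABLE statement that uses ReplicatedReplacingMergeTree.

    The Keeper path contains the {shard} macro so replicas of the same shard
    share one path; {replica} distinguishes the replicas within that shard.
    """
    cols = ",\n    ".join(f"`{name}` {ctype}" for name, ctype in columns.items())
    # Doubled braces keep the ClickHouse macros literal inside the f-string.
    zk_path = f"/clickhouse/tables/{{shard}}/{database}/{table}"
    return (
        f"CREATE TABLE IF NOT EXISTS {database}.{table} ON CLUSTER {cluster}\n"
        f"(\n    {cols}\n)\n"
        f"ENGINE = ReplicatedReplacingMergeTree('{zk_path}', '{{replica}}', {version_col})\n"
        f"ORDER BY {order_by}"
    )


if __name__ == "__main__":
    # Hypothetical table: the version column decides which duplicate row survives merges.
    print(rrmt_create_table(
        cluster="my_cluster",
        database="shop",
        table="orders",
        columns={"id": "UInt64", "amount": "Decimal(10, 2)", "_version": "UInt64"},
        order_by="id",
    ))
```

`ON CLUSTER` runs the statement on every node listed for `my_cluster`, which requires Keeper (or ZooKeeper) to coordinate the distributed DDL queue.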

@subkanthi, is this something you have tested? Which setting would be needed to get it working?

AjinkyaTaranekar commented 9 months ago

@aadant I added distributed support by creating an inner table with RMT (ReplacingMergeTree) and a Distributed engine table on top of it. Both DDL and DML work. Ref: https://github.com/AjinkyaTaranekar/clickhouse-sink-connector/tree/feature/DistributedEngine
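The layout described above, a local ReplacingMergeTree table on each shard with a Distributed table in front of it, could be expressed with DDL roughly like the following. This is a sketch under assumptions, not the code from the linked branch: the `_local` suffix, the cluster name, the columns, and the `cityHash64(id)` sharding key are illustrative choices.

```python
def distributed_table_ddl(cluster: str, database: str, table: str,
                          columns: dict[str, str], order_by: str,
                          sharding_key: str,
                          version_col: str = "_version") -> list[str]:
    """Return the two statements for a sharded table.

    1. A local ReplacingMergeTree table created on every node (ON CLUSTER).
    2. A Distributed table with the same structure that routes inserts to the
       local tables by sharding_key and fans SELECTs out to all shards.
    """
    local = f"{table}_local"  # hypothetical naming convention for the inner table
    cols = ",\n    ".join(f"`{name}` {ctype}" for name, ctype in columns.items())
    create_local = (
        f"CREATE TABLE IF NOT EXISTS {database}.{local} ON CLUSTER {cluster}\n"
        f"(\n    {cols}\n)\n"
        f"ENGINE = ReplacingMergeTree({version_col})\n"
        f"ORDER BY {order_by}"
    )
    # "AS db.table_local" copies the column list from the local table.
    create_dist = (
        f"CREATE TABLE IF NOT EXISTS {database}.{table} ON CLUSTER {cluster}\n"
        f"AS {database}.{local}\n"
        f"ENGINE = Distributed('{cluster}', '{database}', '{local}', {sharding_key})"
    )
    return [create_local, create_dist]


if __name__ == "__main__":
    for stmt in distributed_table_ddl(
        cluster="my_cluster",
        database="shop",
        table="orders",
        columns={"id": "UInt64", "_version": "UInt64"},
        order_by="id",
        sharding_key="cityHash64(id)",
    ):
        print(stmt + ";\n")
```

Sharding on the primary key matters here: ReplacingMergeTree only collapses duplicates within a single shard, so every version of a given row has to land on the same shard for updates to replace older rows.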

aadant commented 9 months ago

@AjinkyaTaranekar, writing to a Distributed table is slower than writing to a regular MergeTree table, but it lets you shard the data. I think a single ClickHouse node using RMT is typically faster at inserts than a MySQL node. What is your use case, compared with replication, which would provide HA?
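To make the sharding side of this trade-off concrete: per the ClickHouse documentation, a Distributed table chooses the target shard by taking the remainder of the sharding expression divided by the total weight of all shards, then mapping that remainder onto contiguous ranges, one per shard. The sketch below mimics that rule for an already-evaluated integer key; the weights and keys are made up, and it assumes every weight is positive.

```python
from itertools import accumulate


def pick_shard(key_value: int, weights: list[int]) -> int:
    """Return the 0-based index of the shard a Distributed table would choose.

    Each shard owns a contiguous range of remainders whose size equals its
    weight: weights [9, 10] give shard 0 remainders 0..8 and shard 1
    remainders 9..18.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("this sketch expects a non-empty list of positive weights")
    remainder = key_value % sum(weights)
    # accumulate([9, 10]) yields the exclusive upper bounds 9 and 19.
    for index, upper in enumerate(accumulate(weights)):
        if remainder < upper:
            return index
    raise AssertionError("unreachable: remainder is always below the total weight")


if __name__ == "__main__":
    # Two equally weighted shards, as in the two-shard setup discussed in this issue.
    print([pick_shard(k, [1, 1]) for k in range(6)])  # [0, 1, 0, 1, 0, 1]
```

In practice the sharding expression is usually a hash such as `cityHash64(id)` so rows spread evenly; `rand()` also spreads inserts evenly but would scatter versions of the same row across shards.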

AjinkyaTaranekar commented 9 months ago

@aadant My goal is to replicate data from MySQL to a distributed ClickHouse cluster that uses two nodes for sharding. We have set up three nodes in total: two shards and one Keeper node.
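For a topology like this (two shards with one replica each), the cluster is declared in the `remote_servers` section inside the `<clickhouse>` root of each server's configuration. The sketch below generates such a fragment with Python's standard `xml.etree.ElementTree`. The cluster name `my_cluster` and the host names `ch-shard-1` / `ch-shard-2` are placeholders. The Keeper node does not appear here because the servers connect to it through their separate `zookeeper` section.

```python
import xml.etree.ElementTree as ET


def remote_servers_xml(cluster: str, shard_hosts: list[str], port: int = 9000) -> str:
    """Render a <remote_servers> fragment with one single-replica shard per host.

    Port 9000 is ClickHouse's default native TCP port.
    """
    root = ET.Element("remote_servers")
    cluster_el = ET.SubElement(root, cluster)
    for host in shard_hosts:
        shard = ET.SubElement(cluster_el, "shard")
        replica = ET.SubElement(shard, "replica")
        ET.SubElement(replica, "host").text = host
        ET.SubElement(replica, "port").text = str(port)
    ET.indent(root)  # pretty-print the tree in place (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder host names for the two shard nodes.
    print(remote_servers_xml("my_cluster", ["ch-shard-1", "ch-shard-2"]))
```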

aadant commented 9 months ago

@subkanthi, what do you think? Note that the altinity_sink_connector database will exist on only one node; otherwise, this should work. We may also want to support it in the Python loader, which performs the initial load (both schema and data) for very large databases.

AjinkyaTaranekar commented 9 months ago

Yes @aadant, the branch above replicates data across the cluster, with altinity_sink_connector on a single node. I'm now looking into how the same could be done in the Python loader.