Altinity / clickhouse-sink-connector

Replicate data from MySQL, Postgres and MongoDB to ClickHouse®
https://www.altinity.com
Apache License 2.0

Distributed and Cluster Support #437

Open AjinkyaTaranekar opened 10 months ago

AjinkyaTaranekar commented 10 months ago

Hi Team,

Have we explored adding support for ClickHouse's Distributed engine? Could we have a config flag, such as cluster.mode.enabled, to run DDL and insert queries on a ClickHouse cluster?
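One way such a flag could behave is sketched below in Python: when it is on, an `ON CLUSTER` clause is added to `CREATE TABLE` statements before they are sent to ClickHouse. This is only an illustration of the request, not the connector's code. The `cluster.name` key and the `apply_cluster_mode` helper are hypothetical, and only simple `CREATE TABLE` statements are handled.

```python
import re

# Matches "CREATE TABLE [IF NOT EXISTS] <name>" at the start of a statement.
_CREATE_RE = re.compile(
    r"^(\s*CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?[`\w.]+)",
    re.IGNORECASE,
)


def apply_cluster_mode(ddl: str, config: dict) -> str:
    """Add ON CLUSTER to a CREATE TABLE statement when cluster mode is on.

    `cluster.mode.enabled` is the flag proposed in this issue;
    `cluster.name` is a hypothetical companion setting.
    """
    enabled = str(config.get("cluster.mode.enabled", "false")).lower() == "true"
    if not enabled:
        return ddl
    cluster = config["cluster.name"]
    # Insert the clause right after the table name; other statements pass through.
    return _CREATE_RE.sub(
        lambda m: f"{m.group(1)} ON CLUSTER {cluster}", ddl, count=1
    )
```

With the flag off (or missing) the statement is returned unchanged, so single-node setups would not be affected.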

aadant commented 10 months ago

@AjinkyaTaranekar, it should be possible to use RRMT (ReplicatedReplacingMergeTree) as the default engine.

@subkanthi is this something you have tested? What setting would be needed to get it working?

AjinkyaTaranekar commented 10 months ago

@aadant I added distributed support by creating an inner table with RMT and a Distributed Engine Table on top of it. DDL and DML are functional. Ref: https://github.com/AjinkyaTaranekar/clickhouse-sink-connector/tree/feature/DistributedEngine
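The layout described here (a local ReplacingMergeTree table on each shard with a Distributed table on top) can be sketched as a pair of generated DDL statements. The Python below is a rough illustration under stated assumptions, not the code from the linked branch: the `_local` suffix, the `_version` column name, and the `build_distributed_ddl` helper are all hypothetical.

```python
def build_distributed_ddl(database, table, columns, order_by,
                          cluster, sharding_key, version_column="_version"):
    """Return (local_ddl, distributed_ddl) for a sharded table.

    The local table holds the data on each shard; the Distributed table
    stores nothing itself and routes reads and writes to the local tables.
    """
    local = f"{table}_local"  # hypothetical naming convention
    local_ddl = (
        f"CREATE TABLE {database}.{local} ON CLUSTER {cluster} "
        f"({columns}) "
        f"ENGINE = ReplacingMergeTree({version_column}) "
        f"ORDER BY ({order_by})"
    )
    distributed_ddl = (
        f"CREATE TABLE {database}.{table} ON CLUSTER {cluster} "
        f"AS {database}.{local} "
        f"ENGINE = Distributed({cluster}, {database}, {local}, {sharding_key})"
    )
    return local_ddl, distributed_ddl


if __name__ == "__main__":
    for stmt in build_distributed_ddl(
        database="test",
        table="employees",
        columns="id Int64, name String, _version UInt64",
        order_by="id",
        cluster="my_cluster",
        sharding_key="cityHash64(id)",
    ):
        print(stmt + ";")
```

Sharding on a hash of the primary key keeps all versions of a row on the same shard, which matters because ReplacingMergeTree only deduplicates rows within a single shard.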

aadant commented 10 months ago

@AjinkyaTaranekar writing to a Distributed table is slower than writing to a regular MergeTree table, but it lets you shard the data. I think a ClickHouse node is typically quicker than a MySQL node for inserts with RMT. What is the use case for sharding, compared to replication, which would provide HA?

AjinkyaTaranekar commented 10 months ago

@aadant My goal is to replicate data from MySQL to a distributed ClickHouse cluster that shards data across two nodes. We have set up a system with three nodes: two shards and one keeper.

aadant commented 10 months ago

@subkanthi what do you think? Please note that the altinity_sink_connector database will exist on only one node. Otherwise it should work. We may also want to support this in the Python loader, which does the initial load (both schema and data) for very large databases.

AjinkyaTaranekar commented 10 months ago

Yes @aadant, the branch I linked above can replicate data across the cluster, with altinity_sink_connector on a single node. Right now I'm looking into how the same could be done in the Python loader.