Open filimonov opened 3 weeks ago
@alexey-milovidov @ArctypeZach : we think that would be a good feature to have in CH. Especially good to build "Parts Writers" that can live outside the CH servers. There are lot of workloads that are write heavy and also non-uniform with a couple of big spikes a day, typically CPU bound. So you can have a decent cluster that is "helped" by Parts writers that have higher latency but bigger throughput. Do you like this idea ?
Introduction
This RFC proposes a feature that enables pushing precreated parts (e.g., from
clickhouse-local
or another ClickHouse server instance) directly to a ClickHouse server table using the client protocol.This will facilitate the deployment of multiple worker nodes capable of handling complex preprocessing tasks, including initial parsing, cleaning, and even performing initial merges. These nodes can then push the precreated parts to the ClickHouse server.
By offloading these tasks to external nodes, the ClickHouse server will not need to handle activities such as parsing data formats, sorting data, filling defaults, creating indexes and marks, compressing files, or performing initial merges. This significantly reduces the workload on the ClickHouse server.
Alternatives
Currently, this functionality can be achieved by writing parts directly to the filesystem of the ClickHouse server (into the
detached
subfolder) and then manually attaching them. However, this approach is cumbersome, requires additional data transfer channels, and poses security risks (as the ingestors need direct access to the ClickHouse filesystem). It is also impractical or impossible in certain environments (e.g., Kubernetes).A similar outcome can be achieved using
ALTER TABLE FETCH PART FROM '/zookeeper/path'
, but this method uses a pull model that requires the client to open an additional port (9009) and register itself with ZooKeeper.Proposal
We propose extending the ClickHouse client-server protocol to support the transfer of precreated parts from the client directly to a ClickHouse server. This will introduce several new commands and mechanisms to ensure safe and efficient data transfers.
Key Components
New GRANT for Part Transfer
ALL
permissions by default and must be explicitly granted by administrators.Protocol Extension for Part Transfer
New Command:
ALTER TABLE ATTACH INLINE PART
New Command:
ALTER TABLE PUSH PART TO remote('...')
ALTER TABLE ATTACH INLINE PART
command to the remote server.Example Workflow
clickhouse-local
and populates the schema of a table that fully matches the schema of the table on the remote ClickHouse server.OPTIMIZE FINAL
(if needed), and finally issues the commandALTER TABLE PUSH PART TO remote('remote-server-address')
.Other Considerations
insert_quorum
, parts exchange, etc.