ClickHouse / ClickHouse

ClickHouse® is a real-time analytics DBMS
https://clickhouse.com
Apache License 2.0

RFC: Precreated parts transfer via native protocol #70595

Open filimonov opened 3 weeks ago

filimonov commented 3 weeks ago

Introduction

This RFC proposes a feature that enables pushing precreated parts (e.g., from clickhouse-local or another ClickHouse server instance) directly to a ClickHouse server table using the client protocol.

This will facilitate the deployment of multiple worker nodes capable of handling complex preprocessing tasks, including initial parsing, cleaning, and even performing initial merges. These nodes can then push the precreated parts to the ClickHouse server.

By offloading these tasks to external nodes, the ClickHouse server will not need to handle activities such as parsing data formats, sorting data, filling defaults, creating indexes and marks, compressing files, or performing initial merges. This significantly reduces the workload on the ClickHouse server.

Alternatives

Currently, this functionality can be achieved by writing parts directly to the filesystem of the ClickHouse server (into the detached subfolder) and then manually attaching them. However, this approach is cumbersome, requires additional data transfer channels, and poses security risks (as the ingestors need direct access to the ClickHouse filesystem). It is also impractical or impossible in certain environments (e.g., Kubernetes).
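
For illustration, the manual workaround described above might look roughly like the following sketch. It assumes the ingestor can write to the table's data directory; `stage_part_in_detached` and `attach_part_sql` are hypothetical helper names, and the directory layout is passed in by the caller rather than discovered from the server.

```python
import shutil
from pathlib import Path


def stage_part_in_detached(part_dir: Path, table_data_dir: Path) -> str:
    """Copy a precreated part into <table_data_dir>/detached and return its name.

    table_data_dir is an assumption: the caller must already know where the
    server keeps this table's data, which is exactly the access problem
    the RFC wants to avoid.
    """
    detached = table_data_dir / "detached"
    detached.mkdir(parents=True, exist_ok=True)
    target = detached / part_dir.name
    if target.exists():
        raise FileExistsError(f"part already staged: {target}")
    shutil.copytree(part_dir, target)
    return part_dir.name


def attach_part_sql(database: str, table: str, part_name: str) -> str:
    """Build the statement an operator would then run on the server."""
    # Part names are generated identifiers such as all_1_1_0; this sketch
    # rejects quotes and backslashes instead of trying to escape them.
    if "'" in part_name or "\\" in part_name:
        raise ValueError("unexpected character in part name")
    return f"ALTER TABLE {database}.{table} ATTACH PART '{part_name}'"
```

Both steps need something outside the native protocol: a channel to copy the files and a separate connection to run the ATTACH.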

A similar outcome can be achieved using ALTER TABLE FETCH PART FROM '/zookeeper/path', but this is a pull model: the server fetches the part, so the client that produced it must open an additional port (9009, the interserver port) and register itself with ZooKeeper.

Proposal

We propose extending the ClickHouse client-server protocol to support the transfer of precreated parts from the client directly to a ClickHouse server. This will introduce several new commands and mechanisms to ensure safe and efficient data transfers.

Key Components

  1. New GRANT for Part Transfer

    • A new permission will be introduced to control the ability to push precreated parts. This is necessary for security: a pushed part skips the server-side parsing and validation that a regular INSERT goes through, so accepting parts from untrusted clients could be unsafe.
    • This GRANT should not be included in the ALL permissions by default and must be explicitly granted by administrators.
  2. Protocol Extension for Part Transfer

    • A new extension to the client-server protocol will be developed to handle part transfers. This extension must support sending complex parts (containing many files) in a safe and reliable way.
    • The protocol should include integrity mechanisms, such as checksums, to ensure data consistency during transfer.
    • Where possible, existing code for part exchange used in replication can be reused to handle part transfers efficiently and safely.
  3. New Command: ALTER TABLE ATTACH INLINE PART

    • This command will allow parts sent from the client to be attached directly on the server side.
    • Upon receiving the part, the server will store it in the detached folder of the target table.
    • The command will be executed as part of the query sent by the client, meaning the part data should be included within the query itself.
  4. New Command: ALTER TABLE PUSH PART TO remote('...')

    • This command enables pushing a precreated part from the local ClickHouse server to a remote ClickHouse instance.
    • The command will:
      1. Establish a connection to the remote ClickHouse server.
      2. Send the ALTER TABLE ATTACH INLINE PART command to the remote server.
      3. Transfer the part inline if no exceptions occur.
    • This command would operate over the native protocol (port 9000), ensuring compatibility with the existing client-server communication model. While using the replication protocol (port 9009) might simplify certain aspects, it would limit the feature to replicated tables and expose additional network ports, which is undesirable from a security perspective.
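
To make the components above more concrete, here is a toy, in-memory sketch of the receiving side: a permission check, per-file checksums, and rejection of the whole part if any file fails verification. Everything in it is an assumption made for illustration. The frame layout, the `ToyServer` class, and the use of SHA-256 are not the ClickHouse native protocol or its checksum format, and a real implementation would presumably reuse the replication part-exchange code as the RFC suggests.

```python
import hashlib
import struct

# Hypothetical frame header: file-name length (uint16), payload length (uint64).
HEADER = struct.Struct("<HQ")
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def encode_part(files: dict[str, bytes]) -> bytes:
    """Serialize each file of a part as: header, name, SHA-256 digest, payload."""
    out = bytearray()
    for name, payload in sorted(files.items()):
        raw_name = name.encode("utf-8")
        out += HEADER.pack(len(raw_name), len(payload))
        out += raw_name
        out += hashlib.sha256(payload).digest()
        out += payload
    return bytes(out)


def decode_part(blob: bytes) -> dict[str, bytes]:
    """Parse all frames; raise ValueError if anything is truncated or corrupted."""
    files: dict[str, bytes] = {}
    pos = 0
    while pos < len(blob):
        if pos + HEADER.size > len(blob):
            raise ValueError("truncated header")
        name_len, size = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        if pos + name_len + DIGEST_SIZE + size > len(blob):
            raise ValueError("truncated frame")
        name = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        digest = blob[pos:pos + DIGEST_SIZE]
        pos += DIGEST_SIZE
        payload = blob[pos:pos + size]
        pos += size
        # Refuse names that could escape the part directory.
        if not name or name.startswith("/") or ".." in name.split("/"):
            raise ValueError(f"unsafe file name: {name!r}")
        if hashlib.sha256(payload).digest() != digest:
            raise ValueError(f"checksum mismatch in {name}")
        files[name] = payload
    return files


class ToyServer:
    """Stand-in for the receiving server; not ClickHouse code."""

    def __init__(self, allowed_users: set[str]):
        self.allowed_users = set(allowed_users)  # users holding the new grant
        self.detached: dict[str, dict[str, bytes]] = {}

    def attach_inline_part(self, user: str, part_name: str, blob: bytes) -> str:
        if user not in self.allowed_users:
            raise PermissionError(f"{user} may not push parts")
        files = decode_part(blob)  # raises before anything is stored
        self.detached[part_name] = files
        return part_name
```

A sender would call `encode_part` on the files of a part and hand the result to `attach_inline_part`. Verifying every file before storing anything means a corrupted transfer leaves no partial part behind on the receiver.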

Example Workflow

Other Considerations

aadant commented 8 hours ago

@alexey-milovidov @ArctypeZach: we think this would be a good feature to have in ClickHouse, especially for building "Parts Writers" that can live outside the ClickHouse servers. There are a lot of workloads that are write-heavy and also non-uniform, with a couple of big spikes a day, and typically CPU-bound. So you can have a decent cluster that is "helped" by Parts Writers, which have higher latency but bigger throughput. Do you like this idea?