mga-chka opened 1 year ago
So after looking into this, here are some of my notes and suggestions (documenting here what was discussed offline):
As discussed we will start utilising the low level ch-go library to help with the protocol implementation. This will speed up an initial implementation and we can always move away if we run into limitations (or upstream a change to ch-go).
A couple of points to note:
One alternative approach is to go the full proxy route (TCP -> TCP). That means we need very minimal protocol understanding: only the hello exchange for authentication and enough understanding of the query request, while for the query response we just need to figure out how many blocks we need to read (which requires decoding, but not reconstructing a full response). That allows us to focus first on how to properly handle the TCP connections before spending a lot of time on HTTP -> TCP translation.
On the TCP -> TCP protocol, that means we need to change the proposed configuration. We need to validate that only the TCP server connects with the TCP URL of ClickHouse.
One potential solution is to turn clusters from a list of addresses into structs with:
- Host
- HTTP port
- TCP port

That would allow each server to choose what is available (and we can validate that the TCP port is set when a TCP server is configured, or either throw an error or use the default port if it isn't set).
However, this would be a backwards-incompatible change. Given how big a change the TCP feature will introduce, I don't think it is bad to introduce a backwards-incompatible change alongside it.
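A minimal sketch of what the proposed cluster node struct could look like. The names (`NodeConfig`, `Validate`) and the fallback to ClickHouse's default native port 9000 are assumptions for illustration, not the final design:

```go
package main

import (
	"errors"
	"fmt"
)

// NodeConfig replaces a plain address string in the cluster list.
// Field names and yaml tags are illustrative, not final.
type NodeConfig struct {
	Host     string `yaml:"host"`
	HTTPPort int    `yaml:"http_port"`
	TCPPort  int    `yaml:"tcp_port"` // optional; needed only when a TCP server is configured
}

// defaultTCPPort is ClickHouse's default native-protocol port.
const defaultTCPPort = 9000

// Validate checks the node config against the servers that are enabled.
// If a TCP server is configured but no TCP port is set, we can either
// throw an error or fall back to the default port; this sketch falls back.
func (n *NodeConfig) Validate(tcpServerEnabled bool) error {
	if n.Host == "" {
		return errors.New("host must be set")
	}
	if tcpServerEnabled && n.TCPPort == 0 {
		n.TCPPort = defaultTCPPort
	}
	return nil
}

func main() {
	n := NodeConfig{Host: "ch1.example.com", HTTPPort: 8123}
	if err := n.Validate(true); err != nil {
		panic(err)
	}
	fmt.Println(n.TCPPort) // TCP port was unset, so it fell back to the default
}
```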
For the TCP -> TCP diagram we will have to handle the following flows, starting with authentication:
```mermaid
sequenceDiagram
    client->>chproxy: ClientHello
    chproxy->>chproxy: Determine User/Password from configuration
    chproxy->>clickhouse: ClientHello
    clickhouse->>chproxy: ServerHello
    chproxy->>chproxy: Determine protocol revision and other relevant metadata
    chproxy->>client: ServerHello, forward right protocol revision and metadata, with chproxy server name
```
Next we can work on the Query flow (starting with SELECTs). Note that for INSERTs we can have data blocks coming from the client.
```mermaid
sequenceDiagram
    client->>chproxy: Query
    client->>chproxy: Empty Data Block
    chproxy->>chproxy: Check Query user (based on TCP client session) and determine settings
    chproxy->>clickhouse: Query (kill based on timeout)
    clickhouse->>chproxy: Data Blocks (with Query response and Query Metadata)
    clickhouse->>chproxy: Data End of Stream
    chproxy->>chproxy: Cache Data Blocks
    chproxy->>client: Data Blocks
```
Note that these are both happy flow diagrams. I didn't include e.g. killed queries.
Additionally, there is a question about how to respond to clients. Do we stream the response back to the client right away and cache asynchronously (cleaning the cache entry if we encounter a ClickHouse exception)? Or do we wait for all the data to be available before we respond to the client?
IMO the first approach would be preferred. I don't think we should wait for the full response before starting to send data back; that will also make it easier to avoid overloading chproxy's memory.
I think we would also benefit from some good abstractions over the Data Stream in the TCP protocol. E.g. an iterator pattern to deal with the different blocks/types in the protocol.
Note that queries might not work with such a simple interface, as we could receive multiple metadata responses during a query (e.g. profile info, logs, query progress). See for example clickhouse-go's processing of data blocks.
```go
package protocol

// ProtocolIterator abstracts iteration over the packets of a TCP data stream.
type ProtocolIterator interface {
	HasNext() bool
	// GetNextTyped decodes the next packet, which is expected
	// to carry the given protocol code.
	GetNextTyped(expected ProtocolCode) interface{}
}
```
Maybe a state machine would be better suited for this type of problem?
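To make the state-machine idea concrete, here is a toy version that tolerates the interleaved Progress/Log/ProfileInfo packets a real query produces (the packet codes are the server-side values listed later in this issue; the state names and `step` function are hypothetical):

```go
package main

import "fmt"

// Server packet codes (subset), matching ClickHouse's Core/Protocol.h.
const (
	srvData        = 1
	srvException   = 2
	srvProgress    = 3
	srvEndOfStream = 5
	srvProfileInfo = 6
	srvLog         = 10
)

type queryState int

const (
	stateReading queryState = iota // still consuming the response stream
	stateDone                      // EndOfStream received
	stateFailed                    // Exception or unexpected packet
)

// step advances the state machine for one incoming packet. Unlike a strict
// iterator that expects exactly one packet type next, this accepts metadata
// packets in any order until the stream terminates.
func step(s queryState, packet int) queryState {
	if s != stateReading {
		return s // terminal states absorb further input
	}
	switch packet {
	case srvData, srvProgress, srvProfileInfo, srvLog:
		return stateReading
	case srvEndOfStream:
		return stateDone
	case srvException:
		return stateFailed
	default:
		return stateFailed // unknown packet: fail closed
	}
}

func main() {
	s := stateReading
	for _, p := range []int{srvData, srvProgress, srvLog, srvData, srvEndOfStream} {
		s = step(s, p)
	}
	fmt.Println(s == stateDone)
}
```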
Also, as this will be quite a lot of effort (especially if we make a backwards-incompatible change to the configuration), should we consider creating a separate branch so we can still make fixes/contribute to master while we work on TCP support?
> IMO first approach would be preferred. I don't think we should wait for the full response to start sending data back, that will make it easier to avoid overloading chproxy memory as well.
We should use the same logic as the one we're using for the HTTP protocol:
> Also as this will be quite a lot of effort (and especially if we make a backwards compatible change to the configuration), should we consider creating a seperate branch so we can still make fixes/contribute to master while we work on TCP support?
One tradeoff would be a small refactor of the current codebase so that we can add the TCP logic in specific files and iterate without the risk of breaking something. Conceptually, what we're doing with TCP is the same as what we're doing with HTTP. It's just that in the TCP case we need to handle the implementation of the protocol ourselves, whereas with HTTP it's hidden by the httputil.ReverseProxy interface. So if we can implement a tcputil.ReverseProxy interface, it might be enough (more or less; of course we will face some limitations regarding some features, but it can be a starting point).
FYI I did a simple prototype to help on the design, feel free to play with it and modify it: https://github.com/mga-chka/tcpproxy
The aim of this feature is to make chproxy work with both HTTP and TCP.
There are 2 aspects of this feature:
Both TCP connections should be turned on/off with new variables in the conf file:
At the end of step 1-e), we should be able to use the command-line tool clickhouse-client and connect to chproxy (in secure mode if we can) to do select queries:
```
clickhouse-client --host <CHPROXY_IP> --secure --port <CHPROXY_PORT> --user <USERNAME> --password <PASSWORD>
```
At the end of step 3), we should be able to do insert queries.

This is a big task that should be done in multiple steps:

0) understand how the clickhouse TCP protocol works

We can't just use an existing Go TCP client for clickhouse (like ch-go) because we need to be able to mimic the behaviour of a clickhouse server, so that the chproxy clients believe they're talking with clickhouse. Therefore, we need to understand how the protocol works and implement it.
This link gives the workflows when:
- establishing a connection,
- sending a read query,
- sending a write query

https://github.com/housepower/ClickHouse-Native-JDBC/blob/21cbb5ebab0a5cab54174e049c268ab8bc6da032/docs/deep-dive/native_protocol.md
In a few words, a client can send 6 types of messages:
- Hello = 0, used to establish a connection and check the protocol versions of the client and the server https://github.com/ClickHouse/ClickHouse/blob/master/src/Server/TCPHandler.cpp#L1123
- Query = 1, used to send a query (may contain a query id & query settings) https://github.com/ClickHouse/ClickHouse/blob/master/src/Server/TCPHandler.cpp#L1386
- Data = 2, used to send data to clickhouse (mainly for insert queries)
- Cancel = 3, used to cancel a query
- Ping = 4, used to check if the connection to the server is alive
- KeepAlive = 6, used to keep the connection alive (not sure if we need it in chproxy)
FYI, here are the other messages the client can send, but they're mainly used internally by clickhouse to send messages between shards (full list available here https://github.com/ClickHouse/ClickHouse/blob/master/src/Core/Protocol.h#L134):
- TablesStatusRequest = 5
- Scalar = 7
- IgnoredPartUUIDs = 8
- ReadTaskResponse = 9
- MergeTreeReadTaskResponse = 10
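The client packet codes above transcribe naturally into a Go enum; a sketch with a `String` method for readable logging (type and method names are illustrative):

```go
package main

import "fmt"

// ClientPacket enumerates the client-side packet codes from Core/Protocol.h.
type ClientPacket uint64

const (
	ClientHello     ClientPacket = 0
	ClientQuery     ClientPacket = 1
	ClientData      ClientPacket = 2
	ClientCancel    ClientPacket = 3
	ClientPing      ClientPacket = 4
	ClientKeepAlive ClientPacket = 6
)

// String makes packet codes readable in logs and error messages.
func (p ClientPacket) String() string {
	switch p {
	case ClientHello:
		return "Hello"
	case ClientQuery:
		return "Query"
	case ClientData:
		return "Data"
	case ClientCancel:
		return "Cancel"
	case ClientPing:
		return "Ping"
	case ClientKeepAlive:
		return "KeepAlive"
	default:
		return fmt.Sprintf("ClientPacket(%d)", uint64(p))
	}
}

func main() {
	fmt.Println(ClientQuery, ClientKeepAlive) // prints "Query KeepAlive"
}
```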
The server can send back 11 types of messages:
- Hello = 0, the response to the hello query
- Data = 1, used to send data (for example the result of a query)
- Exception = 2, sent if something happened during the request
- Progress = 3, query execution progress: rows read, bytes read [we will not implement it in this PR because it might require a huge refactoring]
- Pong = 4, Ping response
- EndOfStream = 5, sent at the end of the response
- ProfileInfo = 6, a packet with profiling info (nb: not sure if it's mandatory for the first version; if not, we will not implement it at first)
- Totals = 7, an option that can be asked by the client (with the SQL clause `WITH TOTALS`): nb we will not implement it in this PR
- Extremes = 8, an option that can be asked by the client (with the setting `extremes`): nb we will not implement it in this PR
- TablesStatusResponse = 9, an option that can be asked by the client: nb we will not implement it in this PR
- Log = 10, used to show the logs of the query execution: nb we will not implement it in this PR

FYI, here are the other messages the server can send, but they're mainly used internally by clickhouse to send messages between shards (full list available here https://github.com/ClickHouse/ClickHouse/blob/master/src/Core/Protocol.h#L67):
- TableColumns = 11
- PartUUIDs = 12
- ReadTaskRequest = 13
- ProfileEvents = 14
- MergeTreeReadTaskRequest = 15
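Likewise for the server side; a sketch of the enum plus a small helper capturing which packets terminate a response stream (the helper name is an assumption, the codes come from the list above):

```go
package main

import "fmt"

// ServerPacket enumerates the server-side packet codes from Core/Protocol.h.
type ServerPacket uint64

const (
	ServerHello                ServerPacket = 0
	ServerData                 ServerPacket = 1
	ServerException            ServerPacket = 2
	ServerProgress             ServerPacket = 3
	ServerPong                 ServerPacket = 4
	ServerEndOfStream          ServerPacket = 5
	ServerProfileInfo          ServerPacket = 6
	ServerTotals               ServerPacket = 7
	ServerExtremes             ServerPacket = 8
	ServerTablesStatusResponse ServerPacket = 9
	ServerLog                  ServerPacket = 10
)

// terminatesResponse reports whether a packet ends the query response stream:
// either the normal EndOfStream marker or an Exception.
func terminatesResponse(p ServerPacket) bool {
	return p == ServerEndOfStream || p == ServerException
}

func main() {
	fmt.Println(terminatesResponse(ServerData), terminatesResponse(ServerEndOfStream)) // prints "false true"
}
```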
The protocol changes every 2-6 months on average. Most changes are only for the inner logic between shards: cf https://github.com/ClickHouse/ClickHouse/blame/master/src/Core/ProtocolDefines.h
The protocol is backward compatible, so we should be fine if we mimic the latest protocol version. But, in order to add new clickhouse features that rely on the protocol, we might need to update chproxy.
Nb: we will need to set the client_tcp_protocol_version we will use for both the client and clickhouse; we will take the latest one from clickhouse when we start the development (defined in ProtocolDefines.h).
1) the first big milestone is to be able to handle read-only queries (i.e. SELECT). This task can be divided as follows:
1-a) maintain the HTTP interface for clients and communicate with clickhouse using TCP
Here is the workflow: if chproxy is configured with TCP, every time we get an HTTP query, we create a TCP connection to clickhouse, send the query in binary, get an answer, then send it back to the HTTP caller.
nb: no caching or connection pooling at this step
1-b) add caching abilities
1-c) add settings stored in the HTTP query params to the TCP connection
1-d) add a pool of TCP connections (for each clickhouse shard & for the clients) to avoid creating connections every time
Warning: we should be careful if a previous query modified a setting in a TCP connection to clickhouse, for example putting max_execution_time to 5 sec. In this case, we should either drop the connection or have a way to reset all the settings.
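The settings warning above can be expressed as a "taint" flag on pooled connections: if a query mutated a session-level setting, the connection is closed instead of being returned to the pool. A minimal sketch (all names hypothetical; real code would wrap net.Conn and actually dial/close):

```go
package main

import "fmt"

// pooledConn wraps a backend connection plus a flag set whenever a query
// changed a session-level setting (e.g. "SET max_execution_time = 5").
type pooledConn struct {
	id      int
	tainted bool
}

// pool is a trivial free-list pool; next stands in for dialing new connections.
type pool struct {
	free []*pooledConn
	next int
}

func (p *pool) get() *pooledConn {
	if n := len(p.free); n > 0 {
		c := p.free[n-1]
		p.free = p.free[:n-1]
		return c
	}
	p.next++
	return &pooledConn{id: p.next} // real code would dial clickhouse here
}

func (p *pool) put(c *pooledConn) {
	if c.tainted {
		return // drop it: its session settings could leak into other queries
	}
	p.free = append(p.free, c)
}

func main() {
	p := &pool{}
	a := p.get()
	a.tainted = true // a query mutated a session setting
	p.put(a)
	b := p.get()
	fmt.Println(b.id != a.id) // tainted connection was not reused
}
```

The alternative mentioned in the step, resetting all settings before reuse, would trade the reconnect cost for an extra round-trip per checkout.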
1-e) [read query only] provide a TCP interface for clients and communicate with clickhouse using TCP
1-f) ability to cancel a query when the client stops its HTTP/TCP connection or asks to cancel the query
1-g) handle the ping query: whether a client triggers a ping over TCP or HTTP, we need to be able to send a ping to clickhouse over TCP
1-h) [maybe optional] add a TLS layer:
1-i) [optional] implement some of the missing features of the TCP protocol like the Progress msgs, the totals msg, the extremes, ...
2) do benchmarks on TCP vs HTTP for select queries (and put the results in the doc)
3) make the TCP protocol work for write queries
4) do benchmarks on TCP vs HTTP for insert queries (and put the results in the doc)