ClickHouse / clickhouse-java

Java client and JDBC driver for ClickHouse
https://clickhouse.com
Apache License 2.0
1.39k stars 513 forks source link

[QA] Java: How to use the Native TCP protocol instead of the HTTP protocol? Is it GA-ed yet? #1644

Open leiless opened 1 month ago

leiless commented 1 month ago

Hi, @zhicwu.

Native TCP, which has less overhead.

https://clickhouse.com/docs/en/interfaces/overview

The native protocol is used in the command-line client, for inter-server communication during distributed query processing, and also in other C++ programs. Unfortunately, native ClickHouse protocol does not have formal specification yet, but it can be reverse-engineered from ClickHouse source code (starting around here) and/or by intercepting and analyzing TCP traffic.

https://clickhouse.com/docs/en/interfaces/tcp

TCP/Native: Supported: ❌

https://github.com/ClickHouse/clickhouse-java

Is Java TCP client supported and GA-ed yet?

I wonder is there any way to use the native TCP protocol to connect to ClickHouse server in Java?

I found https://mvnrepository.com/artifact/com.clickhouse/clickhouse-tcp-client clickhouse-tcp-client 0.3.2-patch9, but it seems just a symlink to the clickhouse-client library (correctify me if I'm wrong)?

leiless commented 1 month ago

FYI, I can successfully connect to the ClickHouse server by HTTP protocol (port 8123). But If I switch to using the TCP protocol, the java.net.ConnectException: No client available will be thrown.

Sample code

package com.mycompany.app;

import com.clickhouse.jdbc.ClickHouseDataSource;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class App {
    static void usedPooledConnection(String url) throws SQLException {
        // connection pooling won't help much in terms of performance,
        // because the underlying implementation has its own pool.
        // for example: HttpURLConnection has a pool for sockets
        HikariConfig poolConfig = new HikariConfig();
        poolConfig.setConnectionTimeout(5000L);
        poolConfig.setMaximumPoolSize(20);
        poolConfig.setMaxLifetime(300_000L);
        poolConfig.setUsername("REDACTED");
        poolConfig.setPassword("REDACTED");
        poolConfig.setDataSource(new ClickHouseDataSource(url));

        try (HikariDataSource ds = new HikariDataSource(poolConfig); Connection conn = ds.getConnection(); Statement s = conn.createStatement(); ResultSet rs = s.executeQuery("SELECT 123")) {
            System.out.println(rs.next());
            System.out.println(rs.getInt(1));
        }
    }

    public static void main(String[] args) throws SQLException {
        // HTTP
        //String url = "jdbc:ch:http://192.168.10.10:8123/db";
        //String url = "jdbc:ch://192.168.10.10:8123/db";

        // TCP
        String url = "jdbc:ch:tcp://192.168.10.10:9000/db";

        usedPooledConnection(url);
    }
}

Exception thrown: java.sql.SQLException: No client available, server ...

Exception in thread "main" com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: No client available, server ClickHouseNode [uri=tcp://192.168.10.10:9000/db]@-34669869
    at com.zaxxer.hikari.pool.HikariPool.throwPoolInitializationException(HikariPool.java:596)
    at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:582)
    at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:115)
    at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:81)
    at com.mycompany.app.App.usedPooledConnection(App.java:26)
    at com.mycompany.app.App.main(App.java:40)
Caused by: java.sql.SQLException: No client available, server ClickHouseNode [uri=tcp://192.168.10.10:9000/db]@-34669869
    at com.clickhouse.jdbc.SqlExceptionUtils.handle(SqlExceptionUtils.java:85)
    at com.clickhouse.jdbc.SqlExceptionUtils.create(SqlExceptionUtils.java:31)
    at com.clickhouse.jdbc.SqlExceptionUtils.handle(SqlExceptionUtils.java:90)
    at com.clickhouse.jdbc.internal.ClickHouseConnectionImpl.getServerInfo(ClickHouseConnectionImpl.java:131)
    at com.clickhouse.jdbc.internal.ClickHouseConnectionImpl.<init>(ClickHouseConnectionImpl.java:335)
    at com.clickhouse.jdbc.internal.ClickHouseConnectionImpl.<init>(ClickHouseConnectionImpl.java:288)
    at com.clickhouse.jdbc.ClickHouseDataSource.getConnection(ClickHouseDataSource.java:68)
    at com.clickhouse.jdbc.ClickHouseDataSource.getConnection(ClickHouseDataSource.java:16)
    at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:364)
    at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:206)
    at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:476)
    at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:561)
    ... 4 more
Caused by: java.net.ConnectException: No client available
    at com.clickhouse.client.ClickHouseClientBuilder$DummyClient.execute(ClickHouseClientBuilder.java:56)
    at com.clickhouse.client.ClickHouseClientBuilder$Agent.sendOnce(ClickHouseClientBuilder.java:282)
    at com.clickhouse.client.ClickHouseClientBuilder$Agent.send(ClickHouseClientBuilder.java:294)
    at com.clickhouse.client.ClickHouseClientBuilder$Agent.execute(ClickHouseClientBuilder.java:349)
    at com.clickhouse.client.ClickHouseClient.executeAndWait(ClickHouseClient.java:878)
    at com.clickhouse.client.ClickHouseRequest.executeAndWait(ClickHouseRequest.java:2154)
    at com.clickhouse.jdbc.internal.ClickHouseConnectionImpl.getServerInfo(ClickHouseConnectionImpl.java:128)
    ... 12 more

image

pom.xml

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.mycompany.app</groupId>
    <artifactId>my-app</artifactId>
    <version>1.0-SNAPSHOT</version>

    <name>my-app</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>

        <clickhouse-java.version>0.6.0-patch4</clickhouse-java.version>
        <hikaricp.version>4.0.3</hikaricp.version>
        <apache-httpclient.version>5.2.3</apache-httpclient.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>com.clickhouse</groupId>
            <artifactId>clickhouse-jdbc</artifactId>
            <version>${clickhouse-java.version}</version>
            <classifier>http</classifier>
        </dependency>

        <dependency>
            <groupId>com.zaxxer</groupId>
            <artifactId>HikariCP</artifactId>
            <version>${hikaricp.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/com.clickhouse/clickhouse-tcp-client -->
        <!--
            // Commented out
            <dependency>
                <groupId>com.clickhouse</groupId>
                <artifactId>clickhouse-tcp-client</artifactId>
                <version>0.3.2-patch9</version>
            </dependency>
        -->

        <!-- Recommended to communicate with ClickHouse server over http -->
        <dependency>
            <groupId>org.apache.httpcomponents.client5</groupId>
            <artifactId>httpclient5</artifactId>
            <version>${apache-httpclient.version}</version>
        </dependency>
    </dependencies>

    <build>
        <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
            <plugins>
                <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
                <plugin>
                    <artifactId>maven-clean-plugin</artifactId>
                    <version>3.1.0</version>
                </plugin>
                <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
                <plugin>
                    <artifactId>maven-resources-plugin</artifactId>
                    <version>3.0.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.8.0</version>
                </plugin>
                <plugin>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <version>2.22.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-jar-plugin</artifactId>
                    <version>3.0.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-install-plugin</artifactId>
                    <version>2.5.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-deploy-plugin</artifactId>
                    <version>2.8.2</version>
                </plugin>
                <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
                <plugin>
                    <artifactId>maven-site-plugin</artifactId>
                    <version>3.7.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-project-info-reports-plugin</artifactId>
                    <version>3.0.0</version>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>
leiless commented 1 month ago

I've also tried:

-<classifier>http</classifier>
+<classifier>all</classifier>

Exception thrown: ClickHouse binary and docker command not found. Please modify option clickhouse_cli_path or docker_cli_path.

Exception in thread "main" java.lang.ExceptionInInitializerError: ClickHouse binary and docker command not found. Please modify option clickhouse_cli_path or docker_cli_path.
    at com.clickhouse.client.ClickHouseClientBuilder.build(ClickHouseClientBuilder.java:449)
    at com.clickhouse.jdbc.internal.ClickHouseConnectionImpl.<init>(ClickHouseConnectionImpl.java:309)
    at com.clickhouse.jdbc.internal.ClickHouseConnectionImpl.<init>(ClickHouseConnectionImpl.java:288)
    at com.clickhouse.jdbc.ClickHouseDataSource.getConnection(ClickHouseDataSource.java:68)
    at com.clickhouse.jdbc.ClickHouseDataSource.getConnection(ClickHouseDataSource.java:16)
    at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:364)
    at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:206)
    at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:476)
    at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:561)
    at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:115)
    at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:81)
    at com.mycompany.app.App.usedPooledConnection(App.java:26)
    at com.mycompany.app.App.main(App.java:40)

Still, no TCP client.

image

chernser commented 1 month ago

@leiless thank you for the wonderful bug report! Unfortunately current java client doesn't support TCP connections. However we are planning to implement it soon.

What is your use-case? Is JDBC a requirement?

Thanks!

JamesChenX commented 1 month ago

@chernser Hi, we are also waiting for the native TCP connection, and plan to use ClickHouse to store and analyze user activity logs for our project once TCP is supported. The main reason we chose ClickHouse is its high performance, while HTTP (especially HTTP/1.1) is inefficient and unnecessary for server-database communication, and we don't want to provide a "fallback" solution (HTTP) for our users, so we plan to wait until TCP is supported.

And we don't use JDBC because JDBC is redundant and meaningless for our use cases.

And here is my two cents, a lot of modern libs like spring-framework and mongo-java-driver implement async operations based on reactor-netty, which is a high-quality asynchronous network library, and will make the ClickHouse TCP protocol easier to implement. You could give it a try when implementing the TCP connection.

leiless commented 1 month ago

Hi, @chernser. I have a similar use case as with @JamesChenX, so fundamentally the high-performance is the key feature we're looking for.

HTTP protocol will incur some other overheads which is not optimal for a high-performance-oriented system.

chernser commented 1 month ago

@leiless @JamesChenX Thank you for your response!

I understand your needs and agree.

As my personal opinion, I would say that HTTP can be a performant so far. ClickHouse HTTP API is quite lightweight and we support many different formats over HTTP. Our Kafka connector (that is used by our customers for sending heavy load of data to the DB) is RowBinary format over http and it is quite fast. Here is what important:

JamesChenX commented 1 month ago

@chernser Thank you for your quick response! We have another reason for TCP protocol besides the performance reason: Currently, clickhouse-java will send query requests to the ClickHouse server in blocking I/O served by a thread pool. For example: https://github.com/ClickHouse/clickhouse-java/blob/6e2fa0de9b786d1ee749e14ca4e554afaeccba4e/clickhouse-client/src/main/java/com/clickhouse/client/ClickHouseClient.java#L644 , which is okay, but can be better if the client sends requests in a reactive approach using non-blocking I/O.

So we plan to write a simple reactive TCP client for our use cases once the clickhouse-java team implements the ClickHouse TCP protocol to achieve two goals at the same time: 1. Non-blocking I/O only; 2. Efficient TCP protocol. If the TCP client is implemented in a reactive way, perfect! We can use it without too much effort. If not, we will build a simple reactive client for our use cases based on your ClickHouse TCP protocol implementation.


We haven't built a reactive HTTP client because we don't want to make another effort to support HTTP, and drop it to switch to build another reactive TCP client once it is supported by clickhouse-java. So it would be pleasant if clickhouse-java team plans to support TCP protocol in the near future.

MacNale commented 1 month ago

I am looking for High performance java client which can insert the records to Clickhouse. As Clickhouse likes batch insert, I am willing to batch at the client side. I have at-least two choices available - 1) using JDBC library (with input function) 2) Use ClickHouseClient library.

I could batch using JDBC (ps.executeBatch()), BUT I dont know how to do the batching using ClickHouseClient.

// writing happens in main thread
.....
  for (int i = 0; i < NUMBER_OF_RECORDS; i++) {
      BinaryStreamUtils.writeString(stream, String.valueOf(i % 16)); // MetricId
      BinaryStreamUtils.writeString(stream, "MetricName_" + i); // MetricName
      ....
   }

   // response should be always closed
try (ClickHouseResponse response = future.get()) {
  ClickHouseResponseSummary summary = response.getSummary();
  }

I dont necessarily need JDBC, I need high performance client.

chernser commented 1 month ago

@MacNale I think, just a client (without JDBC) would work better for you because of JDBC add some overhead. As for batches native format seems the best option (this format may be implemented over http). Another alternative is RowBinary - it is performant but do not enforce batches.

chernser commented 1 month ago

@JamesChenX I see your point. That would take a time. I will keep you updated. Have you checked r2dbc ? It is using http, but when we add native protocol support I think the interface should stay almost the same.