ClickHouse / clickhouse-jdbc-bridge

A JDBC proxy from ClickHouse to external databases
Apache License 2.0
168 stars 60 forks source link
clickhouse clickhouse-jdbc-bridge external-database

[!NOTE] clickhouse-jdbc-bridge contains experimental codes and is no longer supported. It may contain reliability and security vulnerabilities. Use it at your own risk.

ClickHouse JDBC Bridge

Build Release

JDBC bridge for ClickHouse®. It acts as a stateless proxy passing queries from ClickHouse to external datasources. With this extension, you can run distributed query on ClickHouse across multiple datasources in real time, which in a way simplifies the process of building data pipelines for data warehousing, monitoring and integrity check etc.

Overview

Overview

Known Issues / Limitation

Quick Start

Usage

In most cases, you'll use jdbc table function to query against external datasources:

select * from jdbc('<datasource>', '<schema>', '<query>')

schema is optional but others are mandatory. Please be aware that the query is in native format of the given datasource. For example, if the query is select * from some_table limit 10, it may work in MariaDB but not in PostgreSQL, as the latter one does not understand limit.

Assuming you started a test environment using docker-compose, please refer to examples below to get familiar with JDBC bridge.

Configuration

Migration

Build

You can use Maven to build ClickHouse JDBC bridge, for examples:

git clone https://github.com/ClickHouse/clickhouse-jdbc-bridge.git
cd clickhouse-jdbc-bridge
# compile and run unit tests
mvn -Prelease verify
# release shaded jar, rpm and debian packages
mvn -Prelease package

In order to build docker images:

git clone https://github.com/ClickHouse/clickhouse-jdbc-bridge.git
cd clickhouse-jdbc-bridge
docker build -t my/clickhouse-jdbc-bridge .
# or if you want to build the all-ine-one image
docker build --build-arg revision=20.9.3 -f all-in-one.Dockerfile -t my/clickhouse-all-in-one .

Develop

JDBC bridge is extensible. You may take ConfigDataSource and ScriptDataSource as examples to create your own extension.

An extension for JDBC bridge is basically a Java class with 3 optional parts:

  1. Extension Name

    By default, extension class name will be treated as name for the extension. However, you can declare a static member in your extension class to override that, for instance:

    public static final String EXTENSION_NAME = "myExtension";
  2. Initialize Method

    Initialize method will be called once and only once at the time when loading your extension, for example:

    public static void initialize(ExtensionManager manager) {
        ...
    }
  3. Instantiation Method

    In order to create instance of your extension, in general you should define a static method like below so that JDBC bridge knows how(besides walking through all possible constructors):

    public static MyExtension newInstance(Object... args) {
        ...
    }

Assume your extension class is com.mycompany.MyExtension, you can load it into JDBC bridge by:

Performance

Below is a rough performance comparison to help you understand overhead caused by JDBC bridge as well as its stability. MariaDB, ClickHouse, and JDBC bridge are running on separated KVMs. ApacheBench(ab) is used on another KVM to simulate 20 concurrent users to issue same query 100,000 times after warm-up. Please refer to this in order to setup test environment and run tests by yourself.

Test Case Time Spent(s) Throughput(#/s) Failed Requests Min(ms) Mean(ms) Median(ms) Max(ms)
clickhouse_ping 801.367 124.79 0 1 160 4 1,075
jdbc-bridge_ping 804.017 124.38 0 1 161 10 3,066
clickhouse_url(clickhouse) 801.448 124.77 3 3 160 8 1,077
clickhouse_url(jdbc-bridge) 811.299 123.26 446 3 162 10 3,066
clickhouse_constant-query 797.775 125.35 0 1 159 4 1,077
clickhouse_constant-query(mysql) 1,598.426 62.56 0 7 320 18 2,049
clickhouse_constant-query(remote) 802.212 124.66 0 2 160 8 3,073
clickhouse_constant-query(url) 801.686 124.74 0 3 160 11 1,123
clickhouse_constant-query(jdbc) 925.087 108.10 5,813 14 185 75 4,091
clickhouse(patched)_constant-query(jdbc) 833.892 119.92 1,577 10 167 51 3,109
clickhouse(patched)_constant-query(jdbc-dual) 846.403 118.15 3,021 8 169 50 3,054
clickhouse_10k-rows-query 854.886 116.97 0 12 171 99 1,208
clickhouse_10k-rows-query(mysql) 1,657.425 60.33 0 28 331 123 2,228
clickhouse_10k-rows-query(remote) 854.610 117.01 0 12 171 99 1,201
clickhouse_10k-rows-query(url) 853.292 117.19 5 23 171 105 2,026
clickhouse_10k-rows-query(jdbc) 1,483.565 67.41 11,588 66 297 206 2,051
clickhouse(patched)_10k-rows-query(jdbc) 1,186.422 84.29 6,632 61 237 184 2,021
clickhouse(patched)_10k-rows-query(jdbc-dual) 1,080.676 92.53 4,195 65 216 180 2,013

Note: clickhouse(patched) is a patched version of ClickHouse server by disabling XDBC bridge health check. jdbc-dual on the other hand means dual instances of JDBC bridge managed by docker swarm on same KVM(due to limited resources ;).

Test Case (Decoded) URL
clickhouse_ping http://ch-server:8123/ping
jdbc-bridge_ping http://jdbc-bridge:9019/ping
clickhouse_url(clickhouse) http://ch-server:8123/?query=select * from url('http://ch-server:8123/ping', CSV, 'results String')
clickhouse_url(jdbc-bridge) http://ch-server:8123/?query=select * from url('http://jdbc-bridge:9019/ping', CSV, 'results String')
clickhouse_constant-query http://ch-server:8123/?query=select 1
clickhouse_constant-query(mysql) http://ch-server:8123/?query=select * from mysql('mariadb:3306', 'test', 'constant', 'root', 'root')
clickhouse_constant-query(remote) http://ch-server:8123/?query=select * from remote('ch-server:9000', system.constant, 'default', '')
clickhouse_constant-query(url) http://ch-server:8123/?query=select * from url('http://ch-server:8123/?query=select 1', CSV, 'results String')
clickhouse_constant-query(jdbc) http://ch-server:8123/?query=select * from jdbc('mariadb', 'constant')
clickhouse_10k-rows-query http://ch-server:8123/?query=select 1
clickhouse_10k-rows-query(mysql) http://ch-server:8123/?query=select * from mysql('mariadb:3306', 'test', '10k_rows', 'root', 'root')
clickhouse_10k-rows-query(remote) http://ch-server:8123/?query=select * from remote('ch-server:9000', system.10k_rows, 'default', '')
clickhouse_10k-rows-query(url) http://ch-server:8123/?query=select * from url('http://ch-server:8123/?query=select * from 10k_rows', CSV, 'results String')
clickhouse_10k-rows-query(jdbc) http://ch-server:8123/?query=select * from jdbc('mariadb', 'small-table')