Beyond 2.0, Clickhouse-Data-Bridge

ramazanpolat commented 3 years ago

I've been using clickhouse-jdbc-bridge v1.0 for a long time, and it helps me tremendously. A couple of days ago, I wanted to build a new ClickHouse cluster that uses the new version of clickhouse-jdbc-bridge, but 2.0 looks like too many features squeezed into one product, which makes it hard to grasp, especially for newcomers. In terms of usage and features, it may seem pretty obvious to long-time users of clickhouse-jdbc-bridge, but believe me, it is more complicated than it sounds. Remember that clickhouse-jdbc-bridge started out as a proxy between ClickHouse and other JDBC databases. But after the 2.0 release, it is no longer limited to JDBC. It has evolved into a general-purpose data fetching solution for ClickHouse. Therefore, the name "jdbc-bridge" seems inaccurate and incomplete.

To make it clearer for users and also easier to extend, understand, and use, I propose some major changes to this product.

I propose to release a new version (v3.0) with a new name and clear description like this:

The new version is ClickHouse Data Bridge and it is v3.0.
It contains a number of bridges that can be used to transfer data between Clickhouse and other products.
The JDBC bridge is just one of them. It is used to transfer data between ClickHouse and other JDBC-compliant databases. It is currently just a simplex data bridge, which means the data transfer is uni-directional. Therefore, it can only transfer data from JDBC-compliant databases to ClickHouse. But it has the potential to be a duplex bridge, which would allow users to also transfer data from ClickHouse to JDBC-compliant databases. In fact, the JDBC bridge supports INSERT to write data TO JDBC databases, but it is a work in progress.
Another well-known bridge is the Native Bridge, which connects one ClickHouse to another ClickHouse. clickhouse-server has a built-in duplex Native bridge, which is implemented as table functions (remote and remoteSecure).
We also have a JavaScript bridge, which can be used to run JS code and get the result.
The URL bridge is used to transfer data between ClickHouse and an HTTP server (this is also implemented in the clickhouse-server code as the URL table engine).
The REST bridge is a specialized kind of URL bridge, which follows common REST standards to transfer data between ClickHouse and REST services.
A plugin architecture lets users write their own bridge in Java, package it as a .jar file, and plug it into ClickHouse Data Bridge.

I believe naming is much more important than we think, because naming things properly makes it easy to distinguish them and understand the differences. What do you think, @alex-krash, @alexey-milovidov?

zhicwu commented 3 years ago

Thanks for sharing your thoughts on this, @ramazanpolat. I'll take the blame for making it too complex :p

A few comments:

clickhouse-jdbc-bridge 2.0 is bi-directional. I gave a few examples of mutations in the README. However, it's inconvenient because you have to create a table using the JDBC engine first. I added a SQL parser to clickhouse-jdbc, so that we can run queries like the one below:

-- #jdbc is a client-side macro, which is not available in public release yet
insert into #jdbc('db1', 'schema1', 'table1')
select * from jdbc('db1', 'select * from table1 where col1=1 limit 100')

-- the query above will be translated into the following:
drop table if exists jdbc_db1_schema1_table1;
create table jdbc_db1_schema1_table1(...) engine=JDBC('db1', 'schema1', 'table1');
insert into jdbc_db1_schema1_table1
select * from jdbc('db1', 'select * from table1 where col1=1 limit 100');
drop table if exists jdbc_db1_schema1_table1;

Scripting: 2.0 is based on JSR-223, so Groovy, Jython, etc. are supported as well.
Naming: In the beginning, I named 2.0 clickhouse-datasource-bridge, but later I changed it back because it's still using the XDBC bridge protocol.
Issues: IMO, there are two critical issues in ClickHouse that need to be addressed: 1) stability; and 2) optimizing the protocol to avoid unnecessary overhead.
Future: I think the XDBC bridge can be renamed to ODBC bridge, and it would be better to create a new, more generic table function and engine like Data Bridge/Connector, with more features like pushdown hints. clickhouse-jdbc-bridge can then be renamed accordingly.
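
To make the #jdbc macro translation above concrete, here is a minimal sketch in Python. It is not the actual SQL parser in clickhouse-jdbc; the regular expression and the function name expand_jdbc_macro are hypothetical and only mirror the drop/create/insert/drop sequence shown in the example. The column list stays elided as (...) exactly as in the example, so the generated CREATE TABLE statement is a template rather than something to execute as-is.

```python
import re
from typing import List

# Hypothetical pattern for the client-side macro:
#   insert into #jdbc('<datasource>', '<schema>', '<table>') <select ...>
_MACRO = re.compile(
    r"^\s*insert\s+into\s+#jdbc\(\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)"
    r"\s*(.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def expand_jdbc_macro(query: str) -> List[str]:
    """Expand a #jdbc insert into the four statements from the example.

    Queries that do not use the macro are returned unchanged.
    """
    match = _MACRO.match(query)
    if match is None:
        return [query]

    datasource, schema, table, select = match.groups()
    # Temporary table name follows the jdbc_<datasource>_<schema>_<table>
    # pattern seen in the translated query.
    tmp = f"jdbc_{datasource}_{schema}_{table}"
    return [
        f"drop table if exists {tmp}",
        # The column list is elided in the example, so it stays elided here.
        f"create table {tmp}(...) engine=JDBC('{datasource}', '{schema}', '{table}')",
        f"insert into {tmp}\n{select}",
        f"drop table if exists {tmp}",
    ]


if __name__ == "__main__":
    sql = (
        "insert into #jdbc('db1', 'schema1', 'table1')\n"
        "select * from jdbc('db1', 'select * from table1 where col1=1 limit 100')"
    )
    for statement in expand_jdbc_macro(sql):
        print(statement + ";")
```

A regex is enough for this one statement shape. Handling comments, quoting, and nested queries reliably would call for a real parser, which is presumably why one was added to the driver.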

ramazanpolat commented 3 years ago

@zhicwu I can't thank you enough for what you did here. You added more features and made it more flexible than ever. Kudos to you for all of your efforts.

I'm not fluent in the 2.0 features. I just wanted to point out that it will become much more cluttered over time if we don't name the features properly. I'm not suggesting any feature or code changes here. My suggestions are just about naming things properly and making the product more structured.

For example: we now have a plethora of options to fetch data from other data sources. It is much more flexible than ever, but this also makes it hard to grasp. So I would suggest calling each connection type a "bridge". This will make it easy for readers to distinguish them. Just like in my first post, if we are using JDBC, then let's call it the JDBC Bridge. This approach will draw a picture in our heads that simplifies the structure and usage of this tool, like "OK, this thing has a number of bridges that connect to other data sources (and databases)". Since you said that the JDBC bridge is bi-directional, let's stick to this naming. Then we can have documentation that says "bridges can be either uni-directional or bi-directional".

BTW, I would like to write a tutorial for this repo, if I may. Just waiting for the right time for it.

zhicwu commented 3 years ago

Thanks @ramazanpolat. I'm glad this can be of use. Kudos go to the ClickHouse team and @alex-krash, who created the JDBC bridge.

Looking forward to seeing your tutorial :) In the near future, I'll also spend some time improving the poor documentation, once I'm done with the JDBC driver refactoring.

ClickHouse / clickhouse-jdbc-bridge

Beyond 2.0, Clickhouse-Data-Bridge #97