ClickHouse / clickhouse-jdbc-bridge

A JDBC proxy from ClickHouse to external databases
Apache License 2.0

Beyond 2.0, Clickhouse-Data-Bridge #97

Open: ramazanpolat opened this issue 3 years ago

ramazanpolat commented 3 years ago

I've been using clickhouse-jdbc-bridge v1.0 for a long time, and it helps me tremendously. A couple of days ago, I wanted to build a new ClickHouse cluster that uses the new version of clickhouse-jdbc-bridge, but 2.0 feels like too many features squeezed into one product, which makes it hard to grasp, especially for newcomers. In terms of usage and features, it may seem obvious to existing users of clickhouse-jdbc-bridge, but believe me, it is more complicated than it sounds. Remember that clickhouse-jdbc-bridge started out as a proxy between ClickHouse and other JDBC databases. Since the 2.0 release, however, it is no longer limited to JDBC: it has evolved into a general-purpose data-fetching solution for ClickHouse. Therefore, the name "jdbc-bridge" seems inaccurate and incomplete.

To make it clearer for users, and to make it easy to extend, understand, and use, I propose some major changes to this product.

I propose releasing a new version (v3.0) with a new name and a clear description, like this:

I believe naming is much more important than we think, because naming things properly makes it easy to distinguish them and understand the differences. What do you think, @alex-krash, @alexey-milovidov?

zhicwu commented 3 years ago

Thanks for sharing your thoughts on this, @ramazanpolat. I'll take the blame for making it too complex :p

A few comments:

  1. clickhouse-jdbc-bridge 2.0 is bi-directional. I gave a few examples of mutations in the README. However, it's inconvenient, as you have to create a table using the JDBC engine first. I added a SQL parser to clickhouse-jdbc so that we can run queries like the one below:

    -- #jdbc is a client-side macro, which is not yet available in a public release
    insert into #jdbc('db1', 'schema1', 'table1')
    select * from jdbc('db1', 'select * from table1 where col1=1 limit 100')
    
    -- the query above will be translated into:
    drop table if exists jdbc_db1_schema1_table1;
    create table jdbc_db1_schema1_table1(...) engine=JDBC('db1', 'schema1', 'table1');
    insert into jdbc_db1_schema1_table1
    select * from jdbc('db1', 'select * from table1 where col1=1 limit 100');
    drop table if exists jdbc_db1_schema1_table1;
  2. Scripting: scripting in 2.0 is based on JSR-223, so Groovy, Jython, etc. are supported as well.

  3. Naming: In the beginning, I named 2.0 clickhouse-datasource-bridge, but later I changed it back because it still uses the XDBC bridge protocol.

  4. Issues: IMO, there are two critical issues in ClickHouse that need to be addressed: 1) stability; and 2) optimizing the protocol to avoid unnecessary overhead.

  5. Future: I think the XDBC bridge could be renamed to ODBC bridge, and it would be better to create a new, more generic table function and engine, such as Data Bridge/Connector, with more features like pushdown hints. clickhouse-jdbc-bridge could then be renamed accordingly.
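The #jdbc expansion in item 1 is essentially a string rewrite performed on the client. Below is a minimal Java sketch of that rewrite, assuming the caller already knows the column definitions; the real macro in clickhouse-jdbc presumably fills in the elided `(...)` part itself, and how it does so is not described in this thread. `JdbcMacroExpander`, `expand`, `tempTableName`, and `quote` are hypothetical names chosen for illustration, not part of the clickhouse-jdbc API.

```java
import java.util.List;

// Hypothetical sketch of the client-side #jdbc macro rewrite; NOT the clickhouse-jdbc code.
// Assumption: the column list is supplied by the caller instead of being inferred remotely.
final class JdbcMacroExpander {

    private JdbcMacroExpander() {}

    // Builds the temporary table name, e.g. jdbc_db1_schema1_table1.
    // Assumes plain identifiers; no identifier escaping or validation is done here.
    static String tempTableName(String dataSource, String schema, String table) {
        return String.join("_", "jdbc", dataSource, schema, table);
    }

    // Wraps a value in a single-quoted ClickHouse string literal, escaping \ and '.
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Expands: insert into #jdbc(dataSource, schema, table) <selectQuery>
    // into the four statements listed in item 1.
    static List<String> expand(String dataSource, String schema, String table,
                               String columns, String selectQuery) {
        String tmp = tempTableName(dataSource, schema, table);
        String engine = "JDBC(" + quote(dataSource) + ", " + quote(schema) + ", "
                + quote(table) + ")";
        return List.of(
                "drop table if exists " + tmp,
                "create table " + tmp + "(" + columns + ") engine=" + engine,
                "insert into " + tmp + "\n" + selectQuery,
                "drop table if exists " + tmp);
    }

    public static void main(String[] args) {
        List<String> statements = expand("db1", "schema1", "table1",
                "col1 Int32, col2 String", // stand-in for the elided column list
                "select * from jdbc('db1', 'select * from table1 where col1=1 limit 100')");
        statements.forEach(s -> System.out.println(s + ";"));
    }
}
```

Running `main` prints the same four statements as item 1, with `col1 Int32, col2 String` standing in for the elided column list. A production version would likely run the final `drop` in a `finally` block so the temporary table is cleaned up even if the insert fails.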
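Item 2 relies on JSR-223 (`javax.script`), which lets a Java application look up script engines by name at runtime, so supporting Groovy or Jython is mostly a matter of putting the engine's JAR on the classpath. The sketch below shows only the standard JDK API, not how clickhouse-jdbc-bridge wires scripting internally; `ScriptRunner`, `isAvailable`, and `evaluate` are hypothetical names.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper showing plain JSR-223 usage; not clickhouse-jdbc-bridge code.
final class ScriptRunner {
    // Discovers engine factories registered on the current classpath.
    private static final ScriptEngineManager MANAGER = new ScriptEngineManager();

    private ScriptRunner() {}

    // True if an engine registered under this name (e.g. "groovy") is on the classpath.
    static boolean isAvailable(String engineName) {
        return MANAGER.getEngineByName(engineName) != null;
    }

    // Evaluates a script with the named engine.
    // Returns empty if the engine is missing or the script evaluates to null.
    static Optional<Object> evaluate(String engineName, String script) throws ScriptException {
        ScriptEngine engine = MANAGER.getEngineByName(engineName);
        if (engine == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(engine.eval(script));
    }

    public static void main(String[] args) throws ScriptException {
        // List every engine found on the classpath; this may be empty on a bare JDK.
        for (ScriptEngineFactory factory : MANAGER.getEngineFactories()) {
            System.out.println(factory.getEngineName() + " -> " + factory.getNames());
        }
        // Resolves only if a Groovy JSR-223 engine JAR is on the classpath (assumption).
        String result = evaluate("groovy", "1 + 2")
                .map(String::valueOf)
                .orElse("groovy engine not available");
        System.out.println(result);
    }
}
```

On a recent JDK with no extra JARs the engine list may well be empty; the set of available scripting languages depends entirely on what is deployed alongside the application.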

ramazanpolat commented 3 years ago

@zhicwu I can't thank you enough for what you did here. I believe you've added more features and made it more flexible than ever. Kudos to you for all of your efforts.

I'm not fluent in 2.0's features yet. I just wanted to point out that it will become much more cluttered over time if we don't name the features properly. I'm not suggesting any feature or code changes here; my suggestions are only about naming things properly and making the project more structured.

For example, we now have a plethora of options for fetching data from other data sources. It is more flexible than ever, but this also makes it hard to grasp. So I would suggest calling each connection type a "bridge". This will make it easy for readers to tell them apart. As in my first post, if we are using JDBC, then let's call it the JDBC Bridge. This approach paints a picture that simplifies the structure and usage of this tool: "OK, this thing has a number of bridges that connect to other data sources (and databases)." Since you said the JDBC bridge is bi-directional, let's stick with this naming. Then the documentation can say "bridges can be either uni-directional or bi-directional".

BTW, I would like to write a tutorial for this repo, if you don't mind. I'm just waiting for the right time to do it.

zhicwu commented 3 years ago

Thanks @ramazanpolat. I'm glad this can be of use. Kudos go to the ClickHouse team and @alex-krash, who created the JDBC bridge.

Looking forward to seeing your tutorial :) In the near future, once I'm done with the JDBC driver refactoring, I'll also spend some time improving the poor documentation.