ClickHouse / clickhouse-jdbc-bridge

A JDBC proxy from ClickHouse to external databases
Apache License 2.0
167 stars 60 forks source link

Does bridge support cocurrent jdbcfunction? #113

Closed lichenglin closed 3 years ago

lichenglin commented 3 years ago

I hava exucte a very long sql such as "insert into table select * from jdbc(.....)" where jdbc(....) has about 1 blilion rows. It takes a very long time to finish the sql.

But I found that when the sql excuting , the other sql that use jdbc fuction and jdbcTable will block until the sql above has been done.

Every jdbcfunciton must be excueted one by one? sync??

I have change the default timeout to a very long time. inlude the config in server.json 、vertx.json、datasource.json.

The work.num of bridge use the default value

If I missed something??

Thank U for your Reply.

zhicwu commented 3 years ago

I'd suggest you to consider partition(query by date range etc.) when there's lots of data. This will break one long-running query into smaller pieces each will take less time, so that you don't have to increase timeout to an unreasonable number like an hour.

What's the observation from show processlist? I think it's controlled by clickhouse, but the connection pool size configured in JDBC bridge may affect concurrency as well. I believe you're using the latest JDBC bridge, but what's the version of ClickHouse you're using?

lichenglin commented 3 years ago

I'd suggest you to consider partition(query by date range etc.) when there's lots of data. This will break one long-running query into smaller pieces each will take less time, so that you don't have to increase timeout to an unreasonable number like an hour.

What's the observation from show processlist? I think it's controlled by clickhouse, but the connection pool size configured in JDBC bridge may affect concurrency as well. I believe you're using the latest JDBC bridge, but what's the version of ClickHouse you're using?

In our scenario, there are some dimension tables that change all the time, and we hope to query the latest data in the dimension tables through JDBC. But the problem is that once jdbcfunction is serial in the global scope, all statements that represent jdbcfunction are serial.Once one sql delay,the other will delay too.

Should I make new issue to clickhouse?not bridge?

My bridge version is 2.0.3, clickhouse version is 21.10.1.7859

zhicwu commented 3 years ago

I wasn't able to produce your issue. I'd suggest you to double check your environment before creating an issue:

  1. run show processlist on clickhouse to confirm if jdbc queries are running in parallel or not
  2. check connection pool size defined in JDBC bridge
  3. check if there's any restriction on source database limiting connection from JDBC bridge
  4. try if you can reproduce the issue on a LTS/stable version like 21.8

Taking my test environment as an example:

lichenglin commented 3 years ago

I wasn't able to produce your issue. I'd suggest you to double check your environment before creating an issue:

  1. run show processlist on clickhouse to confirm if jdbc queries are running in parallel or not
  2. check connection pool size defined in JDBC bridge
  3. check if there's any restriction on source database limiting connection from JDBC bridge
  4. try if you can reproduce the issue on a LTS/stable version like 21.8

Taking my test environment as an example:

  • on jdbc bridge I have below self datasource defined as below
    {
    "self": {
      "aliases": ["_", "current"],
      "driverClassName": "ru.yandex.clickhouse.ClickHouseDriver",
      "jdbcUrl": "jdbc:clickhouse://my-ch-server1:8123/system?ssl=false",
      "username": "readonly",
      "password": "***",
      "maximumPoolSize": 5
    }
    }
  • repeat below queries a few times, and then run show processlist on clickhouse, you'll see up to 5 processes(because max connection pool size is 5)
    select * from jdbc('self', 'select * from numbers(10000000)')

There would be 5 process running,but the sql will be excuted one by one.

if one sql coast 10sencods the last one will coast 50s.

zhicwu commented 3 years ago

There would be 5 process running,but the sql will be excuted one by one. if one sql coast 10sencods the last one will coast 50s.

OK, I'm not aware of this. Then I guess you can raise an issue in clickhouse repo for help.

zhicwu commented 3 years ago
SELECT
    query_id,
    query_start_time,
    query,
    query_duration_ms
FROM system.query_log
WHERE (query LIKE '%numbers(10000000)%') AND (type = 'QueryFinish')
ORDER BY event_time DESC
LIMIT 10
┌─query_id─────────────────────────────┬────query_start_time─┬─query───────────────────────────────────────────────────────────────┬─query_duration_ms─┐
│ 3ada22f9-16b2-40b2-9319-73cf29142c12 │ 2021-08-25 08:22:41 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             37077 │
│ ca41ab6a-440c-486a-bba8-73de0923199a │ 2021-08-25 08:23:15 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              3007 │
│ de7b4565-8f83-42ca-8f36-23b9e992dd0d │ 2021-08-25 08:22:40 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             34160 │
│ 8201ba9d-ba0c-455d-bb82-418a84c17290 │ 2021-08-25 08:23:11 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              2970 │
│ ed6d7274-16f4-41f1-9a53-399f2f778fe1 │ 2021-08-25 08:22:40 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             31256 │
│ 05e73f8b-b170-4c53-a4b4-461c04ae03cc │ 2021-08-25 08:23:07 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              3605 │
│ b4bc8db3-72c5-4650-a0bf-40b8c64432f9 │ 2021-08-25 08:22:39 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             27712 │
│ 194164fc-5517-4244-a28f-59ab4a52e79e │ 2021-08-25 08:23:02 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              3446 │
│ 4424095b-bd45-42b0-b1bf-9be08c82a002 │ 2021-08-25 08:22:38 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             24501 │
│ 17a4f296-3c1b-46d5-965b-6acf34e41584 │ 2021-08-25 08:22:57 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              4416 │
└──────────────────────────────────────┴─────────────────────┴─────────────────────────────────────────────────────────────────────┴───────────────────┘
lichenglin commented 3 years ago
SELECT
    query_id,
    query_start_time,
    query,
    query_duration_ms
FROM system.query_log
WHERE (query LIKE '%numbers(10000000)%') AND (type = 'QueryFinish')
ORDER BY event_time DESC
LIMIT 10
┌─query_id─────────────────────────────┬────query_start_time─┬─query───────────────────────────────────────────────────────────────┬─query_duration_ms─┐
│ 3ada22f9-16b2-40b2-9319-73cf29142c12 │ 2021-08-25 08:22:41 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             37077 │
│ ca41ab6a-440c-486a-bba8-73de0923199a │ 2021-08-25 08:23:15 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              3007 │
│ de7b4565-8f83-42ca-8f36-23b9e992dd0d │ 2021-08-25 08:22:40 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             34160 │
│ 8201ba9d-ba0c-455d-bb82-418a84c17290 │ 2021-08-25 08:23:11 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              2970 │
│ ed6d7274-16f4-41f1-9a53-399f2f778fe1 │ 2021-08-25 08:22:40 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             31256 │
│ 05e73f8b-b170-4c53-a4b4-461c04ae03cc │ 2021-08-25 08:23:07 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              3605 │
│ b4bc8db3-72c5-4650-a0bf-40b8c64432f9 │ 2021-08-25 08:22:39 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             27712 │
│ 194164fc-5517-4244-a28f-59ab4a52e79e │ 2021-08-25 08:23:02 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              3446 │
│ 4424095b-bd45-42b0-b1bf-9be08c82a002 │ 2021-08-25 08:22:38 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             24501 │
│ 17a4f296-3c1b-46d5-965b-6acf34e41584 │ 2021-08-25 08:22:57 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              4416 │
└──────────────────────────────────────┴─────────────────────┴─────────────────────────────────────────────────────────────────────┴───────────────────┘

image

I'm really consufed....

zhicwu commented 3 years ago

@kitaisreal, is it true that XDBC executes query one after another in serial mode?

@lichenglin, as to your specific case(query large dimension from external database for keyword analysis I guess), have you considered zipper table along with incremental way to sync changes into clickhouse?

lichenglin commented 3 years ago

I have solved this problem. I currently use sqlserver and mssql Driver. It seems that there are some problems with the combination of this driver and our datasource pool. I changed the driver to jtds, and at the same time I delete the code about ClientInfo in JdbcDataSource.(because the jtds not support clientInfo) And It works.....

// try { // conn.setClientInfo(PROP_CLIENT_NAME, DEFAULT_CLIENT_NAME); // } catch (Exception e) { // log.warn("Failed call setClientInfo due to {}", e.getMessage()); // }

lichenglin commented 3 years ago

I suggest

    try {
            conn.setClientInfo(PROP_CLIENT_NAME, DEFAULT_CLIENT_NAME);
        } catch (Exception e) {
            log.warn("Failed call setClientInfo due to {}", e.getMessage());
        }

change to

    try {
           conn.setClientInfo(PROP_CLIENT_NAME, DEFAULT_CLIENT_NAME);
        } catch (Throwable e) {
            log.warn("Failed call setClientInfo due to {}", e.getMessage());
        }

Since jtds throw an Abstractmethoderror when setClientInfo is called. Abstractmethoderror is an Error not an Exception

lichenglin commented 3 years ago
SELECT
    query_id,
    query_start_time,
    query,
    query_duration_ms
FROM system.query_log
WHERE (query LIKE '%numbers(10000000)%') AND (type = 'QueryFinish')
ORDER BY event_time DESC
LIMIT 10
┌─query_id─────────────────────────────┬────query_start_time─┬─query───────────────────────────────────────────────────────────────┬─query_duration_ms─┐
│ 3ada22f9-16b2-40b2-9319-73cf29142c12 │ 2021-08-25 08:22:41 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             37077 │
│ ca41ab6a-440c-486a-bba8-73de0923199a │ 2021-08-25 08:23:15 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              3007 │
│ de7b4565-8f83-42ca-8f36-23b9e992dd0d │ 2021-08-25 08:22:40 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             34160 │
│ 8201ba9d-ba0c-455d-bb82-418a84c17290 │ 2021-08-25 08:23:11 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              2970 │
│ ed6d7274-16f4-41f1-9a53-399f2f778fe1 │ 2021-08-25 08:22:40 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             31256 │
│ 05e73f8b-b170-4c53-a4b4-461c04ae03cc │ 2021-08-25 08:23:07 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              3605 │
│ b4bc8db3-72c5-4650-a0bf-40b8c64432f9 │ 2021-08-25 08:22:39 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             27712 │
│ 194164fc-5517-4244-a28f-59ab4a52e79e │ 2021-08-25 08:23:02 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              3446 │
│ 4424095b-bd45-42b0-b1bf-9be08c82a002 │ 2021-08-25 08:22:38 │ select * from jdbc('self', 'select * from numbers(10000000)')       │             24501 │
│ 17a4f296-3c1b-46d5-965b-6acf34e41584 │ 2021-08-25 08:22:57 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │              4416 │
└──────────────────────────────────────┴─────────────────────┴─────────────────────────────────────────────────────────────────────┴───────────────────┘

sorry,the issue still exists.

I doubt your sql may has a cache.the last 4 sql will return very fast

lichenglin commented 3 years ago

│ de7b4565-8f83-42ca-8f36-23b9e992dd0d │ 2021-08-25 08:22:40 │ select from jdbc('self', 'select from numbers(10000000)') │ 34160 │ Hello @zhicwu

Cloud you run another test on your machine? first client run select from jdbc('self', 'select from numbers(100000000000000000)') and another client run select from jdbc('self', 'select from numbers(100000)')

Thanks a lot

zhicwu commented 3 years ago

Need inputs from @kitaisreal. Below is two queries I issued on 21.7, using a different query from CloudBeaver and clickhouse-client respectively.

┌─query_id─────────────────────────────┬─type────────┬──────────event_time─┬────query_start_time─┬─query───────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─query_duration_ms─┐
│ f7a3bea8-a0c3-453a-9642-0b908270ed9e │ QueryFinish │ 2021-08-27 00:21:00 │ 2021-08-27 00:20:11 │ select * from jdbc('script', 'results String', 'java.lang.Thread.sleep(25000)')                                     │             48412 │
│ d9884025-62e8-426f-9e1b-89ddd25f5cb1 │ QueryFinish │ 2021-08-27 00:20:35 │ 2021-08-27 00:20:09 │ select * from jdbc('script', 'results String', 'java.lang.Thread.sleep(25000)')FORMAT TabSeparatedWithNamesAndTypes │             25042 │
│ f7a3bea8-a0c3-453a-9642-0b908270ed9e │ QueryStart  │ 2021-08-27 00:20:11 │ 2021-08-27 00:20:11 │ select * from jdbc('script', 'results String', 'java.lang.Thread.sleep(25000)')                                     │                 0 │
│ d9884025-62e8-426f-9e1b-89ddd25f5cb1 │ QueryStart  │ 2021-08-27 00:20:09 │ 2021-08-27 00:20:09 │ select * from jdbc('script', 'results String', 'java.lang.Thread.sleep(25000)')FORMAT TabSeparatedWithNamesAndTypes │                 0 │
└──────────────────────────────────────┴─────────────┴─────────────────────┴─────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────┘
zhicwu commented 3 years ago

This has nothing to do with clickhouse. It's a stupid mistake introduced in previous release.

@lichenglin, please upgrade to 2.0.4. I also took your advice by catching throwable when setting client info so it should work well with jTDS driver as well.

lichenglin commented 3 years ago

Thank you very much. It was a great help