Closed lichenglin closed 3 years ago
I'd suggest you to consider partition(query by date range etc.) when there's lots of data. This will break one long-running query into smaller pieces each will take less time, so that you don't have to increase timeout to an unreasonable number like an hour.
What's the observation from show processlist
? I think it's controlled by clickhouse, but the connection pool size configured in JDBC bridge may affect concurrency as well. I believe you're using the latest JDBC bridge, but what's the version of ClickHouse you're using?
I'd suggest you to consider partition(query by date range etc.) when there's lots of data. This will break one long-running query into smaller pieces each will take less time, so that you don't have to increase timeout to an unreasonable number like an hour.
What's the observation from
show processlist
? I think it's controlled by clickhouse, but the connection pool size configured in JDBC bridge may affect concurrency as well. I believe you're using the latest JDBC bridge, but what's the version of ClickHouse you're using?
In our scenario, there are some dimension tables that change all the time, and we hope to query the latest data in the dimension tables through JDBC. But the problem is that once jdbcfunction is serial in the global scope, all statements that represent jdbcfunction are serial.Once one sql delay,the other will delay too.
Should I make new issue to clickhouse?not bridge?
My bridge version is 2.0.3, clickhouse version is 21.10.1.7859
I wasn't able to produce your issue. I'd suggest you to double check your environment before creating an issue:
show processlist
on clickhouse to confirm if jdbc queries are running in parallel or notTaking my test environment as an example:
self
datasource defined as below
{
"self": {
"aliases": ["_", "current"],
"driverClassName": "ru.yandex.clickhouse.ClickHouseDriver",
"jdbcUrl": "jdbc:clickhouse://my-ch-server1:8123/system?ssl=false",
"username": "readonly",
"password": "***",
"maximumPoolSize": 5
}
}
show processlist
on clickhouse, you'll see up to 5 processes(because max connection pool size is 5)
select * from jdbc('self', 'select * from numbers(10000000)')
I wasn't able to produce your issue. I'd suggest you to double check your environment before creating an issue:
- run
show processlist
on clickhouse to confirm if jdbc queries are running in parallel or not- check connection pool size defined in JDBC bridge
- check if there's any restriction on source database limiting connection from JDBC bridge
- try if you can reproduce the issue on a LTS/stable version like 21.8
Taking my test environment as an example:
- on jdbc bridge I have below
self
datasource defined as below{ "self": { "aliases": ["_", "current"], "driverClassName": "ru.yandex.clickhouse.ClickHouseDriver", "jdbcUrl": "jdbc:clickhouse://my-ch-server1:8123/system?ssl=false", "username": "readonly", "password": "***", "maximumPoolSize": 5 } }
- repeat below queries a few times, and then run
show processlist
on clickhouse, you'll see up to 5 processes(because max connection pool size is 5)select * from jdbc('self', 'select * from numbers(10000000)')
There would be 5 process running,but the sql will be excuted one by one.
if one sql coast 10sencods the last one will coast 50s.
There would be 5 process running,but the sql will be excuted one by one. if one sql coast 10sencods the last one will coast 50s.
OK, I'm not aware of this. Then I guess you can raise an issue in clickhouse repo for help.
SELECT
query_id,
query_start_time,
query,
query_duration_ms
FROM system.query_log
WHERE (query LIKE '%numbers(10000000)%') AND (type = 'QueryFinish')
ORDER BY event_time DESC
LIMIT 10
┌─query_id─────────────────────────────┬────query_start_time─┬─query───────────────────────────────────────────────────────────────┬─query_duration_ms─┐
│ 3ada22f9-16b2-40b2-9319-73cf29142c12 │ 2021-08-25 08:22:41 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 37077 │
│ ca41ab6a-440c-486a-bba8-73de0923199a │ 2021-08-25 08:23:15 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 3007 │
│ de7b4565-8f83-42ca-8f36-23b9e992dd0d │ 2021-08-25 08:22:40 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 34160 │
│ 8201ba9d-ba0c-455d-bb82-418a84c17290 │ 2021-08-25 08:23:11 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 2970 │
│ ed6d7274-16f4-41f1-9a53-399f2f778fe1 │ 2021-08-25 08:22:40 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 31256 │
│ 05e73f8b-b170-4c53-a4b4-461c04ae03cc │ 2021-08-25 08:23:07 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 3605 │
│ b4bc8db3-72c5-4650-a0bf-40b8c64432f9 │ 2021-08-25 08:22:39 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 27712 │
│ 194164fc-5517-4244-a28f-59ab4a52e79e │ 2021-08-25 08:23:02 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 3446 │
│ 4424095b-bd45-42b0-b1bf-9be08c82a002 │ 2021-08-25 08:22:38 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 24501 │
│ 17a4f296-3c1b-46d5-965b-6acf34e41584 │ 2021-08-25 08:22:57 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 4416 │
└──────────────────────────────────────┴─────────────────────┴─────────────────────────────────────────────────────────────────────┴───────────────────┘
SELECT query_id, query_start_time, query, query_duration_ms FROM system.query_log WHERE (query LIKE '%numbers(10000000)%') AND (type = 'QueryFinish') ORDER BY event_time DESC LIMIT 10
┌─query_id─────────────────────────────┬────query_start_time─┬─query───────────────────────────────────────────────────────────────┬─query_duration_ms─┐ │ 3ada22f9-16b2-40b2-9319-73cf29142c12 │ 2021-08-25 08:22:41 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 37077 │ │ ca41ab6a-440c-486a-bba8-73de0923199a │ 2021-08-25 08:23:15 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 3007 │ │ de7b4565-8f83-42ca-8f36-23b9e992dd0d │ 2021-08-25 08:22:40 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 34160 │ │ 8201ba9d-ba0c-455d-bb82-418a84c17290 │ 2021-08-25 08:23:11 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 2970 │ │ ed6d7274-16f4-41f1-9a53-399f2f778fe1 │ 2021-08-25 08:22:40 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 31256 │ │ 05e73f8b-b170-4c53-a4b4-461c04ae03cc │ 2021-08-25 08:23:07 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 3605 │ │ b4bc8db3-72c5-4650-a0bf-40b8c64432f9 │ 2021-08-25 08:22:39 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 27712 │ │ 194164fc-5517-4244-a28f-59ab4a52e79e │ 2021-08-25 08:23:02 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 3446 │ │ 4424095b-bd45-42b0-b1bf-9be08c82a002 │ 2021-08-25 08:22:38 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 24501 │ │ 17a4f296-3c1b-46d5-965b-6acf34e41584 │ 2021-08-25 08:22:57 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 4416 │ └──────────────────────────────────────┴─────────────────────┴─────────────────────────────────────────────────────────────────────┴───────────────────┘
I'm really consufed....
@kitaisreal, is it true that XDBC executes query one after another in serial mode?
@lichenglin, as to your specific case(query large dimension from external database for keyword analysis I guess), have you considered zipper table along with incremental way to sync changes into clickhouse?
I have solved this problem. I currently use sqlserver and mssql Driver. It seems that there are some problems with the combination of this driver and our datasource pool. I changed the driver to jtds, and at the same time I delete the code about ClientInfo in JdbcDataSource.(because the jtds not support clientInfo) And It works.....
// try { // conn.setClientInfo(PROP_CLIENT_NAME, DEFAULT_CLIENT_NAME); // } catch (Exception e) { // log.warn("Failed call setClientInfo due to {}", e.getMessage()); // }
I suggest
try {
conn.setClientInfo(PROP_CLIENT_NAME, DEFAULT_CLIENT_NAME);
} catch (Exception e) {
log.warn("Failed call setClientInfo due to {}", e.getMessage());
}
change to
try {
conn.setClientInfo(PROP_CLIENT_NAME, DEFAULT_CLIENT_NAME);
} catch (Throwable e) {
log.warn("Failed call setClientInfo due to {}", e.getMessage());
}
Since jtds throw an Abstractmethoderror when setClientInfo is called. Abstractmethoderror is an Error not an Exception
SELECT query_id, query_start_time, query, query_duration_ms FROM system.query_log WHERE (query LIKE '%numbers(10000000)%') AND (type = 'QueryFinish') ORDER BY event_time DESC LIMIT 10
┌─query_id─────────────────────────────┬────query_start_time─┬─query───────────────────────────────────────────────────────────────┬─query_duration_ms─┐ │ 3ada22f9-16b2-40b2-9319-73cf29142c12 │ 2021-08-25 08:22:41 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 37077 │ │ ca41ab6a-440c-486a-bba8-73de0923199a │ 2021-08-25 08:23:15 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 3007 │ │ de7b4565-8f83-42ca-8f36-23b9e992dd0d │ 2021-08-25 08:22:40 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 34160 │ │ 8201ba9d-ba0c-455d-bb82-418a84c17290 │ 2021-08-25 08:23:11 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 2970 │ │ ed6d7274-16f4-41f1-9a53-399f2f778fe1 │ 2021-08-25 08:22:40 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 31256 │ │ 05e73f8b-b170-4c53-a4b4-461c04ae03cc │ 2021-08-25 08:23:07 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 3605 │ │ b4bc8db3-72c5-4650-a0bf-40b8c64432f9 │ 2021-08-25 08:22:39 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 27712 │ │ 194164fc-5517-4244-a28f-59ab4a52e79e │ 2021-08-25 08:23:02 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 3446 │ │ 4424095b-bd45-42b0-b1bf-9be08c82a002 │ 2021-08-25 08:22:38 │ select * from jdbc('self', 'select * from numbers(10000000)') │ 24501 │ │ 17a4f296-3c1b-46d5-965b-6acf34e41584 │ 2021-08-25 08:22:57 │ select * from numbers(10000000)FORMAT TabSeparatedWithNamesAndTypes │ 4416 │ └──────────────────────────────────────┴─────────────────────┴─────────────────────────────────────────────────────────────────────┴───────────────────┘
sorry,the issue still exists.
I doubt your sql may has a cache.the last 4 sql will return very fast
│ de7b4565-8f83-42ca-8f36-23b9e992dd0d │ 2021-08-25 08:22:40 │ select from jdbc('self', 'select from numbers(10000000)') │ 34160 │ Hello @zhicwu
Cloud you run another test on your machine? first client run select from jdbc('self', 'select from numbers(100000000000000000)') and another client run select from jdbc('self', 'select from numbers(100000)')
Thanks a lot
Need inputs from @kitaisreal. Below is two queries I issued on 21.7, using a different query from CloudBeaver and clickhouse-client respectively.
┌─query_id─────────────────────────────┬─type────────┬──────────event_time─┬────query_start_time─┬─query───────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─query_duration_ms─┐
│ f7a3bea8-a0c3-453a-9642-0b908270ed9e │ QueryFinish │ 2021-08-27 00:21:00 │ 2021-08-27 00:20:11 │ select * from jdbc('script', 'results String', 'java.lang.Thread.sleep(25000)') │ 48412 │
│ d9884025-62e8-426f-9e1b-89ddd25f5cb1 │ QueryFinish │ 2021-08-27 00:20:35 │ 2021-08-27 00:20:09 │ select * from jdbc('script', 'results String', 'java.lang.Thread.sleep(25000)')FORMAT TabSeparatedWithNamesAndTypes │ 25042 │
│ f7a3bea8-a0c3-453a-9642-0b908270ed9e │ QueryStart │ 2021-08-27 00:20:11 │ 2021-08-27 00:20:11 │ select * from jdbc('script', 'results String', 'java.lang.Thread.sleep(25000)') │ 0 │
│ d9884025-62e8-426f-9e1b-89ddd25f5cb1 │ QueryStart │ 2021-08-27 00:20:09 │ 2021-08-27 00:20:09 │ select * from jdbc('script', 'results String', 'java.lang.Thread.sleep(25000)')FORMAT TabSeparatedWithNamesAndTypes │ 0 │
└──────────────────────────────────────┴─────────────┴─────────────────────┴─────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────┘
This has nothing to do with clickhouse. It's a stupid mistake introduced in previous release.
@lichenglin, please upgrade to 2.0.4
. I also took your advice by catching throwable when setting client info so it should work well with jTDS driver as well.
Thank you very much. It was a great help
I hava exucte a very long sql such as "insert into table select * from jdbc(.....)" where jdbc(....) has about 1 blilion rows. It takes a very long time to finish the sql.
But I found that when the sql excuting , the other sql that use jdbc fuction and jdbcTable will block until the sql above has been done.
Every jdbcfunciton must be excueted one by one? sync??
I have change the default timeout to a very long time. inlude the config in server.json 、vertx.json、datasource.json.
The work.num of bridge use the default value
If I missed something??
Thank U for your Reply.