ClickHouse / clickhouse-jdbc-bridge

A JDBC proxy from ClickHouse to external databases
Apache License 2.0
167 stars 60 forks source link

Thread blocked #153

Open zhaoyvli opened 2 years ago

zhaoyvli commented 2 years ago

image

When I execute an SQL query that takes a long time, I get thread blocked, and creating a new query will fail

zhicwu commented 2 years ago

Hi @MrZhao1024B, it's a warning from Vert.x but it should not block your next query. Are you using latest JDBC bridge?

zhaoyvli commented 2 years ago

Hi@zhicwu, I'm using the latest JDBC bridge. Only execute a query that takes a long time to return results, Creating a new query will fail, and I need to wait for the task to finish before executing a new query. My guess is that the task has not received any data from the database, causing the new task to wait.

zhicwu commented 2 years ago

There was an issue(#113) like this but it's been fixed. Just want to double confirm with you: 1) by latest version you meant 2.1.0, right? and 2) if connection pool is used, its maximum size is greater than one.

Will try if I can reproduce the issue on 2.1.0.

zhaoyvli commented 2 years ago

My JDBC bridge version is 2.1.0, clickhouse version is 21.12.3.32.

"maximumPoolSize": 5

zhicwu commented 2 years ago

Thanks for the extra information. What's your query looks like? How many jdbc table functions appear in your query? 3 connections will be used for one query like below:

select * from jdbc('ds1', 'query1')
union all
select * from jdbc('ds1', 'query2')
union all
select * from jdbc('ds1', 'query3')
zhaoyvli commented 2 years ago

My query has only one JDBC table function. select * from jdbc ('ds1', 'query1')

Tests have found that.

  1. First, a query is executed: select * from jdbc('ds1', 'select * from table_01') Second, execute the query: select * from jdbc('ds1', 'select * from table_01') If table_01 has 100 million entries, I can query two SQL processes in the Clickhouse and table_01 databases

  2. First, a query is executed: select cnt from jdbc('ds1', 'select count(1) as cnt from table_01') Second, execute the query: select * from jdbc('ds1', 'select * from table_01') If table_01 has 100 million entries. I can query two SQL processes in Clickhouse, but I can only query the first SQL process in table_01.

zhicwu commented 2 years ago

I was not able to reproduce the issue. What's the target database? It looks like the first query leads to a table lock?

zhaoyvli commented 2 years ago

My target database is Oracle. But my first query using oracle, and my second query using sqlserver will have the same problem. So the reason is not because the table is locked.

zhicwu commented 2 years ago

I see what's going on there. By design, JDBC bridge will perform scheme inferring for each query. This is to help Clickhouse to understand the table schema. Usually it simply issue the original query with fetch size and max rows set to 1. This works well for the first case but not the second(not sure why count query is so slow in your environment). To avoid the extra query, you have to either use named query(query in json config file), or issue a query like select * from jdbc('ds1', 'a String, b Int32, ...', 'your query').

zhaoyvli commented 2 years ago

Thank you for your advice. When I execute query select cnt from jdbc('ds1','a String, b Int32', 'select count(1) as cnt from table_01'), creating a new query is successful.

My count query was so slow. A new query should not be blocked because of this.

lichenglin commented 1 year ago

It seems that the schema query is a serial method?

when there is a long query such as select * from jdbc('source','select count(1) from huge_table')

all the query is blocked.

but when running select * from jdbc('source','c Int64','select count(1) as c from huge_table')

It work well .

lichenglin commented 1 year ago

I think this is maybe the root cause.

In NamedDataSource

I found this columnsCache.get(query, k -> { return inferTypes(schema, query, this.loadSavedQueryAsNeeded(k, params), params); })

When running a long sql. get may block the processs.

It's metioned in caffeine docs , see get javadoc

If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this cache unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. **Some attempted update operations on this cache by other threads may be blocked while the computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this cache.**

peder1001 commented 10 months ago

Any progress on making get stop blocking the process?

lichenglin commented 7 months ago

I modify JdbcBridgeVerticle from

router.post("/columns_info").produces(RESPONSE_CONTENT_TYPE).handler(queryTimeoutHandler) .blockingHandler(this::handleColumnsInfo,SERIAL_MODE);

to

router.post("/columns_info").produces(RESPONSE_CONTENT_TYPE).handler(queryTimeoutHandler) .handler(this::handleColumnsInfo);