cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev

DB::Exception: Session is locked by a concurrent client. (SESSION_IS_LOCKED) #7906

Open ranjeetranjan opened 6 months ago

ranjeetranjan commented 6 months ago

I am getting the following error:

{
    "error": "Error: Code: 373. DB::Exception: Session is locked by a concurrent client. (SESSION_IS_LOCKED) (version 22.3.20.29 (official build))\n"
}

To Reproduce As per a ClickHouse expert:

ClickHouse shows this error message if there are two or more parallel queries with the same session_id. Usually, the session_id must be different each time.

In detail: the error occurs within ClickHouse's internals when a client tries to acquire a session that is already locked by another client. This can happen when concurrent queries or operations access the same session simultaneously, causing contention.

Version: 0.34.61

igorlukanin commented 6 months ago

Hi @ranjeetranjan 👋 Could you please provide more context here?

ranjeetranjan commented 6 months ago

Hi @igorlukanin I am getting the below error from Cube.js:

{"message":"Error querying db","error":"Error: socket hang up\n at QueryQueue.parseResult (/usr/src/app/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:384:13)\n at QueryQueue.executeInQueue (/usr/src/app/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:356:19)\n at processTicksAndRejections (node:internal/process/task_queues:95:5)\n at PreAggregationLoadCache.keyQueryResult (/usr/src/app/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:435:75)\n at async Promise.all (index 0)\n at PreAggregationLoader.loadPreAggregationWithKeys (/usr/src/app/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:645:30)\n at PreAggregationLoader.loadPreAggregation (/usr/src/app/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:575:22)\n at preAggregationPromise (/usr/src/app/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:2214:30)\n at QueryOrchestrator.fetchQuery (/usr/src/app/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryOrchestrator.ts:241:9)\n at OrchestratorApi.executeQuery (/usr/src/app/node_modules/@cubejs-backend/server-core/src/core/OrchestratorApi.ts:98:20)\n at /usr/src/app/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:605:13\n at async Promise.all (index 1)\n at RefreshScheduler.refreshPreAggregations (/usr/src/app/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:590:5)\n at async Promise.all (index 1)\n at RefreshScheduler.runScheduledRefresh (/usr/src/app/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:285:9)","requestId":"scheduler-383984a0-dae6-4222-83dd-b0527fbc6b3f"}

To dig in, I opened a discussion in the ClickHouse Slack community, and they advised:

ClickHouse shows that error message if there are two or more parallel queries with the same session_id. Usually, the session_id must be different each time.

So, as per the ClickHouse community's suggestion, the session_id should be unique per query; reusing the same session ID is what causes the issue.

paveltiunov commented 6 months ago

@ranjeetranjan Each connection in the connection pool allocated by Cube has a unique session_id. So, there is either a bug somewhere in connection pooling logic or there is some edge case with your workload that leads to the problem. Either way, we need a test to reproduce it as a first step. Also, a fix contribution would be helpful here as well.

github-actions[bot] commented 6 months ago

If you are interested in working on this issue, please go ahead and provide a PR for it. We'd be happy to review it and merge it. If this is the first time you are contributing a Pull Request to Cube, please check our contribution guidelines. You can also post any questions while contributing in the #contributors channel in the Cube Slack.

ranjeetranjan commented 6 months ago

@paveltiunov We use a dashboard containing 10 to 15 cards for data visualization. We fire parallel requests for all of these cards to retrieve and display the data efficiently. We've observed that when these parallel requests are made, they often share the same session ID, which can lead to conflicts or inconsistencies. Also, we retry whenever we get the "Continue wait" response.
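For context, Cube's REST API answers `{"error": "Continue wait"}` while a query is still queued, and clients are expected to poll. A minimal retry loop might look like the sketch below; the function name, attempt limit, and `fetchLoad` callback are made up for illustration:

```javascript
// Minimal polling loop for Cube's "Continue wait" response. `fetchLoad`
// stands in for whatever performs the HTTP call to /cubejs-api/v1/load
// and returns the parsed JSON body.
async function loadWithRetry(fetchLoad, maxAttempts = 10) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const body = await fetchLoad();
    if (body.error === 'Continue wait') {
      continue; // the query is still in the queue; poll again
    }
    return body; // either results or a real error
  }
  throw new Error(`Query still queued after ${maxAttempts} attempts`);
}
```

Each retry here is a plain repeat of the same request; Cube is designed to deduplicate identical in-flight queries server-side, so polling the same query should not multiply the load on ClickHouse.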

matthieuCarlini commented 5 days ago

Hi, it seems I have the same problem. I have Cube Core (v0.36.2) and ClickHouse (24.7.3.42), both on-premises.

I have 2 cubes on 2 ClickHouse tables. When I made a view joining the 2 cubes, I receive an error in my Playground: Code: 373. DB::Exception: Session 6f5ae44f-f919-4394-80b3-92d043521739 is locked by a concurrent client. (SESSION_IS_LOCKED) (version 24.7.3.42 (official build))

The complete log on the ClickHouse side is:

2024.09.27 12:15:16.082504 [ 556216 ] {} <Error> DynamicQueryHandler: Code: 373. DB::Exception: Session 0f361321-fd30-4382-a030-f3be3a508469 is locked by a concurrent client. (SESSION_IS_LOCKED), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000d5f7a3b
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x00000000079f8ccc
2. DB::Exception::Exception<String const&>(int, FormatStringHelperImpl<std::type_identity<String const&>::type>, String const&) @ 0x0000000007a0b40b
3. DB::Session::makeSessionContext(String const&, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>, bool) @ 0x00000000115ed2c9
4. DB::HTTPHandler::processQuery(DB::HTTPServerRequest&, DB::HTMLForm&, DB::HTTPServerResponse&, DB::HTTPHandler::Output&, std::optional<DB::CurrentThread::QueryScope>&, StrongTypedef<unsigned long, ProfileEvents::EventTag> const&) @ 0x0000000012b56741
5. DB::HTTPHandler::handleRequest(DB::HTTPServerRequest&, DB::HTTPServerResponse&, StrongTypedef<unsigned long, ProfileEvents::EventTag> const&) @ 0x0000000012b5de39
6. DB::HTTPServerConnection::run() @ 0x0000000012bdf993
7. Poco::Net::TCPServerConnection::start() @ 0x00000000159a2267
8. Poco::Net::TCPServerDispatcher::run() @ 0x00000000159a26f9
9. Poco::PooledThread::run() @ 0x000000001596f841
10. Poco::ThreadImpl::runnableEntry(void*) @ 0x000000001596ddfd
11. start_thread @ 0x0000000000007ea7
12. ? @ 0x00000000000fba6f
 (version 24.7.3.42 (official build))

@paveltiunov: I don't know how I can help with the resolution of this issue. We're fully on-premises, so I can run further tests if that helps.

igorlukanin commented 2 days ago

@matthieuCarlini Thanks for adding some info here.

I've tried to reproduce this in Cube Cloud on v0.36.3 with ClickHouse v24.10.1.243 in the following way, but it worked without issues for me.

Data model:

cubes:
  - name: countries
    sql_table: countries

    joins:
      - name: country_codes
        relationship: one_to_one
        sql: "{countries.name} = {country_codes.name}"

    dimensions:
      - name: name
        sql: name
        type: string

  - name: country_codes
    sql_table: country_codes

    dimensions:
      - name: name
        sql: name
        type: string

      - name: code
        sql: code
        type: string

views:
  - name: country_info

    cubes:
      - join_path: countries
        includes:
          - name: name
            alias: territory_name

      - join_path: countries.country_codes
        includes: '*'

Query:

{
  "limit": 5000,
  "dimensions": [
    "country_info.territory_name",
    "country_info.name",
    "country_info.code"
  ]
}

ClickHouse credentials:

(screenshot attached)

Results:

(screenshot attached)

@matthieuCarlini @ranjeetranjan It would be great if you could help reproduce this. For starters, if you connect to this public ClickHouse server instead of your own, are you able to reproduce the issue?

ranjeetranjan commented 2 days ago

@matthieuCarlini @igorlukanin Earlier I was using node:latest as the build step for our Docker images, but when I switched to cubejs/cube:v0.35.55 as the base image, the same problem did not occur.

matthieuCarlini commented 2 days ago

@igorlukanin: thanks for the test model online. Your example works for us too, so we investigated a little further. It turns out that the SQL generated for the 2 joined cubes takes longer than the max_execution_time of our ClickHouse server. We are looking for a better approach, and it does not seem to be a Cube problem. Thanks for your time, and my apologies for the mistaken analysis.
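For anyone hitting the same wall: the relevant cap can be inspected and raised per session in ClickHouse. This is a config fragment, and the value below is only an example, not a recommendation:

```sql
-- Inspect the current cap on query runtime (in seconds; 0 means unlimited)
SELECT name, value FROM system.settings WHERE name = 'max_execution_time';

-- Raise it for the current session, e.g. to 5 minutes
SET max_execution_time = 300;
```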

igorlukanin commented 2 days ago

@matthieuCarlini Thanks for the update! Indeed, joins in ClickHouse are an interesting thing 😄

@ranjeetranjan Do you think this was also the issue in your case? It would be great if you could test this on the latest version of Cube, because if it does not reproduce, then we'll call it a day and close this issue.

igorlukanin commented 2 days ago

@matthieuCarlini I wonder if this also deserves a ticket in the ClickHouse repo, since the error message is so wildly different from the apparent root cause.

ranjeetranjan commented 2 days ago

@igorlukanin This occurred only when we had joins over a lot of data, but when we used cubejs/cube:v0.35.55, the issue did not come up. After changing the build image, https://github.com/cube-js/cube/issues/6477 also stopped occurring.