Altinity / clickhouse-operator

Altinity Kubernetes Operator for ClickHouse creates, configures and manages ClickHouse® clusters running on Kubernetes
https://altinity.com
Apache License 2.0
1.89k stars 461 forks source link

Performance issues with a query #1468

Closed sanjeev3d closed 2 months ago

sanjeev3d commented 2 months ago

I am experiencing performance issues with a query in ClickHouse that takes a long time to fetch single records.

Query:

select * from gh15minpos limit 1;

Output

┌─GRID_ID──────────────────────────────┬───────────CALL_ID─┬─SEQ_ID─┬─DATE_TIME───────────┬─DATE_TIME_MS─┬─EVENT_ID─┬─EUTRABAND─┬─EARFCN─┬─ENODEB_ID─┬─CELL_ID─┬─CELL_NAME───────┬─RSRP─┬─RSRQ─┬─TA─┬─TILE_ID─────────────┬─DURATION─┬─UL_VOLUME─┬─DL_VOLUME─┬─UL_THROUPUT─┬─DL_THROUPUT─┬──DATA_DATE─┬─DATA_HOUR─┬─CALL_STATUS─┬─CALL_TYPE─┬─MOVING─┬─INDOOR─┬────────────IMEI─┬─CAUSE───────────────┬────────────IMSI─┬─MAKE_ID─┬─MODEL_ID─┬─CQI─┬─SINR_PUSCH─┬─SINR_PUCCH─┬─BEST_EUTRABAND─┬─BEST_EARFCN─┬─BEST_ENODEB_ID─┬─BEST_CELL_ID─┬─BEST_CELL_NAME─┬─BEST_RSRP─┬─BEST_RSRQ─┬─OVERSHOOT─┬─SINR_DL─┬─RANK_INDICATOR─┬─RANK_INDICATOR_SAMPLE─┬───────LON─┬─────LAT─┬─PU_ID─┬─QCI_ID─┬─DL_TRAFFIC_DUR─┬─UL_TRAFFIC_DUR─┬───TILE_LON─┬──TILE_LAT─┬─MOVING_TYPE─┬─UL_MCS─┬─DL_MCS─┬─PHR─┬─UL_PRB─┬─DL_PRB─┬─RESULT_IND─┬─FAILURE_CAUSE_ID─┬─TARGET_ENODEB_ID─┬─TARGET_CELL_ID─┬─TARGET_PCI─┬─TARGET_FREQ─┬─TARGET_TAC─┬─TARGET_RSRP_RSCP_RSSI─┬─TARGET_RSRQ_ECN0─┬─ZOOM_LEVEL─┬─GEOGRAPHYL1_NAME─┬─GEOGRAPHYL1_CODE─┬─GEOGRAPHYL2_NAME─┬─GEOGRAPHYL2_CODE─┬─GEOGRAPHYL3_NAME─┬─GEOGRAPHYL3_CODE─┬─GEOGRAPHYL4_NAME───────┬─GEOGRAPHYL4_CODE─┬─GRID_BOUNDARY─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─ECGI─┐
│ 35.64556215904605_139.91351064859734 │ 17580614388527767 │      9 │ 2024-07-10 13:12:58 │          520 │    30751 │         3 │   1500 │    280216 │       1 │ NNA1201000170_1 │    0 │    0 │  0 │ 5476377397396486757 │        0 │         0 │         0 │           0 │           0 │ 2024-07-10 │        13 │           1 │        21 │      0 │      1 │ 353240431872880 │ 1020644759133226243 │ 440110014439091 │    1467 │   105774 │   8 │          0 │          0 │              0 │           0 │              0 │            0 │ -              │         0 │         0 │         0 │       0 │             26 │                    12 │ 139.91397 │ 35.6456 │ 95003 │      0 │              0 │              0 │ 139.913979 │ 35.645858 │           0 │      0 │      0 │   0 │      0 │     66 │          0 │                0 │                0 │              0 │          0 │           0 │          0 │                     0 │                0 │         18 │ KANTO            │ A                │ CHIBA            │ chb              │ URAYASU-SHI      │ 12227            │ CHB-URAYASUSHI-URAYASU │ CHB-12227-02     │ [[[139.9140672871067,35.64556215904605],[139.9140672871067,35.64601131668811],[139.91351064859734,35.64601131668811],[139.91351064859734,35.64556215904605]]] │    0 │
└──────────────────────────────────────┴───────────────────┴────────┴─────────────────────┴──────────────┴──────────┴───────────┴────────┴───────────┴─────────┴─────────────────┴──────┴──────┴────┴─────────────────────┴──────────┴───────────┴───────────┴─────────────┴─────────────┴────────────┴───────────┴─────────────┴───────────┴────────┴────────┴─────────────────┴─────────────────────┴─────────────────┴─────────┴──────────┴─────┴────────────┴────────────┴────────────────┴─────────────┴────────────────┴──────────────┴────────────────┴───────────┴───────────┴───────────┴─────────┴────────────────┴───────────────────────┴───────────┴─────────┴───────┴────────┴────────────────┴────────────────┴────────────┴───────────┴─────────────┴────────┴────────┴─────┴────────┴────────┴────────────┴──────────────────┴──────────────────┴────────────────┴────────────┴─────────────┴────────────┴───────────────────────┴──────────────────┴────────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴────────────────────────┴──────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────┘

1 row in set. Elapsed: 36.318 sec.

DESCRIBE TABLE gh15minpos

Query id: 6ba6aa08-89ad-4c02-9a0e-e53e0b5ea2fb

┌─name──────────────────┬─type──────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ GRID_ID               │ String            │              │                    │ gridId  │                  │                │
│ CALL_ID               │ UInt64            │              │                    │         │                  │                │
│ SEQ_ID                │ Nullable(UInt64)  │              │                    │         │                  │                │
│ DATE_TIME             │ String            │              │                    │         │                  │                │
│ DATE_TIME_MS          │ UInt32            │              │                    │         │                  │                │
│ EVENT_ID              │ Nullable(UInt32)  │              │                    │         │                  │                │
│ EUTRABAND             │ Nullable(UInt32)  │              │                    │         │                  │                │
│ EARFCN                │ Nullable(UInt32)  │              │                    │         │                  │                │
│ ENODEB_ID             │ Nullable(UInt64)  │              │                    │         │                  │                │
│ CELL_ID               │ Nullable(UInt64)  │              │                    │         │                  │                │
│ CELL_NAME             │ Nullable(String)  │              │                    │         │                  │                │
│ RSRP                  │ Nullable(Float64) │              │                    │         │                  │                │
│ RSRQ                  │ Nullable(Float64) │              │                    │         │                  │                │
│ TA                    │ Nullable(UInt64)  │              │                    │         │                  │                │
│ TILE_ID               │ Nullable(String)  │              │                    │         │                  │                │
│ DURATION              │ Nullable(Float64) │              │                    │         │                  │                │
│ UL_VOLUME             │ Nullable(Float64) │              │                    │         │                  │                │
│ DL_VOLUME             │ Nullable(Float64) │              │                    │         │                  │                │
│ UL_THROUPUT           │ Nullable(Float64) │              │                    │         │                  │                │
│ DL_THROUPUT           │ Nullable(Float64) │              │                    │         │                  │                │
│ DATA_DATE             │ Date              │              │                    │         │                  │                │
│ DATA_HOUR             │ UInt32            │              │                    │         │                  │                │
│ CALL_STATUS           │ Nullable(UInt32)  │              │                    │         │                  │                │
│ CALL_TYPE             │ Nullable(UInt32)  │              │                    │         │                  │                │
│ MOVING                │ Nullable(UInt32)  │              │                    │         │                  │                │
│ INDOOR                │ Nullable(UInt32)  │              │                    │         │                  │                │
│ IMEI                  │ Nullable(UInt64)  │              │                    │         │                  │                │
│ CAUSE                 │ Nullable(String)  │              │                    │         │                  │                │
│ IMSI                  │ UInt64            │              │                    │         │                  │                │
│ MAKE_ID               │ Nullable(UInt64)  │              │                    │         │                  │                │
│ MODEL_ID              │ Nullable(UInt64)  │              │                    │         │                  │                │
│ CQI                   │ Nullable(Float64) │              │                    │         │                  │                │
│ SINR_PUSCH            │ Nullable(Float64) │              │                    │         │                  │                │
│ SINR_PUCCH            │ Nullable(Float64) │              │                    │         │                  │                │
│ BEST_EUTRABAND        │ Nullable(UInt32)  │              │                    │         │                  │                │
│ BEST_EARFCN           │ Nullable(UInt64)  │              │                    │         │                  │                │
│ BEST_ENODEB_ID        │ Nullable(UInt64)  │              │                    │         │                  │                │
│ BEST_CELL_ID          │ Nullable(UInt64)  │              │                    │         │                  │                │
│ BEST_CELL_NAME        │ Nullable(String)  │              │                    │         │                  │                │
│ BEST_RSRP             │ Nullable(Float64) │              │                    │         │                  │                │
│ BEST_RSRQ             │ Nullable(Float64) │              │                    │         │                  │                │
│ OVERSHOOT             │ Nullable(UInt32)  │              │                    │         │                  │                │
│ SINR_DL               │ Nullable(Float64) │              │                    │         │                  │                │
│ RANK_INDICATOR        │ Nullable(UInt64)  │              │                    │         │                  │                │
│ RANK_INDICATOR_SAMPLE │ Nullable(UInt64)  │              │                    │         │                  │                │
│ LON                   │ Nullable(Float64) │              │                    │         │                  │                │
│ LAT                   │ Nullable(Float64) │              │                    │         │                  │                │
│ PU_ID                 │ UInt32            │              │                    │         │                  │                │
│ QCI_ID                │ Nullable(Float64) │              │                    │         │                  │                │
│ DL_TRAFFIC_DUR        │ Nullable(Float64) │              │                    │         │                  │                │
│ UL_TRAFFIC_DUR        │ Nullable(Float64) │              │                    │         │                  │                │
│ TILE_LON              │ Nullable(Float64) │              │                    │         │                  │                │
│ TILE_LAT              │ Nullable(Float64) │              │                    │         │                  │                │
│ MOVING_TYPE           │ Nullable(UInt32)  │              │                    │         │                  │                │
│ UL_MCS                │ Nullable(Float64) │              │                    │         │                  │                │
│ DL_MCS                │ Nullable(Float64) │              │                    │         │                  │                │
│ PHR                   │ Nullable(Float64) │              │                    │         │                  │                │
│ UL_PRB                │ Nullable(Float64) │              │                    │         │                  │                │
│ DL_PRB                │ Nullable(Float64) │              │                    │         │                  │                │
│ RESULT_IND            │ Nullable(Float64) │              │                    │         │                  │                │
│ FAILURE_CAUSE_ID      │ Nullable(UInt64)  │              │                    │         │                  │                │
│ TARGET_ENODEB_ID      │ Nullable(UInt64)  │              │                    │         │                  │                │
│ TARGET_CELL_ID        │ Nullable(UInt64)  │              │                    │         │                  │                │
│ TARGET_PCI            │ Nullable(UInt64)  │              │                    │         │                  │                │
│ TARGET_FREQ           │ Nullable(UInt64)  │              │                    │         │                  │                │
│ TARGET_TAC            │ Nullable(UInt64)  │              │                    │         │                  │                │
│ TARGET_RSRP_RSCP_RSSI │ Nullable(Float64) │              │                    │         │                  │                │
│ TARGET_RSRQ_ECN0      │ Nullable(Float64) │              │                    │         │                  │                │
│ ZOOM_LEVEL            │ Nullable(UInt32)  │              │                    │         │                  │                │
│ GEOGRAPHYL1_NAME      │ String            │              │                    │         │                  │                │
│ GEOGRAPHYL1_CODE      │ String            │              │                    │         │                  │                │
│ GEOGRAPHYL2_NAME      │ String            │              │                    │         │                  │                │
│ GEOGRAPHYL2_CODE      │ String            │              │                    │         │                  │                │
│ GEOGRAPHYL3_NAME      │ String            │              │                    │         │                  │                │
│ GEOGRAPHYL3_CODE      │ String            │              │                    │         │                  │                │
│ GEOGRAPHYL4_NAME      │ String            │              │                    │         │                  │                │
│ GEOGRAPHYL4_CODE      │ String            │              │                    │         │                  │                │
│ GRID_BOUNDARY         │ Nullable(String)  │              │                    │         │                  │                │
│ ECGI                  │ Nullable(UInt64)  │              │                    │         │                  │                │
└───────────────────────┴───────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
sanjeev3d commented 2 months ago

And I have checked that zookeeper pod is restarted multiple time

kubectl logs -p zookeeper-0 -n ns-symplatform-ch | tail -n 20

    at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
2024-08-01 00:01:49,236 [myid:1] - INFO  [ListenerHandler-zookeeper-0.zookeepers.ns-symplatform-ch.svc.cluster.local/240b:c0e3:4111:53eb:713:2:0:76a2:3888:QuorumCnxManager$Listener$ListenerHandler@1070] - Received connection request from /240b:c0e3:4111:53eb:713:2:0:78dd:44962
2024-08-01 00:01:49,237 [myid:1] - WARN  [RecvWorker:5135603447297303924:QuorumCnxManager$RecvWorker@1396] - Connection broken for id 5135603447297303924, my id = 1
java.io.IOException: Received packet with invalid packet: 1919509363
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1386)
2024-08-01 00:01:49,237 [myid:1] - WARN  [RecvWorker:5135603447297303924:QuorumCnxManager$RecvWorker@1402] - Interrupting SendWorker thread from RecvWorker. sid: 5135603447297303924. myId: 1
2024-08-01 00:01:49,237 [myid:1] - WARN  [SendWorker:5135603447297303924:QuorumCnxManager$SendWorker@1282] - Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
    at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1447)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:98)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1271)
2024-08-01 00:01:49,238 [myid:1] - WARN  [SendWorker:5135603447297303924:QuorumCnxManager$SendWorker@1294] - Send worker leaving thread id 5135603447297303924 my id = 1
2024-08-01 00:01:49,321 [myid:1] - INFO  [NIOWorkerThread-1:NIOServerCnxn@507] - Processing ruok command from /127.0.0.1:35124
2024-08-01 00:01:49,383 [myid:1] - INFO  [NIOWorkerThread-3:NIOServerCnxn@507] - Processing ruok command from /127.0.0.1:35134
Slach commented 2 months ago

issue is not related to clickhouse-operator i'm guessing to apply LIMIT 1 you need to read whole big data part

look to EXPLAIN ESTIMATE select * from gh15minpos limit 1; https://clickhouse.com/docs/en/sql-reference/statements/explain#explain-estimate and system.query_log https://clickhouse.com/docs/en/operations/system-tables/query_log

according to zookeeper

root reason

024-08-01 00:01:49,236 [myid:1] - INFO  [ListenerHandler-zookeeper-0.zookeepers.ns-symplatform-ch.svc.cluster.local/240b:c0e3:4111:53eb:713:2:0:76a2:3888:QuorumCnxManager$Listener$ListenerHandler@1070] - Received connection request from /240b:c0e3:4111:53eb:713:2:0:78dd:44962
2024-08-01 00:01:49,237 [myid:1] - WARN  [RecvWorker:5135603447297303924:QuorumCnxManager$RecvWorker@1396] - Connection broken for id 5135603447297303924, my id = 1
java.io.IOException: Received packet with invalid packet: 1919509363

check which pod have 240b:c0e3:4111:53eb:713:2:0:78dd

sanjeev3d commented 2 months ago

pod: 240b:c0e3:4111:53eb:713:2:0:78dd

is also restarted after this zookeeper restart and new pod come up with new pod ip Pod is one of replica

sanjeev3d commented 2 months ago

Also getting one error again n again

2024-08-01 05:00:58,513 [myid:3] - ERROR [LearnerHandler-/240b:c0e3:4111:53eb:713:2:0:78dd:36514:LearnerHandler@714] - Unexpected exception causing shutdown while sock still open
java.io.IOException: Len error 1195725856
2024-08-01 05:01:08,513 [myid:3] - ERROR [LearnerHandler-/240b:c0e3:4111:53eb:713:2:0:78dd:38870:LearnerHandler@714] - Unexpected exception causing shutdown while sock still open
java.io.IOException: Len error 1195725856
2024-08-01 05:01:18,513 [myid:3] - ERROR [LearnerHandler-/240b:c0e3:4111:53eb:713:2:0:78dd:37098:LearnerHandler@714] - Unexpected exception causing shutdown while sock still open
java.io.IOException: Len error 1195725856
2024-08-01 05:01:28,513 [myid:3] - ERROR [LearnerHandler-/240b:c0e3:4111:53eb:713:2:0:78dd:34634:LearnerHandler@714] - Unexpected exception causing shutdown while sock still open
java.io.IOException: Len error 1195725856
2024-08-01 05:01:38,515 [myid:3] - ERROR [LearnerHandler-/240b:c0e3:4111:53eb:713:2:0:78dd:41634:LearnerHandler@714] - Unexpected exception causing shutdown while sock still open
java.io.IOException: Len error 1195725856

Is this because of large data set pushed

sanjeev3d commented 2 months ago

EXPLAIN ESTIMATE SELECT * FROM gh15minpos LIMIT 1

Query id: a2aadeb3-ea5a-4824-96ff-600bbbfd9b83

┌─database─────┬─table────────────┬─parts─┬───rows─┬─marks─┐
│ groundhog_rc │ gh15minpos_local │ 67033 │ 106062 │ 67033 │
└──────────────┴──────────────────┴───────┴────────┴───────┘

1 row in set. Elapsed: 9.187 sec.
Slach commented 2 months ago
> ┌─database─────┬─table────────────┬─parts─┬───rows─┬─marks─┐
> │ groundhog_rc │ gh15minpos_local │ 67033 │ 106062 │ 67033 │

looks like your table have wrong PARTITION BY and requires to read all parts

Slach commented 2 months ago

Is this because of large data set pushed

check https://github.com/Altinity/clickhouse-operator/blob/master/deploy/zookeeper/zookeeper-manually/quick-start-persistent-volume/zookeeper-3-nodes.yaml#L166C94-L166C119

is your Jute buffer is enough?

sanjeev3d commented 2 months ago

No, -Djute.maxbuffer is not set in zoo,cfg.

What should be recommended for large data size approx 30-40 lakh records

Slach commented 2 months ago

look links above and apply this is max value, not pre-allocation

buffer size not depends on rows it depends on how many data parts in your replicated tables