Closed · sanjeev3d closed this issue 2 months ago
And I have checked that the zookeeper pod has restarted multiple times:
kubectl logs -p zookeeper-0 -n ns-symplatform-ch | tail -n 20
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
2024-08-01 00:01:49,236 [myid:1] - INFO [ListenerHandler-zookeeper-0.zookeepers.ns-symplatform-ch.svc.cluster.local/240b:c0e3:4111:53eb:713:2:0:76a2:3888:QuorumCnxManager$Listener$ListenerHandler@1070] - Received connection request from /240b:c0e3:4111:53eb:713:2:0:78dd:44962
2024-08-01 00:01:49,237 [myid:1] - WARN [RecvWorker:5135603447297303924:QuorumCnxManager$RecvWorker@1396] - Connection broken for id 5135603447297303924, my id = 1
java.io.IOException: Received packet with invalid packet: 1919509363
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1386)
2024-08-01 00:01:49,237 [myid:1] - WARN [RecvWorker:5135603447297303924:QuorumCnxManager$RecvWorker@1402] - Interrupting SendWorker thread from RecvWorker. sid: 5135603447297303924. myId: 1
2024-08-01 00:01:49,237 [myid:1] - WARN [SendWorker:5135603447297303924:QuorumCnxManager$SendWorker@1282] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1447)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:98)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1271)
2024-08-01 00:01:49,238 [myid:1] - WARN [SendWorker:5135603447297303924:QuorumCnxManager$SendWorker@1294] - Send worker leaving thread id 5135603447297303924 my id = 1
2024-08-01 00:01:49,321 [myid:1] - INFO [NIOWorkerThread-1:NIOServerCnxn@507] - Processing ruok command from /127.0.0.1:35124
2024-08-01 00:01:49,383 [myid:1] - INFO [NIOWorkerThread-3:NIOServerCnxn@507] - Processing ruok command from /127.0.0.1:35134
The issue is not related to clickhouse-operator. My guess is that to apply LIMIT 1 you still need to read a whole big data part.
Look at EXPLAIN ESTIMATE select * from gh15minpos limit 1;
https://clickhouse.com/docs/en/sql-reference/statements/explain#explain-estimate
and
system.query_log
https://clickhouse.com/docs/en/operations/system-tables/query_log
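for example, something like this shows how much the query actually read (a sketch; adjust the filter to match your query text):
SELECT event_time, query_duration_ms, read_rows, read_bytes
FROM system.query_log
WHERE type = 'QueryFinish' AND query ILIKE '%gh15minpos%LIMIT 1%'
ORDER BY event_time DESC
LIMIT 5;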
According to zookeeper, the root reason is:
2024-08-01 00:01:49,236 [myid:1] - INFO [ListenerHandler-zookeeper-0.zookeepers.ns-symplatform-ch.svc.cluster.local/240b:c0e3:4111:53eb:713:2:0:76a2:3888:QuorumCnxManager$Listener$ListenerHandler@1070] - Received connection request from /240b:c0e3:4111:53eb:713:2:0:78dd:44962
2024-08-01 00:01:49,237 [myid:1] - WARN [RecvWorker:5135603447297303924:QuorumCnxManager$RecvWorker@1396] - Connection broken for id 5135603447297303924, my id = 1
java.io.IOException: Received packet with invalid packet: 1919509363
check which pod has 240b:c0e3:4111:53eb:713:2:0:78dd, e.g.:
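a sketch, assuming the same namespace as in the logs above:
kubectl get pods -o wide -n ns-symplatform-ch | grep 240b:c0e3:4111:53eb:713:2:0:78dd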
The pod with 240b:c0e3:4111:53eb:713:2:0:78dd is one of the replicas. It also restarted after this zookeeper restart, and the new pod came up with a new pod IP.
I am also getting one error again and again:
2024-08-01 05:00:58,513 [myid:3] - ERROR [LearnerHandler-/240b:c0e3:4111:53eb:713:2:0:78dd:36514:LearnerHandler@714] - Unexpected exception causing shutdown while sock still open
java.io.IOException: Len error 1195725856
2024-08-01 05:01:08,513 [myid:3] - ERROR [LearnerHandler-/240b:c0e3:4111:53eb:713:2:0:78dd:38870:LearnerHandler@714] - Unexpected exception causing shutdown while sock still open
java.io.IOException: Len error 1195725856
2024-08-01 05:01:18,513 [myid:3] - ERROR [LearnerHandler-/240b:c0e3:4111:53eb:713:2:0:78dd:37098:LearnerHandler@714] - Unexpected exception causing shutdown while sock still open
java.io.IOException: Len error 1195725856
2024-08-01 05:01:28,513 [myid:3] - ERROR [LearnerHandler-/240b:c0e3:4111:53eb:713:2:0:78dd:34634:LearnerHandler@714] - Unexpected exception causing shutdown while sock still open
java.io.IOException: Len error 1195725856
2024-08-01 05:01:38,515 [myid:3] - ERROR [LearnerHandler-/240b:c0e3:4111:53eb:713:2:0:78dd:41634:LearnerHandler@714] - Unexpected exception causing shutdown while sock still open
java.io.IOException: Len error 1195725856
Is this because of the large data set being pushed?
EXPLAIN ESTIMATE SELECT * FROM gh15minpos LIMIT 1
Query id: a2aadeb3-ea5a-4824-96ff-600bbbfd9b83
┌─database─────┬─table────────────┬─parts─┬───rows─┬─marks─┐
│ groundhog_rc │ gh15minpos_local │ 67033 │ 106062 │ 67033 │
└──────────────┴──────────────────┴───────┴────────┴───────┘
1 row in set. Elapsed: 9.187 sec.
> ┌─database─────┬─table────────────┬─parts─┬───rows─┬─marks─┐
> │ groundhog_rc │ gh15minpos_local │ 67033 │ 106062 │ 67033 │
looks like your table has a wrong PARTITION BY and requires reading all parts: 67033 parts for 106062 rows is only ~1.6 rows per part
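you can inspect the current partition key and pick something much coarser; a sketch (the monthly key and the event_time column name are assumptions, adjust to your schema):
SHOW CREATE TABLE gh15minpos_local;
-- hypothetical coarser key for a new table definition:
-- PARTITION BY toYYYYMM(event_time)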
> Is this because of the large data set being pushed?
is your jute buffer big enough?
No, -Djute.maxbuffer is not set in zoo.cfg.
What is recommended for a large data size of approx. 30-40 lakh (3-4 million) records?
look at the links above and apply -Djute.maxbuffer; this is a max value, not a pre-allocation, e.g.:
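a sketch of one way to set it on the zookeeper statefulset, assuming the image honors the standard SERVER_JVMFLAGS variable (the 8 MB value is only an illustration; the default is ~1 MB and the same value should be used on all ensemble members):
env:
- name: SERVER_JVMFLAGS
  value: "-Djute.maxbuffer=8388608"  # max packet size in bytes, not pre-allocated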
the buffer size doesn't depend on rows; it depends on how many data parts are in your replicated tables
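you can count active parts per table with:
SELECT table, count() AS active_parts
FROM system.parts
WHERE active AND table = 'gh15minpos_local'
GROUP BY table;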
I am experiencing performance issues with a query in ClickHouse that takes a long time to fetch a single record.
Query:
select * from gh15minpos limit 1;
Output:
DESCRIBE TABLE gh15minpos