ansou-naboty opened this issue 3 years ago
@ansou-naboty clickhouse-operator doesn't replicate data itself, so it looks like the issue is not related to clickhouse-operator but to ClickHouse itself.
Could you look at /var/log/clickhouse-server/clickhouse-server.err.log inside the affected clickhouse-server pods for the time period covered by your network dump?
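For example, something like this (a sketch; pod names follow the operator's chi-<installation>-<cluster>-<shard>-<replica>-0 convention, substitute your own namespace):

```sh
# Tail the ClickHouse error log inside one of the replica pods.
kubectl exec -n <namespace> chi-statefulset-standard-0-0-0 -- \
  tail -n 200 /var/log/clickhouse-server/clickhouse-server.err.log
```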
I know that clickhouse-operator doesn't replicate data itself, but ClickHouse does. Should I post this issue on the ClickHouse GitHub instead?
@ansou-naboty do you see anything related to your RST in /var/log/clickhouse-server/clickhouse-server.err.log?
In /var/log/clickhouse-server/clickhouse-server.err.log I see "connection reset by peer" errors while reading or writing on socket xxxxxx. In the tcpdump capture this corresponds to TCP Window Full events, i.e. a full read buffer.
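A filter along these lines can isolate those resets in tcpdump (a sketch; 9009 is ClickHouse's default interserver_http_port, adjust if yours differs):

```sh
# Capture only TCP segments with the RST flag set on the
# interserver replication port.
tcpdump -ni any 'tcp[tcpflags] & tcp-rst != 0 and port 9009'
```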
Hi, my name is Ansou FALL and I'm working for opensee.io as a DevOps engineer. I'm running clickhouse-operator in AKS (Azure Kubernetes Service). We are facing a tricky issue with ClickHouse when replicating data within a shard between replicas. Here is the configuration we have:

- clusterName: standard
- ClickHouseInstallationName: statefulset
- Shards: 8
- Replicas: 2
- Node count: 16 in total, each with 64 GB of RAM and 16 CPUs

We have 48 ingestion instances, and each of them inserts 500K lines every 1-2 seconds.
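For context, this layout corresponds roughly to a ClickHouseInstallation like the following (a minimal sketch, not our exact manifest):

```yaml
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "statefulset"
spec:
  configuration:
    clusters:
      - name: "standard"
        layout:
          shardsCount: 8
          replicasCount: 2
```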
During ingestion the TCP receive buffers (tcp_rmem) fill up and the TCP connections used for replication are closed.
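If the resets really come from the receive window filling, one thing to experiment with is larger receive buffers on the nodes (a hedged sketch; the values are illustrative, and raising the maximum only helps if the reader can eventually catch up):

```sh
# Show current receive-buffer limits: min, default, max (bytes).
sysctl net.ipv4.tcp_rmem
# Let the kernel grow the receive window further before it fills.
sysctl -w net.ipv4.tcp_rmem="4096 1048576 16777216"
sysctl -w net.core.rmem_max=16777216
```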
The image below shows the RST flags that close the TCP connections.
This image shows the RST flag sent when replicating data between replicas in other shards.
This image lists some of the TCP Window Full events during ingestion.