OT-CONTAINER-KIT / helm-charts

A repository that will contain Helm charts with best and security practices.
https://ot-container-kit.github.io/helm-charts

Master node gets killed during redis-benchmark without proper logs. #122

Open sv6375261073 opened 1 year ago

sv6375261073 commented 1 year ago
-> Redis-cluster version: 0.15.0
-> Master/Slave resources
  Requests:
    cpu: 1
    memory: 1Gi
  Limits:
    cpu: 2
    memory: 10Gi

Hi Team,

When we run redis-benchmark, the master node restarts automatically and the only trace in its log is a "Killed" message. Because of this, we cannot determine the exact cause of the restart.

################### LOG OF RESTARTED MASTER POD ##################

$ kubectl logs redis-cluster-follower-2 -n ot-operators -p
E0712 19:20:23.945129 76790 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0712 19:20:25.084599 76790 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0712 19:20:25.349431 76790 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Defaulted container "redis-cluster-follower" out of: redis-cluster-follower, redis-exporter
Running without TLS mode
Starting redis service in cluster mode.....
10:C 12 Jul 2023 13:46:27.237 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
10:C 12 Jul 2023 13:46:27.237 # Redis version=7.0.5, bits=64, commit=00000000, modified=0, pid=10, just started
10:C 12 Jul 2023 13:46:27.237 # Configuration loaded
10:M 12 Jul 2023 13:46:27.238 monotonic clock: POSIX clock_gettime
10:M 12 Jul 2023 13:46:27.238 Node configuration loaded, I'm 5991471e7d5ee1526607badc6a9164eb304546b0
10:M 12 Jul 2023 13:46:27.238 Running mode=cluster, port=6379.
10:M 12 Jul 2023 13:46:27.238 # Server initialized
10:M 12 Jul 2023 13:46:27.240 Loading RDB produced by version 7.0.5
10:M 12 Jul 2023 13:46:27.240 RDB age 88407 seconds
10:M 12 Jul 2023 13:46:27.240 RDB memory usage when created 1.65 Mb
10:M 12 Jul 2023 13:46:27.240 Done loading RDB, keys loaded: 0, keys expired: 0.
10:M 12 Jul 2023 13:46:27.240 DB loaded from disk: 0.000 seconds
10:M 12 Jul 2023 13:46:27.240 Ready to accept connections
10:M 12 Jul 2023 13:46:27.246 Replica 10.2.206.97:6379 asks for synchronization
10:M 12 Jul 2023 13:46:27.246 Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'abc43f577a7ba63f4f52f602139fc2bb2b5b81a2', my replication IDs are '689a30c17b12760d95e4fd33f19cfe0d743290d3' and '846e7597232a975fd8d427100919ba33d27c7ef5')
10:M 12 Jul 2023 13:46:27.246 Delay next BGSAVE for diskless SYNC
10:M 12 Jul 2023 13:46:28.315 # Address updated for node 0d13dfe1abcc1c4b33ebf1d96e16b6d230d32609, now 10.2.173.117:6379
10:M 12 Jul 2023 13:46:29.245 # Cluster state changed: ok
10:M 12 Jul 2023 13:46:29.566 # Address updated for node 5c2a062721b4e41e76929747955ba62fc3f01cab, now 10.2.189.11:6379
10:M 12 Jul 2023 13:46:32.255 Starting BGSAVE for SYNC with target: replicas sockets
10:M 12 Jul 2023 13:46:32.255 Background RDB transfer started by pid 23
23:C 12 Jul 2023 13:46:32.256 Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
10:M 12 Jul 2023 13:46:32.256 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
10:M 12 Jul 2023 13:46:32.260 Background RDB transfer terminated with success
10:M 12 Jul 2023 13:46:32.260 Streamed RDB transfer with replica 10.2.206.97:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
10:M 12 Jul 2023 13:46:32.260 Synchronization with replica 10.2.206.97:6379 succeeded
/usr/bin/entrypoint.sh: line 91: 10 Killed redis-server /etc/redis/redis.conf
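
The bare `Killed` line is printed by the shell running `entrypoint.sh` when `redis-server` receives SIGKILL, which is why Redis itself logs nothing about the shutdown. One possible way to find out why the container was terminated is to read its last termination state from the pod status. The following is a minimal sketch, not a confirmed diagnosis: `last_termination` is a hypothetical helper name, and it assumes `kubectl` access to the namespace shown above. A reason of `OOMKilled` with exit code 137 would point at the memory limit.

```shell
# Hypothetical helper: for every container in a pod, print its name,
# the reason it was last terminated, and the exit code (tab-separated).
# Usage: last_termination <pod> <namespace>
last_termination() {
  local pod="$1" ns="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Example with the names from this issue (requires cluster access):
# last_termination redis-cluster-follower-2 ot-operators
```

`kubectl describe pod redis-cluster-follower-2 -n ot-operators` shows the same information under "Last State", together with recent events.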

################## CLIENT-SIDE LOG WITH THE REDIS-BENCHMARK COMMAND #############

root@ubuntu-deployment-5474b4864f-hg8zq:/# redis-benchmark -h redis-cluster-leader.ot-operators.svc  -p 6379 -a password -t get,set,lpush -c 1000 -n 3000000 -r 1000000 -d 102400 --cluster -l
Cluster has 3 master nodes:

Master 0: 2ded0507d646d251a338f1f0e0c63f9fd751a943 10.2.134.141:6379
Master 1: 5c2a062721b4e41e76929747955ba62fc3f01cab redis-cluster-leader.ot-operators.svc:6379
Master 2: 5991471e7d5ee1526607badc6a9164eb304546b0 10.2.196.248:6379

Error: Connection reset by peer
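
For a rough sense of scale, the flags in the command above can be multiplied out. This is a back-of-the-envelope sketch, not a measurement: it assumes the `set` test eventually touches the whole `-r 1000000` keyspace with `-d 102400` byte values, that keys spread evenly over the 3 masters, and it ignores per-key overhead and the `lpush` test. The variable names are illustrative.

```shell
keyspace=1000000      # -r 1000000 (random key range)
value_bytes=102400    # -d 102400 (payload size per value)
masters=3             # "Cluster has 3 master nodes"
limit_gib=10          # memory limit from the resource spec

total_bytes=$((keyspace * value_bytes))
# Integer division, so the result is rounded down.
per_master_gib=$((total_bytes / masters / 1024 / 1024 / 1024))

echo "upper bound per master: ~${per_master_gib} GiB (limit: ${limit_gib} GiB)"
```

Under these assumptions the payload alone could approach roughly 31 GiB per master, about three times the 10Gi limit, so memory exhaustion is a plausible cause worth ruling out.
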
sv6375261073 commented 1 year ago

@shubham-cmyk ,

I am facing this issue in version 0.15.3 as well. The pod crashes while running the benchmark, but I cannot find the exact point of failure.

Any update here?

sv6375261073 commented 1 year ago

Chart Version : 0.15.3

Benchmarking from a running pod in the cluster:

Redis-benchmark installation: `apt update && apt install -y redis`

root@ubuntu-deployment-5474b4864f-h75zv:/#

########################### REDIS BENCHMARKING ###############
COUNT=0
while [ $COUNT -lt 20 ]; 
do 
    echo "################ ITERATION : $COUNT ##############"; 
    redis-benchmark -h redis-cluster-leader.ot-operators.svc  -p 6379 -a password --cluster -t get,set,ping,sadd,hmset,incr,lpush -c 1000 -n 3000000 -r 1000000 -d 102400; 
    ((COUNT=COUNT+1))
done

############ MASTER POD CRASHED LOG #################

$ kubectl logs redis-cluster-follower-0 -n ot-operators -p
18:15:35.394017 60813 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0718 18:15:36.297639 60813 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0718 18:15:36.372458 60813 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Defaulted container "redis-cluster-follower" out of: redis-cluster-follower, redis-exporter
Running without TLS mode
Starting redis service in cluster mode.....
10:C 18 Jul 2023 12:42:35.244 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
10:C 18 Jul 2023 12:42:35.244 # Redis version=7.0.5, bits=64, commit=00000000, modified=0, pid=10, just started
10:C 18 Jul 2023 12:42:35.244 # Configuration loaded
10:M 18 Jul 2023 12:42:35.244 monotonic clock: POSIX clock_gettime
10:M 18 Jul 2023 12:42:35.245 Node configuration loaded, I'm a6869e3a644f3c84a890b58cb06919c17d956f3f
10:M 18 Jul 2023 12:42:35.245 Running mode=cluster, port=6379.
10:M 18 Jul 2023 12:42:35.245 # Server initialized
10:M 18 Jul 2023 12:42:35.246 Loading RDB produced by version 7.0.5
10:M 18 Jul 2023 12:42:35.246 RDB age 651 seconds
10:M 18 Jul 2023 12:42:35.246 RDB memory usage when created 1.93 Mb
10:M 18 Jul 2023 12:42:35.246 Done loading RDB, keys loaded: 0, keys expired: 0.
10:M 18 Jul 2023 12:42:35.246 DB loaded from disk: 0.001 seconds
10:M 18 Jul 2023 12:42:35.246 Ready to accept connections
10:M 18 Jul 2023 12:42:35.252 Replica 10.2.171.188:6379 asks for synchronization
10:M 18 Jul 2023 12:42:35.252 Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '9a64d9bd51f3730a510ee72590d4ab96b24391c1', my replication IDs are 'f36924802bcdb635e523c9da46d2761e0bebc7ba' and '4bf3e69ef1ab4ae8ca18c86b073f6f9157cb99aa')
10:M 18 Jul 2023 12:42:35.252 Delay next BGSAVE for diskless SYNC
10:M 18 Jul 2023 12:42:37.253 # Cluster state changed: ok
10:M 18 Jul 2023 12:42:40.263 Starting BGSAVE for SYNC with target: replicas sockets
10:M 18 Jul 2023 12:42:40.263 Background RDB transfer started by pid 23
23:C 18 Jul 2023 12:42:40.264 Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
10:M 18 Jul 2023 12:42:40.264 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
10:M 18 Jul 2023 12:42:40.268 Background RDB transfer terminated with success
10:M 18 Jul 2023 12:42:40.268 Streamed RDB transfer with replica 10.2.171.188:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
10:M 18 Jul 2023 12:42:40.268 Synchronization with replica 10.2.171.188:6379 succeeded
10:M 18 Jul 2023 12:44:27.128 Clear FAIL state for node fb8237edfda4f76fe6bfd2038a7898bb0fbb4597: replica is reachable again.
10:M 18 Jul 2023 12:44:36.657 10000 changes in 60 seconds. Saving...
10:M 18 Jul 2023 12:44:36.668 * Background saving started by pid 218
10:M 18 Jul 2023 12:44:37.177 # Client id=3 addr=10.2.171.188:37726 laddr=10.2.173.117:6379 fd=16 name= age=122 idle=0 flags=S db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=2625 omem=268490264 tot-mem=268512536 events=rw cmd=replconf user=default redir=-1 resp=2 scheduled to be closed ASAP for overcoming of output buffer limits.
10:M 18 Jul 2023 12:44:37.178 # Connection with replica 10.2.171.188:6379 lost.
/usr/bin/entrypoint.sh: line 91: 10 Killed redis-server /etc/redis/redis.conf
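
The last Redis lines before the kill show the replica client being dropped with `omem=268490264` "for overcoming of output buffer limits". Redis ships with a default of `client-output-buffer-limit replica 256mb 64mb 60`. Assuming the chart does not override that default (not verified here), the arithmetic below shows the replica's output buffer had just crossed the 256 MB hard limit. This is only a sketch of the comparison; the variable names are illustrative.

```shell
omem=268490264                       # omem= value from the log line above
hard_limit=$((256 * 1024 * 1024))    # Redis default hard limit for replica clients
over_by=$((omem - hard_limit))

if [ "$omem" -gt "$hard_limit" ]; then
  echo "replica output buffer exceeded the hard limit by ${over_by} bytes"
fi

# The effective setting can be read from a running node (requires cluster access):
# redis-cli -h redis-cluster-leader.ot-operators.svc -p 6379 -a password CONFIG GET client-output-buffer-limit
```

This accounts for the lost replica connection, not necessarily for the kill itself. Both are consistent with the master receiving data faster than it can replicate it, so the buffered replication stream adds to the memory already used by the dataset.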