air3ijai closed this issue 1 year ago
@air3ijai Could you send me the results of running the PING command via keydb-cli during the scaling process:
while true; do
reply=$(keydb-cli -h keydb-scaling.default.svc.cluster.local -p 6379 PING)
echo "`date +"%T.%3N"` - $reply"
sleep 0.2
done
If any response other than PONG shows up, I could improve the readiness probe.
We performed a slightly different test, but it probably covers the initial concern as well. The commands sent were PING, GET test, and INFO KEYSPACE.
None
Script
while true; do
reply=$(keydb-cli -h keydb-scaling.default.svc.cluster.local -p 6379 <<<$'PING\nGET test\nINFO KEYSPACE' 2>/dev/null)
echo "`date +"%T.%3N"` - $reply"
sleep 0.1
done
Results
11:48:11.628 - db0:keys=1994658 - test value
11:48:11.791 - db0:keys=470738 - LOADING KeyDB is loading the dataset in memory
11:48:11.919 - db0:keys=1994658 - test value
11:48:12.047 - db0:keys=1994658 - test value
11:48:12.200 - db0:keys=470738 - LOADING KeyDB is loading the dataset in memory
11:48:12.354 - db0:keys=470738 - LOADING KeyDB is loading the dataset in memory
11:48:12.481 - db0:keys=1994658 - test value
11:48:14.532 - db0:keys=474243 - LOADING KeyDB is loading the dataset in memory
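To quantify how often a request landed on a replica that was still loading, a log like the one above can be tallied. This is a minimal sketch, assuming one sample per line in the format shown; `tally_replies` is a hypothetical helper name and is not part of the chart.

```bash
#!/bin/sh
# Hypothetical helper: count log lines that contain a LOADING reply
# versus all other non-empty lines. Reads the log from stdin.
tally_replies() {
    awk '
        /LOADING/ { loading++; next }   # replica was still loading its dataset
        NF        { ok++ }              # any other non-empty sample
        END       { printf "ok=%d loading=%d\n", ok, loading }
    '
}

# Example input: the first three samples from the results above.
tally_replies <<'EOF'
11:48:11.628 - db0:keys=1994658 - test value
11:48:11.791 - db0:keys=470738 - LOADING KeyDB is loading the dataset in memory
11:48:11.919 - db0:keys=1994658 - test value
EOF
```

Piping the full output of the polling loop into the same function would give the share of requests served by a loading replica during a scaling event.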
Conclusions
Here are the default readiness:
#!/bin/bash
set -e
[[ -n "${REDIS_PASSWORD}" ]] && export REDISCLI_AUTH="${REDIS_PASSWORD}"
response="$(
timeout -s 3 "${1}" \
keydb-cli \
-h localhost \
-p "${REDIS_PORT}" \
ping
)"
if [ "${response}" != "PONG" ]; then
echo "${response}"
exit 1
fi
and liveness:
#!/bin/bash
set -e
[[ -n "${REDIS_PASSWORD}" ]] && export REDISCLI_AUTH="${REDIS_PASSWORD}"
response="$(
timeout -s 3 "${1}" \
keydb-cli \
-h localhost \
-p "${REDIS_PORT}" \
ping
)"
if [ "${response}" != "PONG" ] && [[ ! "${response}" =~ ^.*LOADING.*$ ]]; then
echo "${response}"
exit 1
fi
scripts for the probes. As you can see, the LOADING ... response is accepted only by the liveness probe, not by the readiness probe. So while KeyDB is loading data, the pod should be not ready and the service should not route requests to it. Are you sure you use latest chart version and use default startup, liveness and readiness probes?
The only thing that looks suspicious to me is the default success and failure thresholds. I will check that and try to reproduce your issue.
Are you sure you use latest chart version and use default startup, liveness and readiness probes?
As far as I can see in the installation manifest, we didn't touch the probes, and we use the latest chart version
helm list -n default | grep keydb
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
keydb default 3 2022-12-27 22:51:47.86499 +0200 EET deployed keydb-0.43.1 6.3.1
kubectl describe statefulset -n default keydb
Pod Template:
Labels: app.kubernetes.io/instance=keydb
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=keydb
app.kubernetes.io/version=6.3.1
helm.sh/chart=keydb-0.43.1
Liveness: exec [sh -c /health/ping_liveness_local.sh 5] delay=20s timeout=6s period=5s #success=1 #failure=5
Readiness: exec [sh -c /health/ping_readiness_local.sh 1] delay=20s timeout=2s period=5s #success=1 #failure=5
Startup: exec [sh -c /health/ping_readiness_local.sh 1] delay=0s timeout=2s period=5s #success=1 #failure=24
And default values - https://github.com/Enapter/charts/blob/master/keydb/values.yaml#L98-L132
# Liveness Probe
livenessProbe:
  initialDelaySeconds: 20
  periodSeconds: 5
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5
# Readiness Probe
readinessProbe:
  initialDelaySeconds: 20
  periodSeconds: 5
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 5
# Startup Probe
startupProbe:
  periodSeconds: 5
  timeoutSeconds: 1
  failureThreshold: 24
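These thresholds translate into time windows. As a rough sketch, assuming Kubernetes acts on a probe only after `failureThreshold` consecutive failures spaced `periodSeconds` apart, and ignoring probe timeouts and scheduling jitter, the defaults above work out as follows. `probe_window` is a hypothetical helper name used only for this illustration.

```bash
#!/bin/sh
# Approximate seconds until a probe is considered failed:
#   initialDelaySeconds + failureThreshold * periodSeconds
# Probe timeouts and jitter are ignored, so treat the result as an estimate.
probe_window() {
    initial_delay=$1
    period=$2
    failure_threshold=$3
    echo $(( initial_delay + failure_threshold * period ))
}

# Startup probe: no initial delay, period 5, failureThreshold 24.
echo "startup budget:   $(probe_window 0 5 24)s"
# Readiness probe on an already running pod: period 5, failureThreshold 5.
echo "readiness window: $(probe_window 0 5 5)s"
```

Under these assumptions a container gets about 120 seconds to pass the startup probe, and a running pod whose readiness probe starts failing is marked not ready after about 25 seconds.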
@Antiarchitect, I observed the following in the liveness probe:
PING: during loading we receive the reply PONG
GET test: during loading we receive the reply (error) LOADING KeyDB is loading the dataset in memory
So a probe like PING does not help us detect the LOADING state, and we should perform something like GET 'KEY' instead.
# Loading state
keydb-cli get non-existing-key
(error) LOADING KeyDB is loading the dataset in memory
echo $?
0
# Ready state
keydb-cli get non-existing-key
(nil)
echo $?
0
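Because keydb-cli exits with 0 in both states, a probe has to inspect the reply text rather than the exit code. Here is a minimal sketch of such a check. `reply_is_ready` is a hypothetical helper name, and treating any reply that contains LOADING as not ready is an assumption based on the output above.

```bash
#!/bin/sh
# Hypothetical helper: return 1 if the reply text shows the server is
# still loading its dataset, 0 otherwise.
reply_is_ready() {
    case "$1" in
        *LOADING*) return 1 ;;  # dataset is still being loaded
        *)         return 0 ;;  # "(nil)", an empty reply, or a real value
    esac
}

reply_is_ready "(nil)" && echo "ready"
reply_is_ready "(error) LOADING KeyDB is loading the dataset in memory" || echo "not ready"
```

An empty reply is deliberately treated as ready here, on the assumption that the CLI may print a nil result as an empty line when its output is captured by a script instead of shown on a terminal.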
I see. Sorry for the delay. I will check this when I have time :) Are you testing on the latest KeyDB 6.3.1, and could you please check whether PING behaves the same on 6.2.2 too?
@Antiarchitect, here is another simple test using Docker Compose with v6.3.1 and v6.2.2.
We can see that both versions behave similarly, and we should take that into account in our probes.
version: '3'
services:
  keydb:
    image: eqalpha/keydb:x86_64_v6.3.1
    # image: eqalpha/keydb:x86_64_v6.2.2
    container_name: keydb
    ports:
      - 6379:6379
    volumes:
      - ./data:/data
keydb-cli config set save ""
for key in {1..10}; do
keydb-cli DEBUG POPULATE 10000 "test${key}" 100000
done
keydb-cli bgsave
while true; do
reply=$(timeout 1 keydb-cli <<<$'PING\nGET test')
echo "`date +"%T.%3N"` - $reply"
sleep 1
done
docker-compose down && docker-compose up -d
19:47:38.276 - PONG
19:47:39.292 - PONG
19:47:40.310 - PONG
19:47:41.325 - PONG
19:47:42.340 - PONG
19:47:43.355 - PONG
19:47:45.363 -
19:47:47.370 -
19:47:49.377 -
19:47:51.384 -
Error: Connection reset by peer
19:47:52.395 -
19:47:54.402 -
19:47:56.409 - PONG
19:47:58.416 - PONG
19:48:00.413 - PONG
LOADING KeyDB is loading the dataset in memory
19:48:02.370 - PONG
LOADING KeyDB is loading the dataset in memory
19:48:04.296 - PONG
LOADING KeyDB is loading the dataset in memory
19:48:06.219 - PONG
LOADING KeyDB is loading the dataset in memory
19:48:08.140 - PONG
LOADING KeyDB is loading the dataset in memory
19:48:10.061 - PONG
LOADING KeyDB is loading the dataset in memory
19:48:11.986 - PONG
LOADING KeyDB is loading the dataset in memory
19:48:13.909 - PONG
LOADING KeyDB is loading the dataset in memory
19:48:15.410 - PONG
19:48:16.426 - PONG
19:48:17.444 - PONG
19:48:18.459 - PONG
19:48:19.473 - PONG
19:48:20.489 - PONG
Good news everyone! At last I've recreated this setup and improved the liveness and readiness probes based on the tests. The main point is that the readiness probe no longer sends PING. Instead it does a GET on a random UUID key (a non-existent key) and compares the result with "LOADING KeyDB is loading the dataset in memory", which is the loading response. As long as the loading string appears in the response, the pod counts as not ready. @air3ijai please test your case with the new 0.44.0 keydb chart version. If the problem still appears I will reopen this issue.
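The approach described above can be sketched as follows. This is not the script shipped in chart 0.44.0, only an illustration. The `KEYDB_CLI` variable, the `random_key` and `readiness_check` function names, and the use of /dev/urandom in place of a UUID generator are all assumptions made for the example.

```bash
#!/bin/sh
# Illustrative readiness check, NOT the chart's actual script.
# KEYDB_CLI is a hypothetical override so the check can be exercised
# without a running server.
KEYDB_CLI="${KEYDB_CLI:-keydb-cli}"
LOADING_REPLY="LOADING KeyDB is loading the dataset in memory"

# Generate a random 32-character hex key that is very unlikely to exist.
random_key() {
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Exit status 0 = ready, 1 = not ready.
readiness_check() {
    # A failed CLI call (for example, connection refused) counts as not ready.
    response=$($KEYDB_CLI -h localhost -p "${REDIS_PORT:-6379}" GET "$(random_key)") || return 1
    case "$response" in
        *"$LOADING_REPLY"*)
            echo "$response"   # surface the reason in the probe output
            return 1
            ;;
    esac
    return 0
}
```

A real probe would call `readiness_check` at the end of the script and would probably wrap the CLI call in `timeout`, as the default scripts do.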
@mrsrvman PTAL. I know the original probes were implemented by you. Do you agree with the current solution?
P.S. I've looked inside the KeyDB sources, and the PING command seems to have the allow-loading internal flag, so it never replies with the loading string. So I cleaned it up.
Hello,
We just tried your helm chart and it works fine. We also ran some tests to check how it handles scale-up and scale-down events.
Methodology
Run GET <KEY> continuously
Test script
Scale up: 2 --> 3
Scale down: 3 --> 2
We see that we sometimes get an empty reply, probably because the data is not yet replicated and the pod was already added to the service.
Is there any way to improve that?
Thank you!