Enapter / charts

Enapter Helm Charts

cleanupTempfiles.minutes - default value #61

Open air3ijai opened 1 year ago

air3ijai commented 1 year ago

Hello,

We just ran a test of how the Pod handles multiple restarts during backups.

  1. At some point, snapshot creation may be started and then interrupted
  2. As a result, we may be left with an unfinished temporary backup file:
    drwxr-xr-x. 1 root root          56 Jan  4 11:04 ..
    -rw-r--r--. 1 root root 20547669028 Jan  4 10:02 dump.rdb
    -rw-r--r--. 1 root root  5188599808 Jan  4 11:03 temp-1-3.rdb
    -rw-r--r--. 1 root root  1432674655 Jan  4 10:46 temp-1-9.rdb
    -rw-r--r--. 1 root root  1078273848 Jan  4 10:46 temp-2086607563.1.rdb
    -rw-r--r--. 1 root root           0 Jan  4 11:06 temp-2088105784.1.rdb
  3. On the next start, KeyDB will load the data and then start to sync from the master
  4. After the sync it will perform a new backup
  5. This backup can also be interrupted, leaving yet another temp file

Doing this in a loop, we may run out of disk space. It is certainly a corner case.

The current default for cleanupTempfiles.minutes is 60 minutes, so it will not clean up temp files from crashes that happened just a few minutes earlier.
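
A shorter retention would presumably mitigate this. A minimal sketch of a values override, assuming cleanupTempfiles.minutes maps onto a nested key in the chart's values.yaml exactly as the dotted name suggests (not verified against the chart):

cleanupTempfiles:
  minutes: 5  # assumed mapping; drop leftover temp-*.rdb files older than 5 minutes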

What is the main reason for such a large default?

For the Bitnami Redis chart we use the following:

master:
  preExecCmds: "rm -rf /data/temp*.*"

So we delete all temporary files right before Redis starts.
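
For comparison, the same hook could express the time-based policy that cleanupTempfiles.minutes implies (a sketch only; the actual command run by the Enapter chart is an assumption, not taken from its templates):

master:
  preExecCmds: "find /data -name 'temp-*.rdb' -mmin +60 -delete"

Unlike the unconditional rm above, this keeps any temp file younger than 60 minutes, which is why files from a crash a few minutes before a restart would survive.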

air3ijai commented 1 year ago

Got the issue today on Dev, after a number of problems we experienced yesterday in the Kubernetes cluster.

Dumps

drwxr-xr-x. 2 root root         102 Jan 11 15:38 .
drwxr-xr-x. 1 root root          56 Jan 10 14:53 ..
-rw-r--r--. 1 root root 20553313533 Jan 10 07:13 dump.rdb
-rw-r--r--. 1 root root  2190929920 Jan 10 07:43 temp--1701050988.1.rdb
-rw-r--r--. 1 root root  1806061732 Jan 11 15:38 temp-324797-0.rdb
-rw-r--r--. 1 root root  2700424307 Jan 10 07:43 temp-652292-0.rdb

Save error loop

1:319:S 11 Jan 2023 15:35:47.933 * Replica 192.168.10.10:6379 asks for synchronization
1:319:S 11 Jan 2023 15:35:47.933 * Full resync requested by replica 192.168.10.10:6379
1:319:S 11 Jan 2023 15:35:47.933 * Starting BGSAVE for SYNC with target: disk
1:319:S 11 Jan 2023 15:35:48.105 * Background saving started by pid 324179
1:319:S 11 Jan 2023 15:35:48.105 * Background saving started
324179:319:C 11 Jan 2023 15:38:35.454 # Write error saving DB on disk: No space left on device
1:319:S 11 Jan 2023 15:38:36.601 # Background saving error
1:319:S 11 Jan 2023 15:38:36.601 # SYNC failed. BGSAVE child returned an error
1:319:S 11 Jan 2023 15:38:36.601 # Connection with replica 192.168.10.10:6379 lost.

1:319:S 11 Jan 2023 15:38:36.783 * Replica 192.168.10.10:6379 asks for synchronization
1:319:S 11 Jan 2023 15:38:36.783 * Full resync requested by replica 192.168.10.10:6379
1:319:S 11 Jan 2023 15:38:36.783 * Starting BGSAVE for SYNC with target: disk
1:319:S 11 Jan 2023 15:38:36.956 * Background saving started by pid 324797
1:319:S 11 Jan 2023 15:38:36.956 * Background saving started
324797:319:C 11 Jan 2023 15:41:19.609 # Write error saving DB on disk: No space left on device
1:319:S 11 Jan 2023 15:41:20.887 # Background saving error
1:319:S 11 Jan 2023 15:41:20.887 # SYNC failed. BGSAVE child returned an error
1:319:S 11 Jan 2023 15:41:20.887 # Connection with replica 192.168.10.10:6379 lost.