Restarting the redis instances caused id-eq-id to become corrupted

gaurav commented 1 year ago

Here's the error message we get when checking the dump file on the id-eq-id master node:

I have no name!@nn-redis-2022dec2-id-eq-id-master-0:/$ redis-check-rdb /data/dump.rdb 
[offset 0] Checking RDB file /data/dump.rdb
--- RDB ERROR DETECTED ---
[offset 9] Wrong signature trying to load DB from file
[additional info] While doing: start
[additional info] Reading type 0 (string)
[info] 0 keys read
[info] 0 expires
[info] 0 already expired
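For context, an RDB file begins with a 9-byte header: the ASCII magic REDIS followed by a four-digit format version (for example REDIS0009). redis-check-rdb reads those 9 bytes before anything else, which is why the error is reported at offset 9: the file no longer starts with REDIS, so nothing after it can be loaded. Below is a minimal Python sketch of that first check, handy for spot-checking a dump or a backup copy. The function name is hypothetical, the digit check on the version is an extra sanity check added here, and it validates only the header, not the rest of the file.

```python
import sys


def rdb_header_ok(path: str) -> bool:
    """Return True if the file starts with a plausible 9-byte RDB header.

    The b"REDIS" comparison is the check that fails with "Wrong signature";
    requiring four ASCII digits for the version is an extra sanity check.
    Nothing beyond the header is validated.
    """
    with open(path, "rb") as f:
        header = f.read(9)
    return len(header) == 9 and header[:5] == b"REDIS" and header[5:].isdigit()


if __name__ == "__main__":
    # Usage: python check_rdb_header.py /data/dump.rdb [more files...]
    for p in sys.argv[1:]:
        print(p, "ok" if rdb_header_ok(p) else "bad signature/header")
```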

@YaphetKG suggested that saving a backup copy of the RDB file might either prevent this from happening in the future or at least make it easier to restore after another crash.
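One way the backup-copy idea could look, as a minimal sketch: copy dump.rdb to a timestamped file in a separate backup directory and keep only the newest few copies. The function name, the backup directory, and the retention count are all hypothetical choices for illustration, not anything the deployment already has.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_rdb(dump_path: str, backup_dir: str, keep: int = 3) -> Path:
    """Copy dump.rdb into backup_dir under a timestamped name, keeping the newest `keep` copies.

    backup_dir should live on a different volume from the Redis data
    directory; a copy on the same PVC competes with Redis for the same space.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Fixed-width timestamps sort lexicographically in chronological order.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    dest = dest_dir / f"dump-{stamp}.rdb"
    # Redis replaces dump.rdb via rename, so on a POSIX filesystem an already
    # open source file stays a complete snapshot even if a save finishes mid-copy.
    shutil.copy2(dump_path, dest)
    # Delete the oldest copies beyond the retention limit.
    for old in sorted(dest_dir.glob("dump-*.rdb"))[:-keep]:
        old.unlink()
    return dest
```

Rotation only helps if the copies are good, so it may be worth checking a copy's header before it pushes an older backup out of the retention window.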

Once we're past the Feb relay, we can bring up a redis instance on translator-exp and deliberately crash it to see whether we can reproduce this problem and find a way to prevent it.

gaurav commented 1 year ago

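This is because the PVC only had 100Gi. Redis writes each new snapshot to a temporary file and then renames it over dump.rdb, so a save needs free space for a second full copy; once the backup reached 51Gi, two copies no longer fit and redis-r3-external could no longer save the backup correctly. I've increased that space in https://github.com/TranslatorSRI/NodeNormalization/issues/159 to see if that solves this problem.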
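To make the arithmetic concrete: Redis keeps the old dump.rdb until the new snapshot is fully written, so with a 100Gi volume holding a 51Gi dump only 49Gi is free, less than the roughly 51Gi the next snapshot needs. The sketch below checks that condition. The helper names and the growth_margin parameter are hypothetical, and it assumes the next snapshot is about the size of the current one and that nothing else shares the volume.

```python
import os
import shutil

GIB = 1024 ** 3


def snapshot_fits(free_bytes: int, expected_dump_bytes: int) -> bool:
    # The old dump.rdb stays on disk until the new temp file is renamed over
    # it, so the whole new snapshot must fit in the space free right now.
    return expected_dump_bytes <= free_bytes


def volume_has_room(dump_path: str, growth_margin: float = 1.1) -> bool:
    # Hypothetical check: assume the next snapshot is the current dump's size
    # plus a margin for dataset growth.
    data_dir = os.path.dirname(os.path.abspath(dump_path))
    free = shutil.disk_usage(data_dir).free
    current = os.path.getsize(dump_path) if os.path.exists(dump_path) else 0
    return snapshot_fits(free, int(current * growth_margin))


if __name__ == "__main__":
    # The situation described above: a 100Gi PVC holding a 51Gi dump.rdb.
    capacity, dump = 100 * GIB, 51 * GIB
    print(snapshot_fits(capacity - dump, dump))  # False: only 49Gi free
```

Run periodically against /data/dump.rdb, a check like volume_has_room could flag the problem before a save fails rather than after a restart.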

gaurav commented 1 year ago

Increasing the disk space did fix that issue. Closing.

TranslatorSRI / NodeNormalization #159