Closed by gaurav 1 year ago
If we want to keep using the RDB file, we would need to set up something like this:
1. Run the BGSAVE SCHEDULE command.
2. Poll LASTSAVE until the last save time changes.
3. Use kubectl cp to copy the /data/dump.rdb file into Hatteras somewhere.
4. Run redis-check-rdb to make sure the copied RDB file is valid.

This has been fixed in https://github.com/helxplatform/translator-devops/pull/651. Closing.
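For reference, the RDB backup procedure described in this comment (BGSAVE SCHEDULE, polling LASTSAVE, kubectl cp, redis-check-rdb) could be sketched roughly as follows. This is only an illustration, not what the linked pull request implements: the function names, the pod name, the destination path, and the polling and timeout values are all hypothetical. It also assumes that redis-cli is available inside the pod and that kubectl and redis-check-rdb are on the PATH of the machine running the script.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_command(args: Sequence[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(list(args), check=True, capture_output=True, text=True)
    return result.stdout.strip()


def redis_cli(pod: str, *redis_args: str) -> list[str]:
    """Build a kubectl command line that runs redis-cli inside the given pod."""
    return ["kubectl", "exec", pod, "--", "redis-cli", *redis_args]


def backup_rdb(
    pod: str,
    dest: str,
    run: Callable[[Sequence[str]], str] = run_command,
    poll_seconds: float = 5.0,        # hypothetical polling interval
    timeout_seconds: float = 3600.0,  # hypothetical upper bound on the save
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Save, copy, and validate the RDB file of one Redis pod."""
    # Record the last successful save time before requesting a new save.
    before = run(redis_cli(pod, "LASTSAVE"))

    # Ask Redis for a background save (queued if another save is running).
    run(redis_cli(pod, "BGSAVE", "SCHEDULE"))

    # Poll LASTSAVE until the reported save time changes.
    waited = 0.0
    while run(redis_cli(pod, "LASTSAVE")) == before:
        if waited >= timeout_seconds:
            raise TimeoutError(f"BGSAVE on {pod} did not finish in time")
        sleep(poll_seconds)
        waited += poll_seconds

    # Copy /data/dump.rdb out of the pod to the destination path.
    run(["kubectl", "cp", f"{pod}:/data/dump.rdb", dest])

    # Validate the copy; a failing check raises CalledProcessError.
    run(["redis-check-rdb", dest])
```

A call such as `backup_rdb("nodenorm-redis-0", "/backups/dump.rdb")` (both arguments are placeholders) would then need to be repeated for each of the six Redis instances. The `run` and `sleep` parameters are injectable so that the sequence of commands can be checked without a live cluster.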
We've now seen several instances of a single Redis instance getting corrupted (e.g. #159), forcing us to delete all six Redis tables and reload all of them from scratch. One way to avoid this situation would be to back up all six Redis tables to disk and copy them over to Hatteras. That way, if we have a failure in both the primary and backup RENCI NodeNorm like we did on 2023-Jan-20, we will be able to restore the Redis instances from those backups rather than having to reload from the Babel files.
@YaphetKG also suggested that the problem might be that the Redis instances aren't writing their databases to disk properly -- if so, then backing them up might also cause the Redis instance to flush its contents to disk. Furthermore, we only need the Redis instances to be writeable while the loader is running -- once that's complete, we would prefer to put all the Redis instances into read-only mode somehow.
In the future, it might also be more efficient to set up the Redis instances on ITRB by transmitting the RDB files rather than our current strategy of starting jobs on ITRB to download Babel files from RENCI and load them into ITRB.
Steps needed:
- redis-check-rdb?