TranslatorSRI / NodeNormalization

Service that produces Translator compliant nodes given a curie
MIT License

Prepare a SOP for making manual backups of the Redis graphs #160

Closed gaurav closed 1 year ago

gaurav commented 1 year ago

We've now seen several cases of a single Redis instance getting corrupted (e.g. #159), forcing us to delete all six Redis tables and reload all of them from scratch. One way to avoid this would be to back up all six Redis tables to disk and copy the backups over to Hatteras. That way, if both the primary and backup RENCI NodeNorm instances fail, as they did on 2023-Jan-20, we will be able to restore the Redis instances from those backups rather than having to reload from the Babel files.

@YaphetKG also suggested that the problem might be that the Redis instances aren't writing their databases to disk properly. If so, backing them up might also force each Redis instance to flush its contents to disk. Furthermore, we only need the Redis instances to be writeable while the loader is running. Once loading is complete, we would prefer to put all the Redis instances into read-only mode somehow.
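As a rough sketch of what "forcing a flush to disk" could look like for one instance: Redis has a `BGSAVE` command that writes an RDB snapshot in the background and a `LASTSAVE` command that reports when the last successful save finished, and redis-py exposes these as `bgsave()` and `lastsave()`. The function name `snapshot_to_disk`, the timeout values, and the host/port in the usage comment are all hypothetical, and this is not necessarily how the loader or the deployment does it.

```python
import time
from datetime import datetime


def snapshot_to_disk(client, timeout_s: float = 3600.0, poll_s: float = 5.0) -> datetime:
    """Ask Redis for a background RDB snapshot and wait until it completes.

    `client` is assumed to be a redis-py style client offering `bgsave()`
    and `lastsave()`. Returns the new LASTSAVE timestamp.
    """
    before = client.lastsave()  # time of the last successful save
    client.bgsave()             # Redis forks a child that writes the RDB file

    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        after = client.lastsave()
        # LASTSAVE only advances when a save succeeds. It has one-second
        # resolution, so a save finishing in the same second as the
        # previous one would not be detected by this check.
        if after > before:
            return after
        time.sleep(poll_s)

    raise TimeoutError(f"no new RDB snapshot within {timeout_s} seconds")


# Hypothetical usage against one of the six instances (requires redis-py):
#
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   print(snapshot_to_disk(client))
#   print(client.config_get("dir"), client.config_get("dbfilename"))
```

The two `config_get` calls in the usage comment show where Redis wrote the snapshot, which is the file that would then be copied to Hatteras. If `BGSAVE` fails on the server, `LASTSAVE` never advances and the function ends with a `TimeoutError`, which at least signals that the instance is not persisting to disk.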

In the future, it might also be more efficient to set up the Redis instances on ITRB by transmitting the RDB files rather than our current strategy of starting jobs on ITRB to download Babel files from RENCI and load them into ITRB.

Steps needed:

gaurav commented 1 year ago

If we want to keep using the RDB file, we would need to set up something like this:

gaurav commented 1 year ago

This has been fixed in https://github.com/helxplatform/translator-devops/pull/651. Closing.