dougkdev closed 1 year ago
@lemon-ukgen @dougkdev Are we in a position to move forward with this task? If not, what's needed?
If all the data is under /raid on colobus, then there is nothing further to do. If there's data elsewhere, then someone needs to let me know where.
The current backup retention is 2 months of daily snapshots. If that's not enough, then we can change it, but UKGEN will need a bigger storage server (currently uakari).
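For illustration only, the retention rule can be pictured as a pruning step over daily snapshots. This is a minimal sketch assuming snapshots are identified by their date and approximating "2 months" as 60 days; the real backup system's configuration isn't shown in this thread, and the names snapshots_to_prune and RETENTION_DAYS are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "2 months" approximated as 60 days; the real backup
# system may define its retention window differently.
RETENTION_DAYS = 60


def snapshots_to_prune(snapshot_dates, today, retention_days=RETENTION_DAYS):
    """Return, oldest first, the snapshot dates that fall outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in snapshot_dates if d < cutoff)


# Example: 90 consecutive daily snapshots ending on 14 July 2021.
today = date(2021, 7, 14)
snapshots = [today - timedelta(days=n) for n in range(90)]
expired = snapshots_to_prune(snapshots, today)
print(len(expired), expired[-1])  # prints: 29 2021-05-14
```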
@dougkdev is there any data elsewhere?
Yes @richpomfret, we can move forward with this task, as it isn't dependent on anything else. @PatReynolds yes, the data is currently exclusively in the database itself, not in the /raid directory. All of the things I mentioned in the initial comment need to be regularly backed up from the MongoDB database to the /raid directory on colobus, so that Lemon's system backups will include them. FC2 does not currently write any of that information to /raid, where the system backups can find it. This is a task for @Vino-S, not @lemon-ukgen.
I see no reason for this task. The collections exist in 3 different locations.
@PatReynolds to talk to @lemon about risks
Emailed Lemon (and Vino) 14/7/2021
Chased 28 July 2021
Lemon had summarised the position (I missed this email; I have edited Lemon's words a little for clarity here, e.g. "I" becomes "he"). There are two tasks:
1/ Keeping a backup of the MongoDB databases around, to recover "quickly" to a point in time
2/ Keeping a backup of all the data used to construct the MongoDB databases
a. All the data in MongoDB is, in theory, rebuildable from the transcriber and some other data.
b. If that data is under colobus:/raid/free*, then it's already backed up, to Hetzner and Backblaze.
c. Above, Doug suggests it may not be, in which case it's a development issue to address.
d. Also above, Kirk indicates there's no need to back up the built database (i.e. don't do action 1/ above).
Lemon has already undertaken 1/ above: during the migration to the new MongoDB servers he got the impression that rebuilding from scratch was taking "too long", so having a constructed copy of the databases around would be useful.
But that doesn't mean we shouldn't do 2/ as well. We really want to make sure we have all the data used to generate our online databases, rather than only the result.
Lemon put "quickly" and "too long" in quotes because he doesn't really know what recovery time the project would consider reasonable after a catastrophic failure, beyond "as fast as possible".
Lemon further says: It needs developer input to a) identify the data used to construct the MongoDB databases and b) make sure that data lives in the part of the system we back up. [Pat says: i.e. colobus, I think that means]
Yes, that's right. Specifically, /raid on colobus. Above, we didn't think it was for FreeCEN2, at least.
All files required are now on colobus. They were on brazza, because of its better disk performance, while we were doing the monthly updates from FC1. Since we now do online processing, the file base was moved to colobus.
@Captainkirkdawson please can you indicate where on colobus this data lives? Some example paths would be useful.
Some paths are on /raid (which is backed up) by virtue of symlinks, so it's not immediately obvious. In general, ~apache/hosts/... is not on /raid, and is not backed up.
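Because symlinks hide whether a path really lives on /raid, one way to check is to resolve the path fully and compare it with the backed-up root. This is a minimal Python sketch, assuming /raid is the only backed-up tree on colobus; is_backed_up is a hypothetical helper, not an existing project script.

```python
import os

# Assumption: /raid is the only backed-up tree on colobus, per the comments above.
BACKED_UP_ROOT = "/raid"


def is_backed_up(path, root=BACKED_UP_ROOT):
    """Return True if path, after resolving every symlink, lies under root."""
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(root)
    return os.path.commonpath([real_path, real_root]) == real_root


if __name__ == "__main__":
    import sys

    # Usage: python check_backup.py PATH [PATH ...]
    for p in sys.argv[1:]:
        status = "backed up" if is_backed_up(p) else "NOT backed up"
        print(f"{p}: {status} ({os.path.realpath(p)})")
```

Comparing with os.path.commonpath rather than a plain string prefix avoids treating a sibling directory such as /raid-old as if it were inside /raid.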
/raid/freecen2/freecen1/pieces/ is the most important.
/raid/freecen2/users/ is next.
Thanks. For FreeCEN2 that's job done then.
If FreeREG2 keeps everything it needs backed up in /raid/freereg2, then that's all good for MongoDB in general too.
/raid/freereg2/users/ is the critical directory for FreeREG.
To avoid a full rebuild from scratch, we could, once a month or once a quarter, write all of the database collections to JSON files stored in the raid folder. These are what I used to make the transition from a sharded to a replicated setup. There is a script used to make the copies.
A Rake task needs to be written (it will run automatically).
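The thread doesn't show the existing copy script or the eventual Rake task, so the following is only a Python sketch of the same idea: one mongoexport call per collection, writing JSON files into a dated directory under /raid. The database name, collection names, output directory and the export_commands and run_exports helpers are hypothetical placeholders; the mongoexport options used (--db, --collection, --jsonFormat, --out) are standard MongoDB Database Tools flags.

```python
import shlex
import subprocess
from datetime import date
from pathlib import Path, PurePosixPath

# Hypothetical output root on colobus; the real location is not given in the thread.
OUTPUT_ROOT = PurePosixPath("/raid/freecen2/mongo_exports")


def export_commands(db_name, collections, run_date, output_root=OUTPUT_ROOT):
    """Build one mongoexport command per collection.

    Each collection is written to <output_root>/<YYYY-MM-DD>/<collection>.json,
    so every monthly or quarterly run gets its own dated directory.
    """
    target = output_root / run_date.isoformat()
    commands = []
    for name in collections:
        commands.append([
            "mongoexport",
            f"--db={db_name}",
            f"--collection={name}",
            # Canonical Extended JSON keeps types such as dates and ObjectIds explicit.
            "--jsonFormat=canonical",
            f"--out={target / (name + '.json')}",
        ])
    return target, commands


def run_exports(target, commands):
    """Create the dated directory, then run each export, stopping on the first failure."""
    Path(target).mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Example with placeholder names: print the commands instead of running them.
target, cmds = export_commands("freecen2_db", ["users", "counties"], date(2021, 8, 4))
for cmd in cmds:
    print(shlex.join(cmd))
```

Scheduling (monthly or quarterly, as suggested above) would be handled separately, for example by cron or by the Rake task itself. MongoDB's own documentation recommends mongodump/mongorestore rather than mongoexport for full production backups, because JSON cannot represent every BSON type; canonical Extended JSON reduces that risk but may not remove it entirely.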
@Vino-S could you put an estimate on this, please?
@Vino-S can we have an estimate?
Vino is still working on this; it needs testing, then deploying, then testing again.
Code changes are deployed to production. Keeping this open: I need to monitor the first backup.
To be deployed to live by @Vino-S
Done, closing
Rake task to write all of the MongoDB database collections to JSON files stored in the raid folder. See the comment by @Captainkirkdawson on August 4 2021.
Previous description: see similar FreeREG issue 1262. We need to set up FreeCEN to back up the critical collections required to rebuild the database from files if it is ever corrupted or lost. Census data is already available for a build-from-cold from the FreeCEN1 PARMS.dat and .VLD files, but we need to back up users, counties/syndicates/etc., data added directly to FC2, and any other data that doesn't come from FC1 and is needed to rebuild the database and the search records.