FreeUKGen / Systemwide

repository for issues that affect all systems within the Free UK Gen portfolio

Backup of MongoDB #221

Closed dougkdev closed 1 year ago

dougkdev commented 7 years ago

Rake task to write all of the MongoDB database collections to JSON files that are stored in the raid folder. See comment by @Captainkirkdawson on August 4 2021.

Previous description: see similar freereg issue 1262. We need to set up FreeCen to back up the critical collections required to rebuild the database from files if it is ever corrupted or lost. Census data is already available for build-from-cold from the freecen1 PARMS.dat and .VLD files, but we need to back up users, counties/syndicates/etc., data added directly to FC2, and any other data that doesn't come from FC1 and is needed to rebuild the database and the search records.

richpomfret commented 7 years ago

@lemon-ukgen @dougkdev Are we in a position to move forward with this task? If not, what's needed?

lemon-ukgen commented 7 years ago

If all the data is under /raid on colobus then nothing further to do. If there's data elsewhere then someone needs to let me know where.

The current backup retention is 2 months of daily snapshots. If that's not enough then we can change it, but UKGEN will need a bigger storage server (currently uakari).

PatReynolds commented 7 years ago

@dougkdev is there any data elsewhere?

dougkdev commented 7 years ago

Yes @richpomfret, we can move forward with this task as there is nothing it depends on. @PatReynolds yes, the data is currently exclusively in the database itself, not in the /raid directory. All of the things I mentioned in the initial comment need to be regularly backed up from the mongo database to the /raid directory on colobus so that Lemon's system backups will include them. FC2 does not currently write any of that information to /raid where the system backups can find it. This is a task for @Vino-S, not @lemon-ukgen.

Captainkirkdawson commented 3 years ago

I see no reason for this task. The collections exist in 3 different locations.

PatReynolds commented 3 years ago

@PatReynolds to talk to @lemon about risks

PatReynolds commented 3 years ago

Emailed Lemon (and Vino) 14/7/2021

PatReynolds commented 3 years ago

Chased 28 July 2021

PatReynolds commented 3 years ago

Lemon had summarised (I missed this email; I have edited Lemon's words a little for clarity here, e.g. "I" becomes "he"). There are two tasks:

1/ Keeping a backup of the MongoDB databases around to recover "quickly" to a point in time
2/ Keeping a backup of all the data used to construct the MongoDB database

a. All the data in MongoDB is, in theory, rebuildable from the transcriber and some other data
b. If that data's on colobus:/raid/free* servers, then that's already backed up, to Hetzner and Backblaze
c. Above, Doug suggests it may not be, in which case it's a development issue to address
d. Also above, Kirk indicates there's no need to back up the built database (i.e. don't do action 1/ above)

Lemon has already undertaken 1/ above: during the migration to the new MongoDB servers he got the impression that rebuilding from scratch was taking "too long", so having a constructed database around would be useful.

But that doesn't mean we shouldn't do 2/ as well. We really want to make sure we have all the data used to generate our online databases, rather than only the result.

Lemon quoted "quickly" and "too long" because he doesn't really know what recovery time the project would consider reasonable after a catastrophic failure, beyond "as fast as possible".

Lemon further says: it needs developer input to a) identify the data used to construct the MongoDB databases and b) make sure that data lives in the part of the system we back up [Pat says: i.e. colobus, I think that means]

lemon-ukgen commented 3 years ago

Yes, that's right. Specifically, /raid on colobus. Above, we didn't think it was for FreeCEN2, at least.

Captainkirkdawson commented 3 years ago

All files required are now on colobus. They were on brazza while we were doing the monthly update from FC1 due to better disk performance. Since we now do online processing the file base was moved to colobus.

lemon-ukgen commented 3 years ago

@Captainkirkdawson please can you indicate where on colobus this data lives? Some example paths would be useful.

Some paths are on /raid (which is backed up) by virtue of symlinks, so it's not immediately obvious. In general, ~apache/hosts/... is not on /raid, and is not backed up.
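Because symlinks hide where a path really lives, one way to check is to resolve the path and compare it with the backed-up root. The following is a minimal sketch only: `backed_up?` is a hypothetical helper name, the default root of `/raid` is taken from this thread, and it assumes both paths exist on the machine where it runs.

```ruby
# Hypothetical helper (not part of any FreeUKGen code base): reports whether
# a path, after resolving symlinks, lives under the backed-up root.
# File.realpath raises Errno::ENOENT if either path does not exist.
def backed_up?(path, backup_root = "/raid")
  real = File.realpath(path)
  root = File.realpath(backup_root)
  # Match the root itself or anything strictly below it; the trailing
  # separator avoids treating /raid2 as if it were inside /raid.
  real == root || real.start_with?(root.chomp("/") + "/")
end
```

Running this against a handful of example paths would show which ones only appear to be outside /raid.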

Captainkirkdawson commented 3 years ago

/raid/freecen2/freecen1/pieces/ is the most important; /raid/freecen2/users/ is next.

lemon-ukgen commented 3 years ago

Thanks. For FreeCEN2 that's job done then.

If FreeREG2 keeps everything it needs backed up in /raid/freereg2 then that's all good for MongoDB in general too.

Captainkirkdawson commented 3 years ago

/raid/freereg2/users/ is the critical file for reg

Captainkirkdawson commented 3 years ago

What we could do to avoid a full rebuild from scratch is, once a month or once a quarter, to write all of the database collections to JSON files that are stored in the raid folder. These are what I used to make the transition from sharded to replicated. There is a script used to make the copies.
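As a rough illustration of that idea (not the existing script, which is not shown in this thread), the sketch below writes each collection to its own JSON file under a dated folder. Everything here is an assumption: `dump_collections` is a hypothetical helper name, the backup root would be somewhere under /raid, and the documents are passed in as plain Ruby hashes rather than read from MongoDB. Real MongoDB documents contain BSON types such as ObjectIds, which would need their own serialisation.

```ruby
require "json"
require "fileutils"

# Hypothetical helper: writes every collection to <backup_root>/<stamp>/<name>.json,
# one JSON document per line, and returns the list of files written.
# `collections` maps a collection name to any enumerable of hash-like documents.
def dump_collections(collections, backup_root, stamp: Time.now.utc.strftime("%Y%m%d"))
  target = File.join(backup_root, stamp)
  FileUtils.mkdir_p(target)
  collections.map do |name, docs|
    path = File.join(target, "#{name}.json")
    tmp = "#{path}.tmp"
    # Write to a temporary file first so a half-written dump is never
    # mistaken for a complete one by the system backup.
    File.open(tmp, "w") do |f|
      docs.each { |doc| f.puts(JSON.generate(doc)) }
    end
    File.rename(tmp, path)
    path
  end
end
```

A Rake task could wrap a call like this and be scheduled monthly or quarterly. Writing one document per line keeps large collections streamable, at the cost of the file not being a single JSON array.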

PatReynolds commented 3 years ago

A rake task needs to be written (will run automatically)

PatReynolds commented 3 years ago

@Vino-S could you put an estimate on this, please?

richpomfret commented 2 years ago

@Vino-S can we have an estimate?

DeniseColbert commented 2 years ago

Vino is still working on this, needs testing then deploying and then testing again.

Vino-S commented 2 years ago

Code changes are deployed to production. Keeping it open: I need to monitor the first backup.

DeniseColbert commented 1 year ago

To be deployed to live by @Vino-S

DeniseColbert commented 1 year ago

Done, closing