datasektionen / infra

MIT License

Automatic backups #4

Closed RafDevX closed 2 weeks ago

foodelevator commented 3 weeks ago

Maybe something like this would work (taken from my personal config and modified slightly for S3 and agenix, but not tested at all):

modules/restic.nix

{ config, lib, secretsDir, ... }:
{
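  # Paths to include in the restic backup; defaults to empty so hosts that set nothing still evaluate.
  options.dsekt.backup-paths = lib.mkOption {
    type = with lib.types; listOf str;
    default = [ ];
  };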
  config = lib.mkIf (builtins.length config.dsekt.backup-paths != 0) {
    services.restic.backups.s3 = {
      initialize = true;
      paths = config.dsekt.backup-paths;
      timerConfig = {
        OnCalendar = "04:00";
        Persistent = true;
      };
      repository = "s3:https://s3.amazonaws.com/dsekt-backups-${config.networking.hostName}";
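      # AWS credentials, decrypted by agenix (the secret is declared at the bottom of this module).
      environmentFile = config.age.secrets.restic-aws-creds.path;
      # Note: the NixOS restic module also expects passwordFile (the repository password); it is not set here.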
      pruneOpts = [
        "--keep-daily 7"
        "--keep-weekly 5"
        "--keep-monthly 12"
      ];
    };

    age.secrets.restic-aws-creds.file = secretsDir + "/restic-aws-creds-${config.networking.hostName}.env.age";
  };
}

profiles/postgres.nix

{ config, ... }:
{
  services.postgresql = ...;
  # backup-paths is a list of strings, so wrap the single path in a list.
  dsekt.backup-paths = [ config.services.postgresql.dataDir ];
}

That is, if we want to back up the PostgreSQL data directory directly. What the old servers currently do is run pg_dump -Fc and move the resulting file to S3. I'm not sure which is best.

RafDevX commented 2 weeks ago

From the docs, I would say that pg_dumpall has advantages over a filesystem backup for our use case, especially since a filesystem backup is useless (read: potentially corrupted) unless the database server is shut down while the backup is taken.

However, I'm not sure how well Restic can optimize repeated backups of the same file, i.e. whether it employs some method of incremental diffs or whether its finest granularity is the file level, meaning that it would just replace the dump.sql file every time without any optimization. I need to look into it.
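
For reference, the pg_dumpall route could be sketched with the stock NixOS `services.postgresqlBackup` module, which runs pg_dumpall when `backupAll` is set. This is untested; the file name is hypothetical, the 03:00 start time is an arbitrary choice, and `dsekt.backup-paths` is the option proposed in modules/restic.nix above.

```nix
# profiles/postgres-backup.nix (hypothetical file name, untested sketch)
{ config, ... }:
{
  services.postgresqlBackup = {
    enable = true;
    # Dump every database with pg_dumpall instead of copying the live dataDir.
    backupAll = true;
    # Assumption: run an hour before the 04:00 restic timer so the dump is fresh.
    startAt = "*-*-* 03:00:00";
  };

  # Hand restic the dump directory rather than the data directory.
  dsekt.backup-paths = [ config.services.postgresqlBackup.location ];
}
```

With this, restic only ever sees a finished dump file, so the database does not have to be stopped during the backup.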

foodelevator commented 2 weeks ago

Databases rarely get that big, so I think it will be absolutely fine without any incremental diffing (which I doubt restic has, but I haven't checked). With pg_dump you can use -Fd, which "will create a directory with one file for each table and large object being dumped", which I guess could be nice. I didn't find anything about output formats in man pg_dumpall, though I guess we could take inspiration from this (maybe slightly cursed) solution to achieve that.
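
A rough, untested sketch of the -Fd idea for a single database, hooked into the restic job through `backupPrepareCommand`. The database name `example` and the dump directory are made up for illustration, and it assumes the restic service runs as root (the module default), so that it can switch to the postgres user.

```nix
# Untested sketch; "example" and dumpDir are hypothetical placeholders.
{ config, pkgs, ... }:
let
  dumpDir = "/var/backup/pg-dump-dir"; # hypothetical location
  db = "example";                      # hypothetical database name
in
{
  services.restic.backups.s3.backupPrepareCommand = ''
    # pg_dump -Fd will not write into a non-empty directory, so start clean.
    rm -rf ${dumpDir}/${db}
    install -d -o postgres -g postgres ${dumpDir}
    # Run as the postgres user; -Fd writes one file per table and large object.
    ${pkgs.util-linux}/bin/runuser -u postgres -- \
      ${config.services.postgresql.package}/bin/pg_dump -Fd -f ${dumpDir}/${db} ${db}
  '';

  # Back up the dump directory instead of the live data directory.
  dsekt.backup-paths = [ dumpDir ];
}
```

Covering all databases this way would need a loop over the database list, which is where pg_dumpall's lack of an output-format option gets in the way.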