MatrixAI / Polykey

Polykey Core Library
https://polykey.com
GNU General Public License v3.0

Backup & Restore & Export #288

Open CMCDragonkai opened 2 years ago

CMCDragonkai commented 2 years ago

Specification

Backup & Restore for PK.

PK currently doesn't have a backup and restore system. Having one is necessary to ensure safe usage of PK.

Currently users can synchronise PK vault state to other PK agents, but this only deals with vaults.

One can dogfood PK backup by keeping PK backups themselves in other PK vaults, thus using PK as a "backup" system.

However PK vaults are not optimised for large files atm due to EFS issues, so this is not recommended yet.

Backup

  1. Backup to a single .zip archive, zip is better than tar because:
    • random access is possible
    • we don't actually need to keep the unix file attributes, file attributes are not relevant here
    • readable on windows more easily
    • pure JS implementation - https://github.com/gildas-lormeau/zip.js
  2. Backups are encrypted with password protection or, by default, using the root private key. I believe this involves generating a symmetric key, encrypting the data with it, and then encrypting the symmetric key with the root key, because asymmetric keys have data-size limits
  3. Backups should select which things to backup:
    • set of vaults
    • other data like acl, gg, notifications
    • configuration options
  4. Backups should be able to occur with or without the agent running
  5. Possibly new command pk backup
  6. Backups are streamed to a filesystem location specified by the user, but piped out to allow composition with other commands
    • later the ability to send to another PK
    • network backups can be automated in a decentralised way by sending to other PK nodes
    • more sophisticated network protocols could also be supported like s3... etc, enterprise features here matter too
  7. Backups should also be stored in ${nodePath}/backups to keep track of the list of backups that exist, this can be useful to help solve #287
  8. Backup zip archives will require a manifest file that keeps track of what is in the backup, the root public key used to back it up, and other metadata. If the metadata is encrypted, it is not available as external information. If we want to make it public, we would need to nest archives: keep a public JSON manifest file alongside an internal encrypted zip archive.
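
The hybrid scheme from item 2 can be sketched with Node's built-in crypto module. This is only a sketch under assumptions, not PK's implementation: it assumes an RSA root key pair, AES-256-GCM for the data, and RSA-OAEP for wrapping the data key with the public half of the root key. The names `Envelope`, `sealBackup`, `openBackup` and `demoRoundTrip` are hypothetical.

```typescript
import {
  constants,
  createCipheriv,
  createDecipheriv,
  generateKeyPairSync,
  KeyObject,
  privateDecrypt,
  publicEncrypt,
  randomBytes,
} from 'node:crypto';

// Hypothetical container for one encrypted backup payload.
interface Envelope {
  wrappedKey: Buffer; // symmetric data key, encrypted with the root public key
  iv: Buffer; // AES-GCM nonce
  authTag: Buffer; // AES-GCM authentication tag
  ciphertext: Buffer; // backup data, encrypted with the data key
}

// Assumed padding choice for wrapping the data key.
const OAEP = { padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: 'sha256' };

function sealBackup(data: Buffer, rootPublicKey: KeyObject): Envelope {
  // A fresh symmetric key per backup avoids the asymmetric data-size limit.
  const dataKey = randomBytes(32);
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', dataKey, iv);
  const ciphertext = Buffer.concat([cipher.update(data), cipher.final()]);
  const authTag = cipher.getAuthTag();
  // Only the small data key goes through the asymmetric operation.
  const wrappedKey = publicEncrypt({ key: rootPublicKey, ...OAEP }, dataKey);
  return { wrappedKey, iv, authTag, ciphertext };
}

function openBackup(envelope: Envelope, rootPrivateKey: KeyObject): Buffer {
  const dataKey = privateDecrypt({ key: rootPrivateKey, ...OAEP }, envelope.wrappedKey);
  const decipher = createDecipheriv('aes-256-gcm', dataKey, envelope.iv);
  decipher.setAuthTag(envelope.authTag);
  // final() throws if the ciphertext or tag was modified.
  return Buffer.concat([decipher.update(envelope.ciphertext), decipher.final()]);
}

// Usage: seal and reopen a payload with a throwaway key pair.
function demoRoundTrip(): boolean {
  const { publicKey, privateKey } = generateKeyPairSync('rsa', { modulusLength: 2048 });
  const data = Buffer.from('vault contents');
  return openBackup(sealBackup(data, publicKey), privateKey).equals(data);
}
```

Password protection would replace the key-wrapping step with a key derived from the password; the data encryption step stays the same.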

Restore

  1. Restore must be safe and atomic where possible, this means restores don't directly mutate the node state, but instead write to a temporary directory that gets renamed into place, see #287 for details
  2. Must be able to restore directly from somewhere in the filesystem, or from the persisted backups in the node path, which is important for #287
  3. Restore should work with bootstrap, so it's possible to bootstrap with a restored state, usually bootstrap assumes no state at all, but in this case one may want to bootstrap on a new computer with a restored state
  4. It is assumed that when restoring, you are restoring to the same PK agent, which means the same root key. However it is also possible to "restore" to a new PK agent with a new root key, but if the root key has changed, this might mean that certain DB states, such as GG, no longer make sense
  5. Should it be possible to restore-merge? As in merge to an existing state?
  6. It should be able to restore with or without the agent running
  7. Possibly new command pk restore
  8. Should be able to choose what to restore
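
The temporary-directory-then-rename approach from item 1 could look roughly like the following. It is a sketch under assumptions: the live state is assumed to sit in a `state` directory under the node path, the staging directory is created inside the node path so that the rename stays on one filesystem, and `restoreAtomically`, `populate` and `demoRestore` are hypothetical names. Swapping an existing state takes two renames, so it is not strictly one atomic step; the sketch rolls back if the second rename fails.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

function restoreAtomically(
  nodePath: string,
  populate: (stagingDir: string) => void, // e.g. extracts the backup archive
): void {
  const statePath = path.join(nodePath, 'state'); // assumed layout
  const previousPath = `${statePath}.old`;
  // Stage next to the live state so the rename never crosses filesystems.
  const stagingDir = fs.mkdtempSync(path.join(nodePath, 'restore-'));
  try {
    populate(stagingDir);
  } catch (e) {
    // A failed restore leaves the live state untouched.
    fs.rmSync(stagingDir, { recursive: true, force: true });
    throw e;
  }
  fs.rmSync(previousPath, { recursive: true, force: true });
  const hadState = fs.existsSync(statePath);
  if (hadState) fs.renameSync(statePath, previousPath);
  try {
    fs.renameSync(stagingDir, statePath);
  } catch (e) {
    // Roll back to the previous state if the swap fails.
    if (hadState) fs.renameSync(previousPath, statePath);
    throw e;
  }
  fs.rmSync(previousPath, { recursive: true, force: true });
}

// Usage: restore into a fresh node path, as when bootstrapping on a new machine.
function demoRestore(): string {
  const nodePath = fs.mkdtempSync(path.join(os.tmpdir(), 'pk-node-'));
  restoreAtomically(nodePath, (dir) => fs.writeFileSync(path.join(dir, 'db'), 'restored'));
  return fs.readFileSync(path.join(nodePath, 'state', 'db'), 'utf8');
}
```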

Not sure if they should be part of a new domain of subcommands or top level commands. It seems restoring has complexities due to existing agent or completely new agent state, and whether the root key is preserved.
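
To make the cases above concrete, here is a sketch of how a restore could classify itself from the backup manifest before touching any state. Everything here is an assumption for illustration: the manifest fields, the SHA-256 fingerprint of the root public key PEM, the `planRestore` function, and the list of key-dependent domains (only GG is named in this issue).

```typescript
import { createHash } from 'node:crypto';

// Hypothetical public manifest stored alongside the encrypted archive.
interface BackupManifest {
  formatVersion: number;
  createdAt: string; // ISO 8601 timestamp
  rootPublicKeyFingerprint: string; // identifies the root key used for the backup
  vaults: string[];
  domains: string[]; // e.g. 'acl', 'gg', 'notifications'
}

type RestoreScenario = 'bootstrap' | 'same-root-key' | 'different-root-key';

interface RestorePlan {
  scenario: RestoreScenario;
  suspectDomains: string[]; // domains whose state may not make sense after restore
}

// Assumption: only GG is called out in this issue as depending on the root key.
const KEY_DEPENDENT_DOMAINS = ['gg'];

function fingerprint(publicKeyPem: string): string {
  return createHash('sha256').update(publicKeyPem.trim()).digest('hex');
}

function planRestore(manifest: BackupManifest, currentRootPublicKeyPem?: string): RestorePlan {
  // No existing agent state: restore as part of bootstrap.
  if (currentRootPublicKeyPem === undefined) {
    return { scenario: 'bootstrap', suspectDomains: [] };
  }
  if (fingerprint(currentRootPublicKeyPem) === manifest.rootPublicKeyFingerprint) {
    return { scenario: 'same-root-key', suspectDomains: [] };
  }
  // Root key changed: flag the backed-up domains that depend on it.
  const suspectDomains = manifest.domains.filter((d) => KEY_DEPENDENT_DOMAINS.includes(d));
  return { scenario: 'different-root-key', suspectDomains };
}
```

A CLI could use the returned scenario to decide whether to proceed silently, prompt the user, or skip the flagged domains.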

Additional context

Tasks

  1. [ ] - work out CLI command spec
  2. Integrate into #287
  3. ...

CMCDragonkai commented 2 years ago

I've been looking into encrypted file-format standards, as this will be important for any sort of backup, restore and even export functionality.

Suppose one has to export a PK file, or a set of files to a file on disk, but you want this file to be:

What would be a good file format for this? And something that we would want to natively support in PK instead of expecting users to use PGP or openssl?

Well I had a look at these links:

And here are my conclusions:

  1. There are only 2 file formats in general use for encrypted data: S/MIME and OpenPGP Message Format
  2. S/MIME is an old format designed for emails, but it is capable of encrypting by public key or password, for binary or text files. However openssl seems to choke on large files atm, though this seems to be a problem of the implementation rather than of the standard.
  3. OpenPGP Message Format seems pretty solid, and it appears that it should be possible to use our X.509 key to encrypt it, but I'm not entirely sure how this plays out with #168, and that would mean bringing back the openpgp standard into PK
  4. There's a third format PKCS#12, the standard supports arbitrary data, but no tool provides the ability to use it as a standard archive and nobody really talks about it.
  5. Finally there are some standards: age for encryption and minisign/signify for signing, this seems like a viable approach
    • age does not provide archive ability, expecting one to choose the relevant archive format you want like tar or zip prior, and therefore compression would also be done prior to any sort of encryption, and of course signature would end up becoming a separate file
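
The ordering implied by the age/minisign approach (archive and compress first, encrypt second, keep the signature as a separate artifact) can be illustrated with Node built-ins. This is a stand-in sketch, not the age or minisign formats: it assumes gzip for compression, scrypt plus AES-256-GCM for password-based encryption, and a detached Ed25519 signature over the ciphertext. `ExportedFile`, `exportFile` and `importFile` are hypothetical names.

```typescript
import {
  createCipheriv,
  createDecipheriv,
  generateKeyPairSync,
  KeyObject,
  randomBytes,
  scryptSync,
  sign,
  verify,
} from 'node:crypto';
import { gunzipSync, gzipSync } from 'node:zlib';

interface ExportedFile {
  salt: Buffer; // scrypt salt
  iv: Buffer;
  authTag: Buffer;
  ciphertext: Buffer;
  signature: Buffer; // detached: would live in its own file next to the export
}

function exportFile(archive: Buffer, password: string, signingKey: KeyObject): ExportedFile {
  // Compress before encrypting: ciphertext does not compress.
  const compressed = gzipSync(archive);
  const salt = randomBytes(16);
  const key = scryptSync(password, salt, 32);
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(compressed), cipher.final()]);
  const authTag = cipher.getAuthTag();
  // Ed25519 takes no digest algorithm, hence the null.
  const signature = sign(null, ciphertext, signingKey);
  return { salt, iv, authTag, ciphertext, signature };
}

function importFile(file: ExportedFile, password: string, verifyKey: KeyObject): Buffer {
  // Check the detached signature before doing any decryption work.
  if (!verify(null, file.ciphertext, verifyKey, file.signature)) {
    throw new Error('signature mismatch');
  }
  const key = scryptSync(password, file.salt, 32);
  const decipher = createDecipheriv('aes-256-gcm', key, file.iv);
  decipher.setAuthTag(file.authTag);
  const compressed = Buffer.concat([decipher.update(file.ciphertext), decipher.final()]);
  return gunzipSync(compressed);
}

// Usage: export and re-import with a throwaway signing key pair.
function demoExport(): boolean {
  const { publicKey, privateKey } = generateKeyPairSync('ed25519');
  const archive = Buffer.from('secret\n'.repeat(100));
  return importFile(exportFile(archive, 'hunter2', privateKey), 'hunter2', publicKey).equals(archive);
}
```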

I like the idea of simply exporting a PGP file, because it's quite portable, whereas age or minisign would not be as well known.

CMCDragonkai commented 2 years ago

Need to point out that age and minisign have a nifty ability to use SSH keys to sign or encrypt. It would be cool if you could end up using another person's SSH key or PGP key to do it.

CMCDragonkai commented 2 years ago

It turns out that bitwarden and 1password don't support attachments as part of their backup system: https://news.ycombinator.com/item?id=31702594. Kind of interesting. In our case we don't even have attachments at all, everything would be stored in the vault, which then goes to EFS and eventually the js-db.