ewwhite / zfs-ha

ZFS High-Availability NAS

HA for Encrypted ZFS? #13

Open schmitmd opened 7 years ago

schmitmd commented 7 years ago

Hello @ewwhite,

Do you know (or should I start poking around and report what I see?) what the ramifications of native ZFS encryption would be on this kind of system?

Obviously, encryption-at-rest is an awesome thing to have in general (and it's finally here on ZoL).

Reading through the zfs and zpool manpages, having keylocation=prompt wouldn't work, but setting it to file://FILE.(hex|raw|passphrase) and doing zpool import -l seems realistic (so long as the keylocation doesn't point inside the pool, obviously).
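For illustration, the flow I'm imagining, with a hypothetical pool "tank" whose root dataset is the encryptionroot and an off-pool key file:

```sh
# Names are made up; the key file must live outside the pool itself
zfs set keylocation=file:///root/keys/tank.raw tank
zpool export tank
zpool import -l tank   # -l loads keys for encrypted datasets during import
```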

We'd probably want to zfs unload-key ASAP on the node that lost the pool, too, but encrypted datasets can't be mounted while the zfs "keystatus" property is "unavailable".

Maybe there should be some logic around the "encryptionroot" property; something like: "import the pool so long as at least one filesystem in the pool doesn't have this property set," for instance. Then it would at least mount something regardless of keystatus, and encrypted datasets could be manually mounted later?
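A sketch of that "mount what you can" flow, with a hypothetical encryptionroot tank/secure:

```sh
zpool import tank             # no -l: the pool imports, but no keys load
zfs mount -a                  # unencrypted datasets mount; encrypted ones
                              # are skipped while keystatus=unavailable
# later, once an operator can supply the key:
zfs load-key -r tank/secure
zfs mount -a                  # now the encrypted datasets mount too
```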

What are your thoughts? I feel like this will be quite the undertaking...

Lady-Galadriel commented 7 years ago

For the keylocation issue, I suggested a feature where the keyfile could be a raw partition, like one on a USB flash drive. This would make it a physical key device: easy enough to move from host to host, yet reasonably secure, since once the key file has been used, the USB flash drive can be removed and locked up. Suitable for HA use.

Note you can achieve similar results with a USB flash drive which has a file system and key files on it. It just requires manual intervention (or some scripting on the host).
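Roughly like this (device label and paths are made up):

```sh
mount /dev/disk/by-label/ZFSKEYS /mnt/keys     # USB stick with a filesystem
zfs set keylocation=file:///mnt/keys/tank.raw tank
zfs load-key -r tank
zfs mount -a
umount /mnt/keys                               # pull the stick, lock it up
```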

It's in this github issue https://github.com/zfsonlinux/zfs/issues/6556

schmitmd commented 7 years ago

The location of the key is only concerning (to me) insofar as it seems like it would have to be cached or easily available somehow for the failover to work properly. Am I just mistaken in my understanding?

Having the keys be automatically available kind of defeats one of the main reasons for having encryption (limiting access to "authorized" parties), doesn't it?

Situations:

  1. (when server A dies): Server A loses power and server B needs to take over. Server B can't do so because it needs an automated way to load a key from a keysource or must keep it cached somewhere, waiting to be ready for the failover.

  2. (when A boots back up): Server A boots back up, and wants to be primary again. It can't, unless it has an automated way to load the encryption key(s)

  3. Both die: Both A & B die. Pool and HA services never start back up due to lack of key availability.

Potential Solution to Situation 2: Have it such that you need to manually add the key to A before it's back in the HA resource pool? Let the server boot, but remain inactive in regard to the resource pool and set a state flag or something?
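As a hedged sketch, assuming a pacemaker/corosync cluster managed with the pcs shell (exact command names vary between versions, and the key path is hypothetical):

```sh
pcs node standby serverA       # A boots but takes no cluster resources
install -m 0600 -D /media/usb/tank.raw /run/keys/tank.raw   # manual key step
pcs node unstandby serverA     # only now may A import the pool with -l
```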

Basically all 3 of these run into the same issue. "How does key load/unload functionality work (or does it) with HA?" It feels pretty chicken & egg; is it too far down the "personal preference" road?

Lady-Galadriel commented 7 years ago

Let's go back to some basics.

Encryption at rest (which is what OpenZFS is adding) protects against 2 main problems:

A. Failed disk data protection
B. Theft of disk(s) or server(s)

Theoretically, there's a 3rd problem of a hacker getting into the server remotely, against which the SysAdmin can un-load the encryption keys at some point. That one is a bit more difficult to have an action plan for.

So if the purpose is to protect against item A, failed disks, then having an encryption key always available is not a problem. When a bad disk gets sent back for warranty replacement or service contract replacement, vendor access to the data is not a concern since it's encrypted. Note that this scenario also applies when replacing disk(s) with larger ones to grow a ZFS vDev; the old disks hold encrypted data.

In the case of item B, the encryption key(s) should be kept separate from the server, preferably locked up, so theft of one does not impact the other. Meaning theft of a USB flash drive with the encryption key on it has no effect after the SysAdmin re-keys the OpenZFS encrypted dataset.

OpenZFS encryption does nothing about un-authorized access, in the sense that once the encrypted dataset is unlocked using the encryption key (passphrase or key file), the normal OS filesystem and data file security take over.

Now back to your questions: too much depends on what you mean by un-authorized access.

On the HA side, it again depends on the type of protection you need. If it's just item A, then you could have a tiny shared storage drive that holds the encryption key(s). In some ways, raw partitions work better because both hosts of the HA cluster could use one at the same time. Mounting a file system from this tiny shared drive would be problematic, since at least 2 hosts could be doing it.
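For instance, each host could read a raw key directly off a small shared device without mounting anything (device name and pool are made up):

```sh
# copy exactly one 32-byte raw key from the shared LUN into tmpfs
dd if=/dev/disk/by-id/scsi-SHARED-KEY of=/run/tank.key bs=32 count=1
zfs load-key -L file:///run/tank.key tank
shred -u /run/tank.key         # /run is tmpfs, but scrub the copy anyway
```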

For HA and item B, theft, a SysAdmin or operator would have to be available on fail-over or reboot to enter the passphrase or copy in the encryption key file. No easy way around it. Trying to cache the key file in memory on all hosts in an HA cluster may end up with the key file in swap.

All that said, if there were a RAM-based USB-type drive, then using it as the location of the encryption key and having one installed on all hosts might solve the theft problem. Loss of power for long enough would cause the RAM to degrade and lose the encryption key. Any normal maintenance that requires powering off also requires the SysAdmin to re-install the encryption key from a master location, perhaps in single-user mode. The SysAdmin can even do this remotely if the remote console access device uses SSH or SSL protection.

ACiDGRiM commented 6 years ago

Maybe this is already a feature of the keylocation value, but why not provide support for a key at a URL like https://user:password@keyserver.local:/zfs_keyfile?

When kept on a secure network, this would allow multiple systems to access the keyfile. Storing the password for such a system in plaintext shouldn't be an issue, as a malicious actor would have identical access if the keyfile were stored as a local partition or file.

The webserver protecting the keyfile could then itself encrypt a ZFS volume with the 32-byte keyfile, and only decrypt it to make it available to hosts when key recovery is needed, further protecting it from unauthorized access.
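Even without native support for that URL form, the same effect can be approximated by fetching the key over HTTPS and feeding it to zfs load-key on stdin (URL, credentials, and pool name are placeholders):

```sh
curl -fsS -u zfs:s3cret https://keyserver.local/zfs_keyfile \
  | zfs load-key -L prompt tank   # -L prompt makes load-key read stdin
```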

schmitmd commented 6 years ago

@ACiDGRiM , I believe the feature you're talking about was requested recently from the man himself (Tom Caputi, long may Datto reign). I'm not sure how nicely a user:pass combo would play with whatever automation we'd need here for failover, but I think it's another excellent keylocation option.

The Lady is right with her summary of the reason for encryption-at-rest: cold data protection from OEMs/thieves. If someone malicious gets into an active system, they've got all they need.

My original question was meant to be essentially: "Is there a way to load the keys such that HA is maintained in an automated fashion and have them (at least somewhat) separate from the system?"

The answer is probably something to the same effect you've hit on (a separate, trusted service like Hashicorp Vault or a webserver). I like the idea of a RAM-based USB key, which doesn't seem like it'll require much in the way of separate resources/services, and at least dies with power loss. Same old story, it boils down to security vs. convenience.

If anyone sees any other potential solutions (regardless of "good practice"), please share. I think it'd be helpful for there to be at least a few explicitly defined options and I'm happy to make a PR to add them as official notes.


ACiDGRiM commented 6 years ago

If anyone is wondering, I have this working well with the following:

  1. Edit the pacemaker ZFS resource agent to add the -l parameter to each of the lines that initiate a pool scan and import. This loads the keys on encrypted volumes.
  2. Add a cloned pacemaker resource to mount an sshfs path from a small key server (an RPi in my case); this should start up before the ZFS pool resources. Each controller node's SSH key must also be on the key server. I recommend setting auto-reconnect and preventing this resource from stopping the resources that depend on it if it stops responding.
  3. Make sure the keylocation on the ZFS volumes is a path under the sshfs mount resource.

This results in the system mounting all of the ZFS volumes using the keyfile from the securely mounted path. Afterwards, the key file host can be turned off to protect the key. I have that particular system LUKS-encrypted. A rough sketch of the pacemaker pieces follows.
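Something like this for step 2, using the pcs shell (resource names, host, and paths are mine; the stock ocf:heartbeat:Filesystem agent can mount fuse.sshfs when sshfs is installed):

```sh
pcs resource create zfskeys ocf:heartbeat:Filesystem \
    device="keyuser@rpi.local:/keys" directory="/mnt/zfskeys" \
    fstype="fuse.sshfs" options="reconnect,ServerAliveInterval=15" \
    op monitor interval=30s
pcs resource clone zfskeys
pcs constraint order start zfskeys-clone then start zfs-pool  # your pool resource
```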

dkobras commented 6 years ago

@ACiDGRiM You've effectively exchanged "system needs to know the dataset's encryption key" for "system needs to know one of two SSH private keys" to access the data, but availability of the external key server during startup then becomes a single point of failure. In certain scenarios, this external dependency may be justified. Otherwise, I'd suggest just using individual GPG key pairs for each server: GPG-encrypt the dataset's encryption key with the public keys of both servers, and store it e.g. as a dataset property, or in a separate, unencrypted dataset. Add further copies of keys for additional protection from data loss. Place the GPG keys on removable media or smart cards, or add passphrases to them, depending on your availability and security constraints.
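A rough sketch of that scheme (pool, dataset, property name, and GPG identities are all placeholders):

```sh
# generate a raw 256-bit wrapping key and create the encrypted dataset
head -c 32 /dev/urandom > /run/tank.key
zfs create -o encryption=on -o keyformat=raw \
    -o keylocation=file:///run/tank.key tank/secure
# wrap the key for both servers' public keys, stash it as a user property
gpg --encrypt -r serverA@example.org -r serverB@example.org /run/tank.key
zfs set org.example:wrapped-key="$(base64 -w0 /run/tank.key.gpg)" tank/secure
shred -u /run/tank.key /run/tank.key.gpg
# on failover, either server unwraps and loads it again
zfs get -H -o value org.example:wrapped-key tank/secure \
  | base64 -d | gpg --decrypt > /run/tank.key
zfs load-key tank/secure && shred -u /run/tank.key
```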

ACiDGRiM commented 6 years ago

The goal is to make the key file accessible only when it's needed, like a root CA and intermediate CA: disconnecting the root CA from the network except when absolutely necessary increases security.

In my scenario, when I need to activate my storage array, I turn on the key system and decrypt the partition holding the keyfile before activating the nodes.

In an environment where the risk factor is removing disks from the environment, making the key always available could be addressed by putting the sshfs share's IP behind a pair of systems with a pacemaker-managed HA IP, both using the same system RSA private key. I use this kind of configuration for other purposes, and it works well when momentary access is required, e.g. the 2 controller nodes failing over to the secondary at an unplanned time. Increasing the number of sshfs hosts would reduce the likelihood that the key is unavailable, but it isn't as secure as a gated deployment.
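The floating IP part is a one-liner in pacemaker (address is hypothetical):

```sh
pcs resource create key-vip ocf:heartbeat:IPaddr2 \
    ip=192.0.2.10 cidr_netmask=24 op monitor interval=10s
```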

igoratencompass commented 6 years ago

Why not use something like AWS KMS, for example: https://github.com/jcmturner/awskmsluks/blob/master/README.md

It takes care of HA, key protection, and rotation, and in the case of theft you can change the API keys for the IAM user.
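In the same spirit as that LUKS example, a wrapped key could be kept locally and unwrapped through KMS at key-load time (file path and pool name are hypothetical; assumes the AWS CLI with valid credentials):

```sh
aws kms decrypt --ciphertext-blob fileb:///etc/zfs/tank.key.enc \
    --query Plaintext --output text | base64 -d \
  | zfs load-key -L prompt tank
```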

dkobras commented 6 years ago

@ACiDGRiM From my point of view, the CA analogy doesn't really apply here because once the encryption keys are loaded, removing access to the key file doesn't add any substantial protection: Attackers in a privileged position wouldn't bother to read the key file, they can directly read out the unencrypted datasets.

almereyda commented 2 years ago

You might be interested in these more recent discussions: