cfcs / qubes-storage-zfs

ZFS pool storage for VMs in QubesOS
BSD 3-Clause "New" or "Revised" License

Unintuitive: cow writes to template-controlled block devices #1

Open · cfcs opened this issue 3 years ago

cfcs commented 3 years ago

When a ZFS-backed VM uses a template backed by e.g. a File pool, the COW for the immutable volume(s) is written using the template's pool, not ZFS.

This is pretty terrible for the zfs_encrypted target where ephemeral writes to / (the root volume) will end up in /var/lib/qubes/appvms/myvm/root-cow.img - unencrypted.

Perhaps we should detect and warn about this, e.g. when starting a VM whose template does not run from the same pool. OTOH it's not really a solution to require users to have a template per ZFS group, so...
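
Something along these lines is what I have in mind, as a hand-wavy sketch; the attribute names (vm.volumes, volume.pool, pool.name, vm.log) are my assumptions about the core-admin API and are not tested:

    # Hand-wavy sketch of the proposed warning; attribute names are
    # assumptions about the core-admin API, not tested code.
    def warn_on_cross_pool_template(vm):
        # Compare the pool backing this VM's private volume with the pool
        # backing the template's root volume.
        if getattr(vm, 'template', None) is None:
            return
        vm_pool = vm.volumes['private'].pool
        tpl_pool = vm.template.volumes['root'].pool
        if vm_pool.name != tpl_pool.name:
            vm.log.warning(
                'template root is in pool %s but this VM uses pool %s; '
                'COW writes for the root volume will not end up in %s',
                tpl_pool.name, vm_pool.name, vm_pool.name)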

Anyway, food for thought, leaving it here for now.

ayakael commented 3 years ago

Indeed, this is caused by a zdev-backed appvm using a File-backed template. Because snap_on_start and save_on_stop are not implemented, using a template backed by zdev leads to the appvm never creating its root volume, and thus never being able to start. Normally, File does a COW copy of the template's root volume to the appvm's root volume through snap_on_start, and then never saves on stop because save_on_stop is disabled. Lacking those two functions here makes this behavior impossible. Unless I'm mistaken, as things stand, to have a zdev-backed template you must manually zfs clone path/to/template/root@latest path/to/appvm/root at every update, and from then on every appvm will save on shutdown unless the root volume is read-only.
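
Roughly, the manual dance I mean (untested sketch; the dataset names and the @latest snapshot are placeholders for whatever your layout uses):

    import subprocess

    # Untested sketch of the manual workaround; dataset names are placeholders.
    TEMPLATE_SNAP = 'rpool/vm/template-general-f33/root@latest'
    APPVM_ROOT = 'rpool/vm/personal/root'

    def rebase_appvm_root():
        # Throw away the old clone (and with it anything the appvm wrote to /)...
        subprocess.run(['zfs', 'destroy', '-r', APPVM_ROOT], check=False)
        # ...and re-clone the template's snapshot as the appvm's root volume.
        # This has to be repeated after every template update.
        subprocess.run(['zfs', 'clone', TEMPLATE_SNAP, APPVM_ROOT], check=True)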

As for the warning message: since the File driver is being deprecated in r4.2, I don't see it as worth implementing. The focus should be on getting snap_on_start and save_on_stop implemented.

The security issue is mitigated on my system, as my whole system pool is encrypted, so root-cow.img is protected with the same key as the appvm's private volume. I invite others to, at the very least, encrypt their /var/lib/qubes/appvms folder.
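
For those on an unencrypted system pool, a rough sketch of what I mean (the dataset name is a placeholder, and you would still have to move the existing images into it yourself):

    import subprocess

    # Rough sketch: a natively encrypted dataset dedicated to the File pool's
    # directory. 'rpool/varlibqubes-appvms' is a placeholder name.
    subprocess.run([
        'zfs', 'create',
        '-o', 'encryption=on',            # aes-256-gcm by default
        '-o', 'keyformat=passphrase',     # prompts for a passphrase
        '-o', 'mountpoint=/var/lib/qubes/appvms',
        'rpool/varlibqubes-appvms',
    ], check=True)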

cfcs commented 3 years ago

@ayakael Thanks for the feedback. :)

Ah, interesting, those would indeed be nice to implement then, save_on_stop and snap_on_start I mean.

COW with zfs clone probably makes sense for ZFS, but when logging this issue I was dealing with a file-backed template and expecting the changes to be written to the ZFS storage of the consumer/appvm.

Re: security being mitigated: the encryption mentioned here is meant to enable having different groups of VMs encrypted with different keys, so that some of them can have their keys loaded while others do not; it is therefore still an issue if data that should be encrypted with the key for vm1 gets written to storage that is encrypted with the key for Qubes dom0 itself.
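
In other words, the intended layout is roughly one encryption root per group, with independently loadable keys; a placeholder sketch:

    import subprocess

    # Placeholder sketch of the per-group layout: each group is its own
    # encryption root, so its key can be loaded or unloaded independently
    # of dom0's own storage.
    for group in ('rpool/vm-work', 'rpool/vm-vault'):
        subprocess.run(
            ['zfs', 'create', '-o', 'encryption=on',
             '-o', 'keyformat=passphrase', group],
            check=True)

    # Later, only bring the group you need online:
    subprocess.run(['zfs', 'load-key', 'rpool/vm-vault'], check=True)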

I think someone reported that this is fixed in recent Qubes, but I haven't had time to try updating recently.

ayakael commented 3 years ago

@cfcs No problem, glad to help! Thanks to you as well! I use ZFS for its native encryption and very elegant secure backup implementation. Until I discovered your work, I was stuck with the far-from-elegant File driver on top of my ZFS root. I'll try to help the project as I implement things on my side.

Re: Security. Quite fair! I'm still trying to get zfs_encrypted working on my system. I think it's very worthwhile to see how one can compartmentalize as much as possible. For example, snap_on_start via zfs clone can clone from one ZFS encryption "zone" to another, but to do so cheaply it indeed keeps the same encryption key as its original parent. Thus, whatever writes occur outside of the VM's home will be written to that non-trusted clone. Even when a False save_on_stop rolls back the non-trusted clone to its original state, the written data can still be forensically extracted. So the data left behind might include logs, caches, tmp files, etc.
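
You can actually see this on the clone: its encryptionroot keeps pointing at the origin's side (placeholder dataset names):

    import subprocess

    # Placeholder names: a clone made across zones still reports the origin
    # zone's encryption root, i.e. it stays protected by the origin's key.
    out = subprocess.run(
        ['zfs', 'get', '-H', '-o', 'name,value', 'encryptionroot',
         'rpool/zone-a/personal/root'],
        check=True, capture_output=True, text=True).stdout
    print(out)  # e.g. "rpool/zone-a/personal/root    rpool/zone-b"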

A possible means to circumvent that would be to clone a per-encryption-zone version of the template using 'zfs send | zfs receive' into the zone's import dataset, thus creating a trusted copy of that template for that zone. This would be expensive time- and storage-wise, but it would only need to happen on the first run of any VM in that zone, and after that it would only need to be incrementally updated when the template changes. I was exploring a more elegant solution, which would involve using 'zfs change-key -i' to inherit the new parent's key after a 'zfs promote' operation. Unfortunately, this isn't supported. And regardless, do we even want a scenario where a template from a non-trusted source gets used in our trusted zone? Anyone would certainly aim to keep their vault-template within the same zone as their vault-appvms (I know I would!). I'd imagine it's worth implementing for the sake of flexibility, but a warning to the user would perhaps be appropriate, as it certainly isn't advisable.
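
Concretely, the send/receive variant would be something like this (untested, placeholder names; a non-raw send received below the zone's encrypted import dataset should get re-encrypted with that zone's own key):

    import subprocess

    # Untested sketch of the per-zone trusted template copy; names are
    # placeholders.
    SRC_SNAP = 'rpool/vm/template-general-f33/root@latest'
    DST = 'rpool/zone-a/import/template-general-f33-root'

    send = subprocess.Popen(['zfs', 'send', SRC_SNAP], stdout=subprocess.PIPE)
    subprocess.run(['zfs', 'receive', DST], stdin=send.stdout, check=True)
    send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError('zfs send failed')

    # Subsequent template updates would use an incremental send instead:
    #   zfs send -i <old-snap> <new-snap> | zfs receive <DST>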

As for updated Qubes having fixed the root-cow issue: I can confirm that it isn't the case. I don't think it's a bug, but a feature of File; we're just using it wrong, as it's inherent to File to create a COW of the template from source to vid within the pool's dir_path when source is specified under <volume-config>. I don't think there's ever a secure situation where we want to use a mix of File and zdev drivers. The only way I can see to get File to create the COW in a trusted zone would be to create a File pool per zdev pool, with dir_path set to a secure dataset, and thus have a template VM per zone. Hardly elegant, though.

You'll notice this in qubes.xml if it came out like mine as defaults:

   <volume-config>
        <volume name="root" pool="varlibqubes" vid="appvms/personal/root" revisions_to_keep="0" rw="True" snap_on_start="True" size="41943040000" source="vm-templates/template-general-f33/root"/>
        <volume name="private" pool="rpool" vid="rpool/vm/personal/private" revisions_to_keep="0" rw="True" size="10485760000"/>
        <volume name="volatile" pool="rpool" vid="rpool/vm/personal/volatile" revisions_to_keep="0" rw="True" size="10737418240"/>
        <volume name="kernel" pool="linux-kernel" vid="5.10.42-1.fc32" revisions_to_keep="0"/>
   </volume-config>

What it should be is the following, but that configuration leads to no root volume being created. qubes-qube-manager also likes to delete the snap_on_start and source arguments from anything that isn't in a File-backed pool.

   <volume-config>
        <volume name="root" pool="rpool" vid="rpool/vm/personal/root" revisions_to_keep="0" rw="True" snap_on_start="True" size="41943040000" source="rpool/vm/template-general-f33/root"/>
        <volume name="private" pool="rpool" vid="rpool/vm/personal/private" revisions_to_keep="0" rw="True" size="10485760000"/>
        <volume name="volatile" pool="rpool" vid="rpool/vm/personal/volatile" revisions_to_keep="0" rw="True" size="10737418240"/>
        <volume name="kernel" pool="linux-kernel" vid="5.10.42-1.fc32" revisions_to_keep="0"/>
   </volume-config>

ayakael commented 3 years ago

I'm mistaken: snap_on_start is part of the answer, but the actual function that needs implementation is import_data, which should zfs clone from source to vid when snap_on_start=True. The reset function, if save_on_stop=False and snap_on_start=True, would then delete the root vid. I think I might actually attempt an implementation sometime this week. My Python is rusty, but it's good practice. Going forward I might just end up attempting implementations of the other functions too. 'Tis a fun project (if not for my 40-60 hour work weeks).

Reference file.py - import_data function

    def import_data(self, size):
        if not self.save_on_stop:
            raise qubes.storage.StoragePoolException(
                "Can not import into save_on_stop=False volume {!s}".format(
                    self))
        # The function that actually creates root-cow.img from source
        create_sparse_file(self.path_import, size)
        return self.path_import

Reference file.py - create_sparse_file

def create_sparse_file(path, size, permissions=None):
    ''' Create an empty sparse file '''
    if os.path.exists(path):
        raise IOError("Volume %s already exists" % path)
    parent_dir = os.path.dirname(path)
    if not os.path.exists(parent_dir):
        os.makedirs(parent_dir)
    with open(path, 'a+b') as fh:
        if permissions is not None:
            os.fchmod(fh.fileno(), permissions)
        fh.truncate(size)

Reference file.py - reset function

    def reset(self):
        ''' Remove and recreate a volatile volume '''
        assert not self.snap_on_start and not self.save_on_stop, \
            "Not a volatile volume"
        assert isinstance(self.size, int) and self.size > 0, \
            'Volatile volume size must be > 0'

        _remove_if_exists(self.path)
        create_sparse_file(self.path, self.size)
        return self

Reference file.py - snap vs stop map

    snap_on_start save_on_stop layout
    yes           yes          not supported
    no            yes          snapshot-origin(volume.img, volume-cow.img)
    yes           no           snapshot(
                                   snapshot(source.img, source-cow.img),
                                   volume-cow.img)
    no            no           volume.img directly
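
For comparison, a very rough and untested sketch of the zdev-side equivalents I have in mind; the method names mirror file.py, and self.source / self.vid are assumed here to be plain ZFS dataset names, which may not match how the real driver stores them:

    import subprocess
    import qubes.storage

    # Very rough, untested sketch mirroring file.py's method names.
    # self.source and self.vid are assumed to be plain ZFS dataset names.

    def import_data(self, size):
        if not self.snap_on_start:
            raise qubes.storage.StoragePoolException(
                'Not a snap_on_start volume: {!s}'.format(self))
        # Instead of create_sparse_file()'ing a root-cow.img in the File
        # pool, clone the template's latest snapshot into this volume.
        subprocess.run(
            ['zfs', 'clone', '{}@latest'.format(self.source), self.vid],
            check=True)
        return self.vid

    def reset(self):
        # For snap_on_start=True, save_on_stop=False volumes: drop the clone
        # so the next start re-clones a fresh copy of the template's root.
        assert self.snap_on_start and not self.save_on_stop
        subprocess.run(['zfs', 'destroy', '-r', self.vid], check=True)
        return self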

cfcs commented 3 years ago

Thanks for the research; I'm a bit occupied brainspace-wise and can't really take it all in right now.

I saw you fixed the udev issue where you get one million popups on zvol activation; that was very annoying. Will you send a PR when you have tested your changes?