Rudd-O / zfs-fedora-installer

Fedora on ZFS root installer

QubesOS Install (v4.2) #35

Open cawilliamson opened 10 months ago

cawilliamson commented 10 months ago

Hello!

Firstly thanks for all of your hard work on this but I'm at a complete stop here trying to get this to actually build on QubesOS 4.2.

Trying to follow the docs on your website for the initial install (planning to pivot to full root-on-zfs) but the problems I've faced so far include:

  1. The 'fedora-release' package isn't installed - no problem - changed the package name to 'qubes-release' and then it doesn't like '4.2' - ok, spoofed all refs of that to just 37 - got past that issue.

  2. It needs a bunch of additional packages not mentioned in the files listed - no problem, installed those

  3. It's then complaining that it can't clone down git repos - ok, a faff but fine - got those downloaded. Still tries to download them so then have to hack all references to the cloning to remove that step since they're already present.

  4. Now get a bunch of issues relating to 'run_in_chroot' I think it was (had to reboot to a working system to write this) and a missing method.

Would love to get this working since basing it on DKMS is fantastic for future kernel upgrades but I'm having zero luck here and can't help but feel I'm missing something obvious.

Thanks!

P.S. I would also be just as happy with prebuilt packages but there doesn't appear to be any of those yet for Q4.2 :(

Rudd-O commented 9 months ago

I would be very glad to make this work in Qubes OS. If you could give me a detailed list of packages that are needed for things to work, I'll make it happen. Yes, a lot of things w.r.t. cloning won't work in a dom0 for obvious reasons, but it's high time this worked in Qubes OS.

Can you show me a verbose-logs failed run of the program in your system? That would speed up a lot of what needs to be done.

tenpai-git commented 9 months ago

Hi Rudd-O! Thank you so much for all your work on Qubes. I'm having a similar issue getting this working on Qubes 4.2! My goal is to go through your guide completely and get Qubes on ZFS, and then zpool attach a mirror drive to the pool created so I can just resilver if a drive ever dies. I don't think I got as far as @cawilliamson did, since I didn't replace the references, but here is the process I took to reproduce a similar issue:

  1. I installed a fresh Qubes 4.2 using automatic partitioning/configuration on a brand new SSD.
  2. I ran the Qubes Updater and made sure I had the latest for everything per the Qubes installation guide, and restarted twice.
  3. I began to follow these instructions here and ran the following command to try and catch all the packages (most were already installed): sudo qubes-dom0-update e2fsprogs cryptsetup mkpasswd gdisk dosfstools qemu-kvm rsync grub2 rsync yum dracut --skip-broken
  4. I then ran the following per the instructions; it also said it was already installed: sudo qubes-dom0-update kernel-devel-6.1.62-1.qubes.fc37.x86_64
  5. I created an AppVM called "git" and cloned this repo into it.
  6. I followed the instructions to transfer it over (I used zip -r and unzip on the repo to transfer it).
  7. I ran sudo ./deploy-zfs per the instructions and got a similar error to the one @cawilliamson cited initially, about the release package not being found. See below for the full log, as requested in the last comment.

[admin@dom0 zfs-fedora-installer]$ sudo ./deploy-zfs 
grep: /usr/lib/os-release: No such file or directory
  0m0.61  EE  Unexpected error
Traceback (most recent call last):
  File "/home/admin/zfs-fedora-installer/./src/installfedoraonzfs/__init__.py", line 2049, in deploy_zfs
    pkgmgr=SystemPackageManager(),
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/zfs-fedora-installer/./src/installfedoraonzfs/pm.py", line 396, in __init__
    BasePackageManager.__init__(self)
  File "/home/admin/zfs-fedora-installer/./src/installfedoraonzfs/pm.py", line 170, in __init__
    self.myreleasever = self.get_my_releasever()
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/zfs-fedora-installer/./src/installfedoraonzfs/pm.py", line 195, in get_my_releasever
    assert 0, "Release package not found"
AssertionError: Release package not found
Traceback (most recent call last):
  File "/home/admin/zfs-fedora-installer/./deploy-zfs", line 11, in <module>
    sys.exit(installfedoraonzfs.deploy_zfs())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/zfs-fedora-installer/./src/installfedoraonzfs/__init__.py", line 2049, in deploy_zfs
    pkgmgr=SystemPackageManager(),
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/zfs-fedora-installer/./src/installfedoraonzfs/pm.py", line 396, in __init__
    BasePackageManager.__init__(self)
  File "/home/admin/zfs-fedora-installer/./src/installfedoraonzfs/pm.py", line 170, in __init__
    self.myreleasever = self.get_my_releasever()
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/admin/zfs-fedora-installer/./src/installfedoraonzfs/pm.py", line 195, in get_my_releasever
    assert 0, "Release package not found"
AssertionError: Release package not found

I checked pm.py and saw a lot of references I wasn't prepared to handle, so I would also have been happy just using the pre-built packages in the repo, but since Qubes 4.2 is new, I didn't see a repomd.xml I could use. I thought maybe I could use fc37, since uname -r references that, but that currently returns a 404 at time of posting.
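The log above shows the lookup failing on /usr/lib/os-release before the "Release package not found" assertion. As a minimal sketch of a more tolerant lookup (this is not the installer's actual code; the function names, the candidate path list, and the sample file contents are all assumptions for illustration), one could parse whichever os-release file is present and fall back gracefully when none is:

```python
from pathlib import Path


def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release style KEY=VALUE lines into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip().strip('"').strip("'")
    return fields


def detect_release(candidates=("/etc/os-release", "/usr/lib/os-release")):
    """Return (ID, VERSION_ID) from the first candidate file that exists.

    Returns None when no candidate exists, so the caller can decide how
    to proceed instead of hitting a bare assertion.
    """
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            fields = parse_os_release(path.read_text())
            return fields.get("ID"), fields.get("VERSION_ID")
    return None


if __name__ == "__main__":
    # Hypothetical sample contents; the real file on dom0 may differ.
    sample = 'NAME="Qubes OS"\nID=qubes\nVERSION_ID=4.2\n'
    print(parse_os_release(sample))
    print(detect_release())
```

Whether dom0 actually ships /etc/os-release is something to verify on the machine; the sketch only shows the fallback shape.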

I guess if I changed all the package names to 'qubes-release' and install anything missing then locally reference all repos I might be in the same spot as @cawilliamson. I'm not sure if this was exactly his issue at step 1, but I figure it's good to post the exact reference. I hope this helps.

On the flip side, I was wondering whether it would be easier to just drop down to a shell during the Qubes installation and partition everything/install Qubes on ZFS in the first place, similar to what @eoli3n does in his archiso script to get around the licensing issue. I'm not sure, though, whether I'd be missing something about Qubes under the hood or about the installation script that would make that difficult or impossible as an alternative solution.

Qubes is the only part of my infrastructure not yet on zfs so I'm hoping we can put our heads together and figure this out!

tenpai-git commented 9 months ago

Okay I think I was able to build zfs-dkms on Qubes 4.2 by a combination of these guides;

This is the main guide from taradiddles on the Qubes Forum that carried me through the installation but after zfs major release 0.8 it seems that spl is no longer required.

Basically I made an AppVM (git was ready by default) and I did this;

mkdir ~/repositories && cd ~/repositories
git clone https://github.com/zfsonlinux/zfs.git

In order to get the zfs file onto Dom0 from the AppVM, I then recursively zipped and transferred it over to Dom0.

Now, there was some trial and error here and not every dependency may be necessary but these should cover everything.

First, from the main guide, there were these: sudo qubes-dom0-update dkms kernel-devel zlib-devel libuuid-devel libblkid-devel lsscsi bc autoconf automake binutils bison flex gcc gcc-c++ gdb gettext libtool make pkgconfig redhat-rpm-config rpm-build strace

I also installed all the dependencies from these two guides from the up-to-date (2.2 as of writing) main documentation of OpenZFS that I could in the Fedora section (I think dnf is just a symlink for qubes-dom0-update).

sudo dnf install --skip-broken epel-release gcc make autoconf automake libtool rpm-build libtirpc-devel libblkid-devel libuuid-devel libudev-devel openssl-devel zlib-devel libaio-devel libattr-devel elfutils-libelf-devel kernel-devel-$(uname -r) python3 python3-devel python3-setuptools python3-cffi libffi-devel git ncompress libcurl-devel

sudo dnf install --skip-broken epel-release gcc make autoconf automake libtool rpm-build kernel-rpm-macros libtirpc-devel libblkid-devel libuuid-devel libudev-devel openssl-devel zlib-devel libaio-devel libattr-devel elfutils-libelf-devel kernel-devel-$(uname -r) kernel-abi-stablelists-$(uname -r | sed 's/\.[^.]\+$//') python3 python3-devel python3-setuptools python3-cffi libffi-devel ncompress

Notably, epel-release and the package name that the kernel-abi-stablelists expression generates didn't work for me, but you should be able to get past both with --skip-broken.

From there I had to stop following the main documentation now that it had the 2.2 dependencies (?) and go back over to the original guide from the Qubes Forum.

cd ~/repositories/zfs
./autogen.sh
./configure --with-config=user
make rpm-utils rpm-dkms

This worked beautifully and generated a lot of .rpm packages for me to locally install. I tried the official documentation first but it was a no go and my build was failing. I was following the guide here but make -s -j$(nproc) seemed to go a little too fast and make rpm dropped me with an error. However as specified above make rpm-utils rpm-dkms was just fine.

From there I was able to dnf install all of the packages for my x86_64 system. I didn't touch any package labelled debug or source. I was able to install the zfs, zfs-dkms, dracut, and test rpm packages. If I run zfs version it gives me the version, so I'm going to try messing around with it and hope it goes well from here. I think I have everything.
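The selection rule described here (install the x86_64 and noarch packages, skip anything debug or source) can be sketched in Python. This is only an illustration: the function name is hypothetical, and the file names in the usage example are made up rather than taken from an actual build.

```python
def installable_rpms(filenames, arch="x86_64"):
    """Pick binary RPMs for `arch` (plus noarch), skipping debug and source."""
    keep = []
    for name in sorted(filenames):
        # Ignore non-RPM files and source RPMs.
        if not name.endswith(".rpm") or name.endswith(".src.rpm"):
            continue
        # Ignore debuginfo/debugsource packages.
        if "debuginfo" in name or "debugsource" in name:
            continue
        if name.endswith(f".{arch}.rpm") or name.endswith(".noarch.rpm"):
            keep.append(name)
    return keep


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    built = [
        "zfs-2.2.2-1.fc37.x86_64.rpm",
        "zfs-debuginfo-2.2.2-1.fc37.x86_64.rpm",
        "zfs-2.2.2-1.fc37.src.rpm",
        "zfs-dkms-2.2.2-1.fc37.noarch.rpm",
    ]
    for rpm in installable_rpms(built):
        print(rpm)
```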

I continued to follow the guide on the Qubes forums to automatically load the modules via /etc/sysconfig/modules/zfs.modules (I removed spl from the for loop), and zfs seems to load on boot after making the file executable with chmod +x as root.

#!/bin/sh

for module in zfs; do
    modprobe ${module} >/dev/null 2>&1
done

With the tuning in /etc/modprobe.d/zfs.conf, adding options zfs zfs_arc_max=536870912 (512 MiB), the memory levels seem good and I can start loading VMs.

If anything sounds horribly off here please let me know. I don't quite trust this system setup yet, but insofar as I can tell zfs is ready to go, and I'll continue with @Rudd-O's storage migration for ZFS and Qubes OS on ZFS root guides and post any issues in the appropriate repository.
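For reference, zfs_arc_max is given in bytes, and 536870912 is 512 MiB. A tiny sketch for generating the modprobe line from a size in MiB (the helper name is hypothetical, and it only builds the string; it does not write the file):

```python
def zfs_arc_max_line(mib: int) -> str:
    """Build the modprobe.d options line for an ARC cap given in MiB."""
    if mib <= 0:
        raise ValueError("ARC size must be a positive number of MiB")
    return f"options zfs zfs_arc_max={mib * 1024 * 1024}"


if __name__ == "__main__":
    print(zfs_arc_max_line(512))
```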

tenpai-git commented 9 months ago

Good news all, using the above it worked and I was able to install a ZFS SSD Mirror on Qubes using the changes described above!

It didn't go flawlessly though, so let me explain a bit. I didn't do anything to set up ZFS beyond what taradiddles wrote in their guide, but even though it's been updated, that guide seems to be based on an older version of zfs. I did what I wrote above without needing other parts of the guide to set up ZFS, but I think it would be good to note that I did have to make some kernel parameter adjustments at the end.

I followed @Rudd-O's guide through the full storage migration above (which was amazing by the way), and got to the point of installing Qubes OS on ZFS root. There were a couple of points that could be clarified but it was all fairly understandable, though there was a problem in the resulting setup. Because you've migrated the pools on the same system without booting into live media, a mismatch will be detected from the filesystem signature, and zpool status will show this error for the resulting pool: "Mismatch between pool hostid and system hostid on imported pool. This pool was previously imported into a system with a different hostid, and then was verbatim imported into this system." I think we can solve this ahead of time in this script for Qubes, though.

To solve this, you might want to import and export but if you've already reclaimed the storage in some way or took it out that won't be so easy. Since you're on root as well, it's the root system, so you won't be able to take the zfs recommended action or a workaround like flicking the multihost property. This reddit thread solved this issue on an earlier version of Fedora and pointed out to me that /etc/hostid doesn't exist on Fedora (and, as I discovered, Qubes 4.2) by default.

Basically, what you have to do is run /usr/bin/hostid, which gives you a result like 037f0110. SPL needs that value passed to the kernel ahead of time, so it knows the pool belongs to your system and not a previous or different one. You can do this by creating the file with echo 0x037f0110 > /etc/hostid and then including the value in your kernel parameters as described here. The parameter you need is spl.spl_hostid=0x037f0110, so I just made /etc/sysctl.d/qubes-on-zfs-fix.conf and put that exact value in the file. From there, running dracut -fv --include /etc/hostid /etc/hostid --regenerate-all and then grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg should make the problem go away.
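To make the two forms of the hostid concrete, here is a minimal Python sketch. It assumes, as far as I understand glibc's gethostid(), that /etc/hostid holds four raw bytes in native byte order (little-endian on x86_64) rather than text, so it is worth double-checking what a plain-text echo leaves in that file; OpenZFS also ships a zgenhostid tool for writing it. The function names are hypothetical, and nothing here writes to disk.

```python
import struct


def hostid_file_bytes(hostid_hex: str) -> bytes:
    """Return the 4 bytes /etc/hostid would hold on a little-endian system.

    Assumption: the file stores the hostid as a raw 32-bit integer.
    """
    value = int(hostid_hex, 16)  # accepts "037f0110" and "0x037f0110"
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("hostid must fit in 32 bits")
    return struct.pack("<I", value)


def spl_hostid_param(hostid_hex: str) -> str:
    """Format the kernel command-line parameter for the same hostid."""
    value = int(hostid_hex, 16)
    return f"spl.spl_hostid=0x{value:08x}"


if __name__ == "__main__":
    example = "037f0110"  # the example value from the comment above
    print(hostid_file_bytes(example).hex())
    print(spl_hostid_param(example))
```

Comparing the output of hostid with the bytes actually stored in /etc/hostid (for instance with a hex dump) is a quick way to confirm the two agree before regenerating the initramfs.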

So, for Qubes installs, can we include the kernel parameter and the /etc/hostid file from the start of installing ZFS? You can always troubleshoot by booting with zfs_force=1 to force-import the pool (add it to the GRUB_CMDLINE_LINUX line right before the LUKS UUIDs, which is especially useful if you get stuck and can't boot your system), but that's probably not a good idea as a permanent measure, since it might circumvent other safeguards. Setting the hostid is a more permanent fix, and I can't imagine it interfering with anything else.

I gotta say, getting 4.2 working was not how I planned to spend my New Year's vacation, but I really loved the resources and I'm glad this worked. I went with a different "option" at the end and did a zpool attach [pool] [shortname of pool storage] [uuid of luks on old drive], and the mirror works great now after implementing the above solution. I think most of the frustration was:

  1. Getting those packages from all the guides.
  2. Realizing I need make rpm-utils rpm-dkms and compiling everything from source.
  3. Fixing kernel parameters for spl/zfs on final boot.

If all of these can be solved by the installer script (if the hostid can be derived early on and set?), the whole process would be a lot smoother. Otherwise that last step might need to be added at the end of the guide for final booting.

The big brain solution of using ZFS encryption, setting it up from the start, and using live media might prevent the above mismatch issue altogether and improve performance, too. But I hope this helps the script here and/or the guide.

Rudd-O commented 8 months ago

I've made amendments to https://rudd-o.com/linux-and-free-software/how-to-install-zfs-on-qubes-os in order to contemplate manual install, because TBQF deploy-zfs is buggy in Qubes OS at the moment.

I will fix it tho! It's a key use case.

The hostid thing is strange. I never quite got how the thing works, but when the system mounts the root pool on boot, hostid is never a problem for me. It just mounts. You can reguid your pool, I think, after first boot from ZFS root, and it should make the zpool list warning go away.

tenpai-git commented 7 months ago

Thank you so much @Rudd-O! I agree this is an important project since having individual Zvols for VMs is such an improvement on handling Qubes - especially if you work in security. It makes malware analysis much easier to move around. I was really surprised to find the kernel parameter to be the deciding factor, too, but thanks to your guide and the community there was enough information to get to the end point. I'm typing on my ZFS Qubes system right now!

Could I trouble you to add my github or these comments to the blog post? I love Qubes and ZFS and want to help anyone who might not be able to run it.

And thank you again so much for these write-ups, they're invaluable sources of information.

Rudd-O commented 7 months ago

Did the new instructions work for you? I can close this ticket if that's working.

I'll add a reference to these comments in the blog post.

tenpai-git commented 7 months ago

Thank you Rudd-O!

I went through your entire guide originally and with the notes I made I think it'll be good.

I didn't run/deploy it from scratch (I don't have extra hard drives for this right now), but if my research here helps @cawilliamson or others get set up, the information included should solve their issue.