archlinux / archinstall

Arch Linux installer - guided, templates etc.
GNU General Public License v3.0

Consider changing the BTRFS subvolume layout #781

Closed KhalilSantana closed 2 years ago

KhalilSantana commented 2 years ago

Hello,

I'm very happy to see that v2.3.X includes BTRFS subvolume support as a beta feature! This was a major blocker for me and now it's (mostly) gone.

However, I would like to comment on the current subvolume layout used by Archinstall's guided mode, and why (and how) I think it could be improved.

The Current Layout

For reference, this is the current layout used by guided mode as of 2021-12-01 using archinstall v2.3.0:

# btrfs subvol list -t /
ID  gen top level   path    
--  --- ---------   ----    
256 21  5       home
258 21  5       var/log
259 17  5       var/cache/pacman/pkg
260 10  5       .snapshots
264 14  5       var/lib/portables
265 15  5       var/lib/machines

How BTRFS handles subvolumes, and snapshots

(this section exists mostly to add context, skip it if you are familiar with BTRFS)

In BTRFS, each subvolume has an ID and a path relative to the toplevel (a.k.a. subvolid=5). Subvolumes can be mounted anywhere and can be used for many things, just like folders. The main difference between a regular folder and a subvol is that the latter has special powers: atomic snapshots, etc.

Snapshots are just a clone of a filesystem root (subvol) at a point in time, and a snapshot shares extents with the snapped subvol. This is one of the reasons snaps are so useful: they are fast and small.

However, BTRFS currently has some important quirks/behaviors:

With this in mind, I hope the next section becomes easy to understand.

What's wrong with the current layout?

The current layout basically prohibits using snapshots as a rollback feature. Why?

In short the main issues with the current subvol layout are:

What would be a better layout?

This is often called the "flat layout", often used with the Ubuntu-style naming scheme (@ as /, @home as /home), and it's documented on the wiki.

What other distros are using?

My layout suggestion

Follow the flat layout, with the Ubuntu-style naming scheme, but keep the extra subvols (/var/log & /var/cache/pacman/pkg), with those extras not nested, of course.

Basically, this:

# btrfs subvolume list -t /
[sudo] password for khalil: 
ID      gen     top level       path
--      ---     ---------       ----
256     1408526 5               @
257     1408526 5               @home
258     1408526 5               @pkg
259     1408526 5               @log
260     1391176 5               @snapshots

In tree view for clarity's sake:

# exa -TL1 /mnt/toplevel # (this is the mount point of subvolid=5 on my system, for maintenance purposes)
/mnt/toplevel
├── @
├── @home
├── @pkg
├── @log
└── @snapshots
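
To make the flat layout a bit more concrete, here is a minimal sketch that prints the /etc/fstab entries it would need. The UUID, the mount options and the /.snapshots mount point for @snapshots are assumptions for illustration; none of them come from this proposal, and this is not what archinstall generates.

```sh
#!/bin/sh
# Sketch: print /etc/fstab entries for the flat layout shown above.
# FS_UUID and MOUNT_OPTS are placeholders (assumptions), not real values.
FS_UUID="${FS_UUID:-00000000-0000-0000-0000-000000000000}"
MOUNT_OPTS="${MOUNT_OPTS:-rw,noatime}"

# fstab_line SUBVOL MOUNTPOINT
# Prints one fstab line that mounts subvolume SUBVOL at MOUNTPOINT.
fstab_line() {
    printf 'UUID=%s\t%s\tbtrfs\t%s,subvol=/%s\t0 0\n' \
        "$FS_UUID" "$2" "$MOUNT_OPTS" "$1"
}

# Every subvolume sits directly under the toplevel (subvolid=5),
# so no subvolume is nested inside another one.
flat_layout_fstab() {
    fstab_line '@'          /
    fstab_line '@home'      /home
    fstab_line '@pkg'       /var/cache/pacman/pkg
    fstab_line '@log'       /var/log
    fstab_line '@snapshots' /.snapshots   # mount point is an assumption
}

flat_layout_fstab
```

Because each mount names its subvolume explicitly, swapping @ for a snapshot does not drag @home, @pkg or @log along with it.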

Why the Ubuntu naming scheme? Why not Fedora's? Or something else entirely?

Mostly because of Timeshift, as it only supports that standard. And while it might seem like a silly reason, I feel like it's reasonable: Timeshift is one of the few snapshot utilities with an incredibly easy to use restore button, which makes it easy to undo changes and to un-break a system after a bad upgrade.


Thank you for your time!

Torxed commented 2 years ago

Thank you for this very detailed feedback! Really awesome to see everyone suggesting good things to improve the btrfs subvolume support. We'll track it in this issue and in #718, where the topic came up in an issue about re-using btrfs subvolumes that were anything other than what we supply.

wllacer commented 2 years ago

@KhalilSantana Great explanation for what I also do empirically,

with a slightly different layout (I don't care about snapshotting yet, but more about usage flexibility). At the end of issue #718 you will find an unofficial patch that allows archinstall to work with flat layouts IF they are defined and mounted beforehand. I would be glad if I could have some feedback about it. I could make a PR of it, but I'm not that sure of its quality (it works for me, but that is all I can guarantee). I want to expand it to full support of flat schemas. Any suggestions and ideas would be welcome.

Torxed commented 2 years ago

I would be glad if I could have some feedback about it.

It's coming in a bit : ) Gathering my thoughts to not just blurt out unfinished thoughts :)

rexhent commented 2 years ago

This is definitely important. Btrfs is a bit of a pain to remember how to use when doing the Arch install "the old way", and it's one of my main reasons to use EndeavourOS right now.

DrymarchonShaun commented 2 years ago

Just to put it out there as well, why not go with openSUSE's layout, so that snapper can be fully utilized? Timeshift works, but the last time I tested it, it felt like it lacked a lot that Snapper has. With the right subvolume layout, rolling back with snapper is as easy as booting a snapshot of your working system and running snapper rollback.

From https://en.opensuse.org/SDB:BTRFS

/home: If /home does not reside on a separate partition, it is excluded to avoid data loss on rollbacks.

/opt: Third-party products usually get installed to /opt. It is excluded to avoid uninstalling these applications on rollbacks.

/root: The root user's home directory should also be preserved during a rollback.

/srv: Contains data for Web and FTP servers. It is excluded to avoid data loss on rollbacks.

/tmp: All directories containing temporary files and caches are excluded from snapshots.

/usr/local: This directory is used when manually installing software. It is excluded to avoid uninstalling these installations on rollbacks.

/var: This directory contains many variable files, including logs, temporary caches, third party products in /var/opt, and is the default location for many virtual machine images and databases. Therefore this subvolume is created to exclude all of this variable data from snapshots and is created with Copy-On-Write disabled.

What this ends up looking like (with the top-level subvolume mounted at /btrfs)

/Btrfs
└── /@
      ├── /home                       /home
      ├── /opt                        /opt
      ├── /root                       /root
      ├── /srv                        /srv
      ├── /tmp                        /tmp
      ├── /usr_local                  /usr/local
      ├── /var                        /var      (cow disabled)
      └── /.snapshots                 /.snapshots
              └── /1/snapshot         /

Putting the root subvolume in .snapshots makes using snapshots much cleaner, as the original root subvolume is essentially a snapshot. After you boot into a read-only snapshot and run snapper rollback, two things happen:

  1. A read-only snapshot of the current (broken) system is created.
  2. A read-write copy of the currently booted snapshot is created and set as the default btrfs subvolume, so that when you boot back up, it gets mounted as the new root subvolume.
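
As a rough illustration of those rollback steps, the dry-run sketch below only prints btrfs commands that approximate them; it does not run anything and it is not snapper's actual implementation. The /btrfs mount point follows the example above, while the function name and the snapshot numbers are made up for illustration.

```sh
#!/bin/sh
# Dry-run sketch: print (do not run) btrfs commands approximating the
# rollback steps described above. This is NOT how snapper is implemented.
# TOP is where the top-level subvolume (subvolid=5) is assumed to be mounted.
TOP="${TOP:-/btrfs}"

# rollback_plan BROKEN BOOTED NEXT   (hypothetical helper)
#   BROKEN: number of the snapshot that is the current default (broken) root
#   BOOTED: number of the read-only snapshot that is currently booted
#   NEXT:   first unused snapshot number
rollback_plan() {
    broken=$1
    booted=$2
    keep=$3
    new=$(($3 + 1))
    snaps="$TOP/@/.snapshots"
    echo "mkdir -p $snaps/$keep $snaps/$new"
    # 1. read-only snapshot of the current (broken) system
    echo "btrfs subvolume snapshot -r $snaps/$broken/snapshot $snaps/$keep/snapshot"
    # 2. read-write copy of the booted snapshot, made the default subvolume
    echo "btrfs subvolume snapshot $snaps/$booted/snapshot $snaps/$new/snapshot"
    echo "btrfs subvolume set-default $snaps/$new/snapshot"
}

rollback_plan 1 5 6
```

The last printed command is the important one: because the root is whatever the default subvolume points at, the next boot lands in the new read-write copy, provided fstab and the bootloader do not pin a specific subvolume.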

It is more complicated to wrap your head around to begin with, but once you understand it, it makes way more sense.

If it helps make it understandable at all, I just finished fixing a (janky) script that installs Arch with this layout (albeit with a lot of separation of /var): https://github.com/ShaunTheQuietGamer/Arch-Setup-Script/blob/main/install.sh It might help explain the process to get this functioning, but I understand that there's probably a better way to do all the stuff it's doing. Some of it is important to get the system functional (like modifying fstab and adding some stuff to the grub config), but a lot of it is not.

EDIT: I didn't fully understand how this whole thing worked, and still probably don't, but isn't it theoretically possible to add both layouts as an option for users? Instead of asking Yes or No if they want a default subvolume layout, it could ask if they:

  1. Don't want subvolumes
  2. Want the Ubuntu style flat layout
  3. Want the openSUSE layout with snapper support
  4. Manual subvolume setup (like the disk partitioning section)

(Something else to look at is possibly adding an extra Y/N, if the openSUSE layout is selected, to also install and set up snapper, grub-btrfs, and snap-pac with snapper rollback functioning.)

Torxed commented 2 years ago

Before we close this: you'll have to correct me if I'm wrong, because there's almost too much context here and I might have gotten a bit lost in all the reading. But the desired outcome is:

/Btrfs
└── /@
            ├── /home                       /home
            ├── /opt                        /opt
            ├── /root                       /root
            ├── /srv                        /srv
            ├── /tmp                        /tmp
            ├── /usr_local                  /usr/local
            ├── /var                        /var      (cow disabled)
            └── /.snapshots                 /.snapshots
                      └── /1/snapshot       /

Which we currently can't support, because we don't support a mountpoint (/ in this case) inside another subvolume. Is this a correct assumption, @wllacer? (Pinging you because you've done more work on the subvolumes than me at this point.)

If we can live without mountpoints inside subvolumes, this issue is fulfilled. Otherwise I'll keep this open, but I'll move it to v2.4.0 as I'd like to get v2.3.1 out ASAP for patch reasons.

wllacer commented 2 years ago

With the code just merged you can create almost any setup you can imagine (most of them dead wrong). The subvolume hierarchy and the file system hierarchy can (and should) be totally distinct. The only things to take care of are:

KhalilSantana commented 2 years ago

Commenting a little bit on @ShaunTheQuietGamer's layout:

/srv Contains data for Web and FTP servers. It is excluded to avoid data loss on rollbacks.

I also do this on my system. I omitted it from my original proposal to keep things simple (KISS), but it's a pretty sound suggestion in my opinion.

/opt Third-party products usually get installed to /opt. It is excluded to avoid uninstalling these applications on rollbacks.

/usr/local This directory is used when manually installing software. It is excluded to avoid uninstalling these installations on rollbacks.

While I can understand the point you are making, I also think this is somewhat of a foot gun: applications seldom exist within a vacuum, they have dependencies that may be scattered elsewhere on the filesystem, thus rolling back the rootfs without also rolling back these may not lead to a working (read: consistent) set of packages.

And yes, one could configure snapper to also snapshot this in sync with the rootfs subvol, then figure out a way to rollback from that (either manually, or via snapper itself somehow??), but that feels like overcomplicating things.

/tmp All directories containing temporary files and caches are excluded from snapshots.

I'm not sure what openSUSE does with /tmp, but on Arch Linux, /tmp is a ramdisk (tmpfs) by default, so I don't see the point of this being a subvol.

/var This directory contains many variable files, including logs, temporary caches, third party products in /var/opt, and is the default location for many virtual machine images and databases. Therefore this subvolume is created to exclude all of this variable data from snapshots and is created with Copy-On-Write disabled.

I don't fully understand how the Arch Linux package manager (libalpm) works, but I believe it uses /var/lib/pacman/local to keep track of what's installed. Having /var out of sync with the rootfs (i.e. after a restore) would therefore lead to pacman -Q disagreeing with what's on the rest of the filesystem (say, whether /usr/bin/firefox is installed or not).

If my understanding is correct, splitting the entire /var into a subvol should be avoided, as it would break pacman each and every time you rollback.

--

Regarding CoW vs no-CoW, I also disagree: nodatacow is basically an escape hatch that disables or otherwise breaks many core features of BTRFS, including, but not limited to:

Those are very big downsides which should be decided on a per-application basis, either by the user or by the application itself, not a blanket disable on the entire /var directory.

DrymarchonShaun commented 2 years ago

(To Khalil) Your points make total sense. I see no issue with modifying the layout to work better with Arch; every distro is different and is going to require different things. I guess I should have specified that my point was more that it's worth looking at openSUSE's layout to see if anything they are doing would make sense to add to the default (or one of the defaults) for Arch.

I think the only thing that is absolutely required (subvolume-wise) for snapper is for the original root to be mounted in /@/.snapshots/1/snapshot; having .snapshots mounted at /@/.snapshots may also be required.

Speaking as the one who isn't having to do any of the work (you or others may have a different opinion, as the ones actually implementing this), I would think it is important to give choices. Rather than have the only default option be Timeshift, give the choice between the two (or a totally manual layout, obviously). Part of the problem with choosing one over the other is that, for snapper at least, you have to modify some configs during the installation to make it function correctly (setting an option to true in grub, modifying fstab to not force /@/.snapshots/1/snapshot to be mounted and instead tell it to mount the default subvolume, etc.).

DrymarchonShaun commented 2 years ago

I really suck at explaining things, and snapper doesn't have nearly enough documentation on the required layout... I'm going to try to make it a bit more understandable, and try to change it to match the points Khalil made.

Subvolumes                      Filesystem mount point
/mnt/@/                         N/A
/mnt/@/home                     /home
/mnt/@/root                     /root
/mnt/@/srv                      /srv
/mnt/@/var_log                  /var/log
/mnt/@/var_pkg                  /var/cache/pacman/pkg
/mnt/@/.snapshots               /.snapshots
/mnt/@/.snapshots/1/snapshot    /

Just for the sake of clarity from here down: / is the filesystem root, and \ is the partition root, which can be mounted somewhere within the filesystem root but doesn't have to be mounted. I've been using / for both, and I think that's the main reason why there is some confusion. I'm not entirely sure why openSUSE does \@/* instead of \@*; I think the idea is to distinguish the filesystem root / and the files in it from the partition root \@/ and the subvolumes in it. If I say \@/.snapshots I'm talking about the .snapshots subvolume; if I say /.snapshots I'm talking about the filesystem location. It's to stop the confusion that is currently happening. Regardless, it's what is required for snapper.

Actually, I think I might have an analogy. Think of each of the subvolumes as physical disks on a normal system within /dev/*. They are all in /dev so you know they are physical drives, just like the subvolumes are in \@/* so you know they are subvolumes. Technically you could create a text file in the partition root right next to '\@' (although I imagine it's bad practice), and if everything was just in the partition root it could get confusing.

Anyways, the filesystem root is mounted as \@/.snapshots/1/snapshot (1 is just a folder in the \@/.snapshots subvolume), because it is essentially the first snapshot of the system. When you cd into \@/.snapshots/1/snapshot, it's your current filesystem. The next snapshot that is created is \@/.snapshots/2/snapshot/, and if you cd into it, it's a copy, a snapshot, of a root filesystem.

Torxed commented 2 years ago

Subvolumes                      Filesystem mount point
/mnt/@/                         N/A
/mnt/@/home                     /home
/mnt/@/root                     /root
/mnt/@/srv                      /srv
/mnt/@/var_log                  /var/log
/mnt/@/var_pkg                  /var/cache/pacman/pkg
/mnt/@/.snapshots               /.snapshots
/mnt/@/.snapshots/1/snapshot    /

This looks pretty similar to what we got with the exception of the root subvolume not being mounted and a separate @/root being created. And I think the current code can handle this one with a breeze. @wllacer will have to correct me if my assumptions here are wrong.

wllacer commented 2 years ago

Subvolumes                      Filesystem mount point
/mnt/@/                         N/A
/mnt/@/home                     /home
/mnt/@/root                     /root
/mnt/@/srv                      /srv
/mnt/@/var_log                  /var/log
/mnt/@/var_pkg                  /var/cache/pacman/pkg
/mnt/@/.snapshots               /.snapshots
/mnt/@/.snapshots/1/snapshot    /

This looks pretty similar to what we got with the exception of the root subvolume not being mounted and a separate @/root being created. And I think the current code can handle this one with a breeze. @wllacer will have to correct me if my assumptions here are wrong.

Happy New Year to all! @torxed is right, this is a flat version of the current layout, and I don't dislike it as a basic layout (although I separate var as a subvolume). Just a comment: neither @var_log nor @var_pkg needs to be mounted if there is no need from the OS side (only for backups) and they are created as embedded subvolumes, for instance if @var is created as a subvolume (which is interesting). Default layouts are a tricky matter, because they depend on the end user's needs. A couple of samples:

The sad truth is that most of us lack, for the time being, serious experience with Btrfs and the impact of the alternatives. At least since yesterday, when PR #787 was merged, we can play with the different alternatives on "real metal" (well, VMs) with ease, if you don't mind editing JSON files. If you read the comments there, we have tried to document what it can do and how.

Let's play for a while before taking any decision (and try to involve the wider Arch Linux community in the discussion).

Torxed commented 2 years ago

and try to involve the wider Arch Linux community in the discussion

I think this might be a good idea. This topic seems too complex for me to merge any alternative on my own.

DrymarchonShaun commented 2 years ago

Sounds good to me. (Everything after this relies on figuring out the default subvolumes that are not snapper/timeshift specific, as in extra ones that may be necessary for pacman, etc.)

I've kind of been creating a mental image of how I'd set up the configuration menus for subvolumes and snapshots. I'm almost certainly getting ahead of myself (and everything else), but I thought I would put it out there regardless. This is once again coming from someone who doesn't know how to do the work, or the standards/rules being followed, so I'm sure changes/tweaks would be necessary, if you all think it fits into the scope of this installer at all.

First it would ask what layout the user wants

Select a subvolume layout
1. Ubuntu style layout (Compatible with Timeshift)
2. openSUSE style layout (Compatible with Snapper)
3. Manual configuration
4. None //could be under the manual configuration menu?

The manual configuration would again be similar to the manual configuration for partitions. If 1 or 2 is selected, there could be a yes/no question letting people manually adjust the layout for certain use cases, as @wllacer outlined.

Would you like to manually adjust the subvolume layout?
1. Yes
2. No

If yes, it would go into the same manual configuration menu as selecting 3 in the prior step, but with the previously selected layout already configured. (This is where I'm thinking it might start to get away from the scope of the installer.) After either skipping manual adjustment or finishing it, there would be another Yes/No question, filled in from the user's answer to the first question:

Automatically install [Timeshift/Snapper]?
1. Yes
2. No

If no, it just moves on to the next step of the install; if yes, it sets up timeshift or snapper. (This is where I might actually be able to help with the snapper configuration, as it's not just installing the package; you have to change some configs for snapper to fully function.)

Torxed commented 2 years ago

Select a subvolume layout

  1. Ubuntu style layout (Compatible with Timeshift)
  2. openSUSE style layout (Compatible with Snapper)
  3. Manual configuration
  4. None //could be under the manual configuration menu?

We could have this menu instead of the "Do you want to use subvolumes?" as this one packs more information but would add no additional steps.

ghost commented 2 years ago

Hi, just a quick question, why /var/cache/pacman/pkg instead of just /var/cache ? It makes little or no difference as most files are in /pacman/pkg/, but what is there to gain in going 2 steps deeper instead of excluding all cache from snapshots ?

Torxed commented 2 years ago

Hi, just a quick question, why /var/cache/pacman/pkg instead of just /var/cache ? It makes little or no difference as most files are in /pacman/pkg/, but what is there to gain in going 2 steps deeper instead of excluding all cache from snapshots ?

It's probably to strictly snapshot package states, for easier rollbacks or to use snapper (etc) to snapshot a point in time of a package store. For instance a stable snapshot when you knew packages were working, and you could (wild pseudo commands to follow) do something like pacman -U /var/cache/pacman/pkg/* and know it'll be stuff that works :)

ghost commented 2 years ago

Anything in the subvolume is not included in snapshots... that will be true whether the subvol is /var/cache or /var/cache/pacman/pkg. The point is to not fill the file system up with ever-changing versions of cached packages.

Torxed commented 2 years ago

Then I've got it backwards. I for sure would rather have it included in snapshots. Or at least I looked at it as a great way to take snapshots of known states :) But then again, I guess you could just snapshot your installation folders instead and store the known states there. But then you might lose data if it's stored alongside the binaries etc., and then those snapshotted installation packages would come in handy for a rollback.

Anyway, that's why I thought people wanted that subvolume :)

ghost commented 2 years ago

I think the installed state is in /var/lib/pacman/local. /var/cache/pacman/pkg is just the cache of installer files downloaded by pacman, the ones you clear with paccache.

$ ls /var/cache/pacman/pkg/
a52dec-0.7.4-11-x86_64.pkg.tar.zst
a52dec-0.7.4-11-x86_64.pkg.tar.zst.sig
aalib-1.4rc5-14-x86_64.pkg.tar.zst
accountsservice-0.6.55-3-x86_64.pkg.tar.zst
acl-2.3.1-1-x86_64.pkg.tar.zst
adobe-source-code-pro-fonts-2.038ro+1.058it+1.018var-1-any.pkg.tar.zst
adwaita-icon-theme-41.0-1-any.pkg.tar.zst
.....

They don't need to be backed up.

KhalilSantana commented 2 years ago

@Torxed,

Btrfs snapshot stops at the subvolume boundary, so putting /var/cache/pacman/pkg into a subvol means btrfs subv snap / /.snapshot/foo won't include anything inside the pkg folder, which is, in my opinion, the best way to do something like this.
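
A small sketch of that boundary rule, with no btrfs involved: given a list of mount points that are backed by separate subvolumes, it says whether the contents of a path would end up in a snapshot of the rootfs subvolume. The list and the function name are hypothetical and only serve as illustration.

```sh
#!/bin/sh
# Sketch of the "snapshots stop at the subvolume boundary" rule.
# SUBVOL_MOUNTS is a hypothetical list of mount points that are backed by
# their own subvolumes; everything else is assumed to live in the rootfs.
SUBVOL_MOUNTS="/home /var/log /var/cache/pacman/pkg /.snapshots"

# in_root_snapshot PATH
# Prints "excluded" if PATH is at or below one of the subvolume mount points
# (its contents are not captured by a snapshot of /), otherwise "included".
in_root_snapshot() {
    for mp in $SUBVOL_MOUNTS; do
        case "$1" in
            "$mp"|"$mp"/*)
                echo "excluded"
                return 0
                ;;
        esac
    done
    echo "included"
}

in_root_snapshot /var/cache/pacman/pkg/acl-2.3.1-1-x86_64.pkg.tar.zst  # excluded
in_root_snapshot /var/lib/pacman/local                                 # included
```

The second call shows why this split is safe for pacman: the package database under /var/lib/pacman/local stays inside the rootfs snapshot, while the downloaded package files do not.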

By excluding pkg from the main rootfs subvol you can make snapshots of it much smaller, in case you want to do stuff like btrfs send... or just for diff purposes. Plus, having a pkg folder that's unchanged across rollbacks (assuming proper use of the non-nested subvol like I explained before), you could save a bit of network bandwidth by not having to re-download everything again.

Regarding /var/cache being a subvol or not, I'm not entirely sure. I don't think it would cause any issues, but I also don't think it adds much versus just putting pacman's pkg folder into a subvol, as that's by far the largest folder in my /var/cache/, with netdata taking second place at some 200 MB.

ghost commented 2 years ago

Regarding /var/cache being a subvol or not, I'm not entirely sure. I don't think it would cause any issues, but I also don't think it adds much versus just putting pacman's pkg folder into a subvol, as that's by far the largest folder in my /var/cache/, with netdata taking second place at some 200 MB.

I agree it doesn't add too much, but it still adds a little and makes it simpler at the same time, that's why I don't really see any reason not to use it instead of /var/cache/pacman/pkg.

Plus, having a pkg folder that's unchanged across rollbacks (assuming proper use of the non-nested subvol like I explained before), you could save a bit of network bandwidth by not having to re-download everything again.

It's a little off-topic so feel free to ignore, but in which scenario would you re-download the same installers for packages which are most likely still installed? I struggle to understand why you would keep so many cached packages: if I install something I most likely want to download the latest version, and if I uninstall something I'm unlikely to change my mind and install it again.

Back to the topic, is any thought given to suggesting a subvolume layout for @home ?

There are also good candidates for excluding from snapshots, like .cache, .local/Trash, .local/libvirt/images ... but it can be a little more complicated to deal with..

KhalilSantana commented 2 years ago

@clavelc

It's a little off-topic, but in which scenario would you re-download the same installers for packages which are most likely still installed? I struggle to understand why you would keep so many cached packages: if I install something I most likely want to download the latest version, and if I uninstall something I'm unlikely to change my mind and install it again.

It depends on the user really. If someone's building a "golden image" of their Arch install, perhaps they'll want to test multiple drivers/settings before committing to it, so the user builds their install step-by-step using snapshots as checkpoints. Creating this "golden image" might take several attempts before figuring out which exact packages are required, which ones are bloat, and which settings or quirks to apply due to their hardware.

So having an unchanging pkg would be helpful, the user could snapshot his working system, then mess around with drivers, DEs, etc, then revert and try again, without needing to redownload most stuff.

KhalilSantana commented 2 years ago

My original proposal was rather simple, but as everyone is suggesting their own (more advanced/sophisticated) layouts, I'll dabble a bit on my own more featured layout, mostly for my own uses, but I'll try to make an effort to make this cover as many use cases as I can. (I'll also edit this comment as ideas come along).

I'll also use a table to make this more readable. The priority column is how much I care about that particular choice, or how much I think it's going to matter for users of specific software.

Subvol path | Mount point | Priority | Comment
/@ | / | High | The root filesystem must be separated from user data (ie: home) for ease of reinstall, rollback, etc.
/@home | /home | High | User data must be separated from system data to enable rolling back either without affecting the other, ease of reinstall, etc.
/@pkg | /var/cache/pacman/pkg | Medium | Separating this dataset from the others means users of btrfs-send (or wrappers like btrbk) will have smaller send streams.
/@VMs | /var/lib/libvirt/ | High | Rolling back the rootfs shouldn't affect VMs. And these are usually huge so the btrfs-send argument also applies here.
/@containers | ??? | Medium | Stuff for podman/docker so it doesn't implode when you rollback the rootfs. I'm not sure if docker uses the BTRFS driver by default on Arch, but podman doesn't. Mount point varies if user is using podman or docker, thus I left it blank.
/@srv | /srv | Low | A nice place to dump random daemon homedirs, as pacman mostly won't touch this dir anyway. Basically a miscellaneous folder.
/@log | /var/log/ | Medium | Logs should be consistent across rootfs rollbacks.

EDIT-0: Added the /@log subvol here, even though it was on my original comment I forgot to include it here =p. Thanks @Forza-tng via IRC

EDIT-1: Changed the /@VMs subvol mount point to /var/lib/libvirt instead of just its images directory, as the VMs' NVRAM files are stored in a sibling directory and those must be consistent. Thanks Jannik2099 via IRC for the suggestion.
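
For anyone who wants to try the table above by hand, this dry-run sketch prints the commands that would create each subvolume under the toplevel and mount it under a target root. It only prints; nothing is executed. The device name and both directories are hypothetical placeholders, and @containers is skipped because the table leaves its mount point open.

```sh
#!/bin/sh
# Dry-run sketch: print the commands that would create the flat subvolumes
# from the table above and mount them under a target root.
# TOPLEVEL, TARGET and DEVICE are hypothetical placeholders (assumptions).
TOPLEVEL="${TOPLEVEL:-/mnt/toplevel}"   # where subvolid=5 is mounted
TARGET="${TARGET:-/mnt/target}"         # root of the system being installed
DEVICE="${DEVICE:-/dev/sdX2}"           # the btrfs partition

# subvolume:mountpoint pairs, root first so later mounts land inside it
LAYOUT="@:/ @home:/home @pkg:/var/cache/pacman/pkg @VMs:/var/lib/libvirt @srv:/srv @log:/var/log"

plan_layout() {
    for pair in $LAYOUT; do
        subvol=${pair%%:*}       # text before the first ':'
        mountpoint=${pair#*:}    # text after the first ':'
        echo "btrfs subvolume create $TOPLEVEL/$subvol"
        echo "mkdir -p $TARGET$mountpoint"
        echo "mount -o subvol=/$subvol $DEVICE $TARGET$mountpoint"
    done
}

plan_layout
```

Keeping the layout in one list makes it easy to drop the lower-priority rows (such as @srv) without touching the rest.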

KhalilSantana commented 2 years ago

@ShaunTheQuietGamer

just for the sake of clarity from here down, / is the filesystem root, \ is the partition root that can be mounted somewhere within the filesystem root but doesn't have to be mounted. I've been using / for both and I think that's the main reason why there is some confusion

Consider using "toplevel" or "subvolid=5" to describe filesystem root of all roots, using \ is best left for Windows folks =p.

Actually, I think I might have an analogy. Think of each of the subvolumes as physical disks on a normal system within /dev/. They are all in /dev so you know they are physical drives, just like the subvolumes are in \@/ so you know they are subvolumes, because technically you could create a text file in the partition root right next to '@', (although I imagine its bad practice,) and if it was all just in the partition root it could get confusing.

I didn't really understand your analogy, for one it "compares" subvols to block devs, which is a rather bad way of understanding it in my opinion because subvols aren't block devs, they are independently mountable filesystem roots.

The best way to explain a subvol to a newbie would be something like "subvols are like normal directories with special semantics such as atomic snapshots and separate inode numbering, thus they share the same free space".


@wllacer

The same with VM images, this time aggravated by the fact that they are usually inside one user's home directory (in this case I simply play with the nodatacow attribute). Or for the KDE users, the directory where the baloo index file goes ...

Not really needed. KDE already does this on its own. Try this on your system and see for yourself:

% find ~/ -print0 | xargs -0 lsattr 2>/dev/null | grep -v -- '----------------------' | grep -- '-C-'
---------------C------ /home/khalil/.local/share/baloo
---------------C------ /home/khalil/.local/share/baloo/index
---------------C------ /home/khalil/.local/share/baloo/index-lock
---------------C------ /home/khalil/.local/share/baloo/index
---------------C------ /home/khalil/.local/share/baloo/index-lock
---------------C------ /home/khalil/.local/share/akonadi/db_data
---------------C------ /home/khalil/.local/share/akonadi/db_data/aria_log_control
... # and more

@clavelc

Back to the topic, is any thought given to suggesting a subvolume layout for @home ?

There are also good candidates for excluding from snapshots, like .cache, .local/Trash, .local/libvirt/images ... but it can be a little more complicated to deal with..

This is a more tricky subject in my opinion, for one, subvols have a few quirks associated with them that might not be immediately obvious to newbies, sometimes resulting in data loss. See this borgbackup issue as an example.

This doesn't mean I think subvols can't be used in homedirs. They can; it just requires the user to understand their quirks. For example, fragmentation is a common issue when torrenting on BTRFS, so you can (ab)use the same feature/bug (subvols being treated as different filesystems) to have your torrent copied + unlinked when it finishes, thus defragging the file without the torrent client being aware of BTRFS itself.

DrymarchonShaun commented 2 years ago

I didn't really understand your analogy, for one it "compares" subvols to block devs, which is a rather bad way of understanding it in my opinion because subvols aren't block devs, they are independently mountable filesystem roots.

The reason it didn't make sense to you is that I wasn't trying to explain subvolumes. I was trying to explain what I'm guessing the thinking was behind openSUSE's use of toplevel/@/* instead of toplevel/@* to delineate subvolumes from regular folders. It doesn't actually do anything system-wise; it's just a different way of making things easier for people to understand.

cmurf commented 2 years ago

There's upstream systemd discussion on a "discoverable subvolumes spec" mimicking the Discoverable Partitions Spec, so that a systemd generator can create mount units based on subvolume names, rather than depending on: default subvolume, fstab, rootflags boot param, or database to figure out how to assemble systems.

Note that libvirt will chattr +C any directory configured as a storage pool when on Btrfs. So out of the box, any raw or qcow2 files will inherit this when copied or created new in e.g. /var/lib/libvirt/images. Any of virt-manager, virsh, virt-install, and GNOME Boxes will do this for their VM image pool location, since about 18 months ago. Details here. I'll argue applications that need this optimization should handle this rather than installers, because not every database needs nodatacow.

Forza-tng commented 2 years ago

I agree with @KhalilSantana. A flat structure for subvolumes rather than nested is preferable for many reasons. It is easier to understand, manage and roll back.

Though to be honest, I think that a simple default of just @home and @root is preferable. The other mount points @KhalilSantana mentioned can be defined as optional in the installer.

Also I do not think that any subvolume should have nodatacow by default. Nodatacow is a band-aid which reduces data integrity, even if the user is using a redundant raid profile.

KhalilSantana commented 2 years ago

While I don't personally use GRUB, lots of users do, and as such, I think it would be worthwhile involving the grub-btrfs developers here, so that package works with (some/all?) layouts provided by Archinstall.

In any case, this issue has tons of information that may be relevant/worth considering.

ImJustTheDesigner commented 2 years ago

So I found this thread because I am in the middle of transferring a system from one laptop to a new laptop, and I want to move over my own favourite btrfs subvolume setup with @home on a separate partition, because it makes rebuilding/multi-distro setups easier. I used the Manjaro Architect installer back in 2020/1 because I could build my own btrfs structures (partitions and subvolumes) mid-build, and then the installer simply installed onto my layout.

Please follow starting from the base power of btrfs partitions through volumes and subvolumes through to snapshots:

Thank you for your time reading through these explanations. In conclusion from these we have that:

additionally @ is brilliant identifying a btrfs subvolume name making it easy to see and easy to search for eg: @ or @home or @varcache or @libVMs or @flatpak

Thus a btrfs subvolume layout in btrfs volume and mount layout could be:

/dev/nvme0n1p2 containing:

/dev/nvme0n1p3 containing:

mounting subvolumes as:

Add @description btrfs subvolumes that are relevant to each type of installation eg: @containers or @srv or @var_pkg

It would be great to be able to manually adjust to setup the btrfs subvolumes on their volumes with their mount points and mount settings from within the installer during the installation process. And maybe keep a record of the install btrfs subvolume and volume option choices made.

Thank you for your time reading these explanations and points. (from a user of a manjaro/mint/win/mac box [actually a laptop])

KhalilSantana commented 2 years ago

@ImJustTheDesigner

a btrfs subvolume enables mount directories to have their own filesystem rules separate from the rest of the filesystem being mounted with/without eg: compression or read/write (rw) or nodatacow

This is not possible currently. Read this section of the manpage (quoted below):

Most mount options apply to the whole filesystem and only options in the first mounted subvolume will take effect. This is due to lack of implementation and may change in the future. This means that (for example) you can’t set per-subvolume nodatacow, nodatasum, or compress using mount options. This should eventually be fixed, but it has proved to be difficult to implement correctly within the Linux VFS framework. -- Btrfs Documentation: btrfs(5)
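One practical consequence of the quoted passage is that an fstab giving different `compress` or `nodatacow` values to subvolumes of the same filesystem is misleading, since only the first mount's values apply. Below is a minimal sketch of a checker for that situation. The entry format, the names `FS_WIDE_OPTIONS` and `conflicting_options`, and the option set (limited to the three options the quote names) are all assumptions made for illustration.

```python
# Options the quoted man page text names as not settable per subvolume.
# This set is an assumption for illustration, not an exhaustive list.
FS_WIDE_OPTIONS = {"nodatacow", "nodatasum", "compress"}


def fs_wide(options: str) -> dict:
    """Return only the filesystem-wide options from a comma-separated string."""
    found = {}
    for opt in options.split(","):
        key, _, value = opt.partition("=")
        if key in FS_WIDE_OPTIONS:
            found[key] = value
    return found


def conflicting_options(entries):
    """entries: iterable of (device, mountpoint, options) tuples in mount order.

    Returns (mountpoint, requested_options) pairs for every later mount whose
    filesystem-wide options differ from the first mount of the same device.
    """
    first_seen = {}
    conflicts = []
    for device, mountpoint, options in entries:
        requested = fs_wide(options)
        if device not in first_seen:
            first_seen[device] = requested
        elif requested != first_seen[device]:
            conflicts.append((mountpoint, requested))
    return conflicts
```

Any mount point this reports is one where the requested options would most likely not take effect as written.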


a btrfs SubVolume is a type of directory created on a btrfs Volume that is seen as a Dynamic Partition that can be mounted just like any other type of partition, with an advantage for the btrfs subvolume partition/mount of being able to continuously vary the amount of space it takes up within the btrfs volume/partition

Your entire explanation of subvolumes treats them as "dynamic partitions", but that's not accurate. Partitions are block devices; btrfs subvols aren't block devices. Basically, BTRFS subvols are an entirely different beast than ZFS ZVOLs, LVM's VGs, or a conventional /dev/sdXy partition.

I'll reiterate this previous comment of mine:

Subvolumes are independently mountable filesystem roots, from a user point of view they can be understood as directories with special semantics, and as such they share the same free space.


a btrfs subvolume snapshot is an in place backup in time of its own btrfs subvolume

A Snapshot is not a backup until you btrfs-send it somewhere else. Otherwise, all you have is a single copy of your data living on the same filesystem. If users treat it like a backup (ie: snap without send), then they are bound to lose data when their disk dies.
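To make the snap-then-send distinction concrete, here is a small sketch that only builds the command strings rather than running them. The function name `backup_commands` and all paths are placeholders. It assumes the destination is a mounted Btrfs filesystem on a different disk.

```python
import shlex


def backup_commands(subvol: str, snapshot: str, backup_mount: str) -> list[str]:
    """Build the shell commands for a snapshot that is also sent elsewhere.

    All three paths are placeholders chosen by the caller; `backup_mount`
    must be a mounted Btrfs filesystem on a different disk.
    """
    # btrfs send only accepts read-only snapshots, hence -r.
    take = shlex.join(["btrfs", "subvolume", "snapshot", "-r", subvol, snapshot])
    # The stream is piped into btrfs receive on the other filesystem.
    send = shlex.join(["btrfs", "send", snapshot])
    receive = shlex.join(["btrfs", "receive", backup_mount])
    return [take, f"{send} | {receive}"]
```

Without the second command, the only copy of the data still lives on the original disk.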


a btrfs subvolume snapshot rollback to a btrfs subvolume snapshot essentially changing the Live btrfs subvolume files back to what they were at the time of the snapshot/freeze, and having a snapshot immediately before the rollback may prove useful

No, to roll back you rename/move/change the default to another subvolume. The way you've described it would be something like rsync-ing the snapshot data into the live subvol, which is not only super slow, but may cause data duplication and weird behaviors due to applications seeing an inconsistent set of files (mismatched .so files, etc).

When you roll back using tools like Timeshift, it replaces the current live subvol with the snapshot, and then you have to reboot into it.
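As a rough illustration of a rename-based rollback, the sketch below builds (but does not run) the commands for swapping the live root subvolume with a snapshot. It is not how Timeshift is implemented. The function name `rollback_commands`, the `.broken` suffix, and the paths are placeholders, and it assumes the top-level subvolume (subvolid=5) is mounted at `toplevel`.

```python
import shlex


def rollback_commands(toplevel: str, snapshot: str, live: str = "@") -> list[str]:
    """Build commands that swap the live root subvolume for a snapshot.

    `toplevel` is where subvolid=5 is mounted; `snapshot` and `live` are
    subvolume paths relative to it. The names are placeholders.
    """
    live_path = f"{toplevel}/{live}"
    return [
        # Move the broken root aside; a rename within one filesystem is cheap.
        shlex.join(["mv", live_path, f"{live_path}.broken"]),
        # A writable snapshot of the snapshot becomes the new live root.
        shlex.join(["btrfs", "subvolume", "snapshot", f"{toplevel}/{snapshot}", live_path]),
        # The running system still uses the old subvolume until reboot.
        "reboot",
    ]
```

No file data is copied at any step, which is why this is fast compared to syncing files back into the live subvolume.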

the btrfs subvolume snapshots are held within the btrfs subvolume itself ie eg: @ (for /) contains @/.snapshots/ (seen as /.snapshots when mounted) @log (for var/log/) contains @log/.snapshots/ (seen as var/log/.snapshots when mounted) @home (for home/) contains @home/.snapshots/ (seen as home/.snapshots when mounted)

This is also incorrect. You can btrfs snap /@ into /@home/foo and BTRFS won't give you any errors. You can place your snapshots wherever you want as long as it's still the same filesystem.

Torxed commented 2 years ago

I've at the very least changed to the first suggested layout in PR #863. So far I've only done the basic tests on it, meaning:

If we're happy with this new layout as the default we can close this. If we want further changes, I'll keep this ticket open as there's a good dialogue going with great ideas.

cmurf commented 2 years ago

I think "root" and "home" subvolumes, mounted at / and /home respectively is a good enough approach for now, similar to what Fedora has used for a very long time. Meanwhile (open)SUSE discusses their subvolume layout issues, and a new approach, here.

As Fedora is considering a snapshot+rollback design, it's clear there are issues in /var that conflate very different file lifecycles, and files in /var may be managed by the package database. The location of the package database itself can be a thorn, as Fedora discusses whether and where to move /var/lib/rpm. Meanwhile (open)SUSE folks already chose /usr/lib/sysimage/rpm. The current discussion in Fedora further includes alternatives such as /state, /usr/var, and /usr/state but it is recognized that /var containing things not easily replaced during "factory reset" is a problem.

That is to say: subvolume layout, snapshot+rollback mechanism and domain, package management, what is "the system" versus what's adjunct to it, all form a circle. Everything has an effect on each other. So you pretty much have to game out the consequences of each decision and then iterate, to figure out what works the best.

KhalilSantana commented 2 years ago

@Torxed,

I've just tried to install on real hardware using the v2.3.1-dev branch (master didn't work) and while it does seem to create the subvolumes, it doesn't really mount or use them.

[khalil@milkyway ~]$ sudo findmnt --real 
TARGET  SOURCE              FSTYPE OPTIONS
/       /dev/mapper/luksdev btrfs  rw,relatime,space_cache=v2,subvolid=5,subvol=/
└─/boot /dev/sda1           vfat   rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro
[khalil@milkyway ~]$ cat /etc/fstab
# Static information about the filesystems.
# See fstab(5) for details.

# <file system> <dir> <type> <options> <dump> <pass>
# /dev/mapper/ainstloop
UUID=cec3e0df-770e-4483-9fef-24cd0e760fa5       /               btrfs           rw,relatime,space_cache=v2,subvolid=5,subvol=/  0 0

# /dev/sda1
UUID=0ED8-B180          /boot           vfat            rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro   0 2

[khalil@milkyway ~]$ sudo mount -o subvolid=5 /dev/mapper/luksdev /mnt/
[khalil@milkyway ~]$ ls /mnt
@  bin  boot  dev  etc  @home  home  lib  lib64  @log  mnt  opt  @pkg  proc  root  run  sbin  @.snapshots  srv  sys  tmp  usr  var
[khalil@milkyway ~]$ ls -la /mnt/@
total 16
drwxr-xr-x 1 root root   0 Jan 14 21:11 .
drwxr-xr-x 1 root root 172 Jan 14 21:13 ..

Notice the findmnt command shows subvol=/ (aka: toplevel) as the mounted subvol for the rootfs, as well as the /@ subvol being empty.

So, it seems it didn't umount + mount with the subvol=<..> option in order for things to be in the correct place. For reference, here's how my script does it
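The check done by eye above can be scripted. This is a minimal sketch that parses the OPTIONS column printed by findmnt. The helper names `mounted_subvol` and `root_on_toplevel` are made up for this example.

```python
def mounted_subvol(options: str):
    """Return the value of subvol= from a findmnt OPTIONS string, or None."""
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        if key == "subvol" and sep:
            return value
    return None


def root_on_toplevel(options: str) -> bool:
    """True when the mount uses the top-level subvolume (subvol=/)."""
    return mounted_subvol(options) == "/"
```

Fed the options from the findmnt output above (`rw,relatime,space_cache=v2,subvolid=5,subvol=/`), `root_on_toplevel` returns `True`, which is the symptom described.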

Torxed commented 2 years ago

Good observation and thank you for the additional information. I'll see if I can figure out why today. Going on a one week holiday starting tomorrow so if not today I'll do it at the end of next weekend :)

mhussaincov93 commented 2 years ago

Hi, I like the multiple options plan wherein you are able to pick which layout you would like to use. Personally I'm in favor of the following layout: "@" and "@home", which supports Timeshift. Thanks, Majid Hussain

Torxed commented 2 years ago

This is the final issue/suggestion I'll implement before releasing v2.3.1. So I'll put some effort into solving this today and tomorrow. @KhalilSantana I assume if we just get the mounting done correctly, this should be all good?

I would have hoped that systemd.mount would detect and mount it.

Would it be better to use a .mount target like:

# Must be saved as var-lib-machines.mount (the unit name has to match Where=)
[Unit]
Description=Virtual Machine and Container Storage

[Mount]
What=/dev/disk/by-uuid/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
Where=/var/lib/machines
Type=btrfs
Options=subvol=machines,defaults,nodatacow

Or is /etc/fstab still preferred?
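One detail that matters if mount units are written by hand: systemd requires the unit file name to be derived from the mount point. The sketch below covers only the simple case and deliberately rejects anything that would need real escaping (for that, `systemd-escape --path` is the tool to use). The function name `mount_unit_name` is hypothetical.

```python
import re


def mount_unit_name(where: str) -> str:
    """Derive the .mount unit file name for a simple absolute path.

    Simplified: only handles components made of ASCII letters, digits,
    '_' and non-leading '.'; anything else needs real systemd escaping
    (see systemd-escape --path), so it is rejected here.
    """
    if not where.startswith("/"):
        raise ValueError("mount point must be an absolute path")
    parts = [p for p in where.split("/") if p]
    if not parts:
        return "-.mount"  # the root file system
    for part in parts:
        if not re.fullmatch(r"[A-Za-z0-9_][A-Za-z0-9_.]*", part):
            raise ValueError(f"{part!r} needs systemd-escape")
    return "-".join(parts) + ".mount"
```

Paths such as /.snapshots fall into the rejected category because of the leading dot, which is one more thing an installer would have to handle if it generated units itself.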

Torxed commented 2 years ago

Just a note: I think I found why the mounts don't pop up, at least in v2.3.1. It's because we're never calling this anywhere: https://github.com/archlinux/archinstall/blob/becb4239599c0fbb59315dc3beaacd8dc826d0c2/archinstall/lib/disk/btrfs.py#L12

~~I suspect it's a partial cherry-pick that's gone wrong. I'll see how we're using it in master and pull the changes over : )~~

Edit: scratch that, code moved to here in master at least https://github.com/archlinux/archinstall/blob/becb4239599c0fbb59315dc3beaacd8dc826d0c2/archinstall/lib/disk/partition.py#L361

Torxed commented 2 years ago

@wllacer Sorry for pinging, but I'm trying to figure out the last issue before release. And I think it's because https://github.com/archlinux/archinstall/blob/becb4239599c0fbb59315dc3beaacd8dc826d0c2/archinstall/lib/disk/btrfs.py#L12 is never called after creating the "mount structure" here: https://github.com/archlinux/archinstall/blob/becb4239599c0fbb59315dc3beaacd8dc826d0c2/archinstall/lib/disk/btrfs.py#L177-L178

Is this a correct assumption? Note that I'm working with v2.3.1-dev here and not master where mount_subvolume is deprecated.

KhalilSantana commented 2 years ago

@KhalilSantana I assume if we just get the mounting done correctly, this should be all good?

@Torxed,

I didn't read the code (Python doesn't click for me sadly), but just going by observing the final output, Archinstall needs to:

title   Arch Linux
linux   /vmlinuz-linux
initrd  /intel-ucode.img
initrd  /initramfs-linux.img
options  rootflags=subvol=/@

The checked boxes are seemingly done/correct from my point of view, while the unchecked ones need to be addressed for a functional Arch system that works out-of-the-box with Timeshift (or anything else that expects the Flat layout with the Ubuntu-style naming scheme).


Regarding /etc/fstab vs systemd-mount units: as far as I'm aware, systemd-mount parses fstab and generates its own .mount units from it anyway[1]. I personally haven't fiddled with it, so I'd rather keep everything in /etc/fstab and let systemd-mount figure out the units by itself.

[1] - Try using sudo systemctl list-units | grep mount on a system where you didn't explicitly configure systemd-mount
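For the fstab route, the entries for a flat layout are mechanical to produce. The following is a sketch only: `fstab_lines` is a hypothetical helper, the default options are placeholders, and it assumes every subvolume lives on one Btrfs filesystem identified by a single UUID.

```python
def fstab_lines(uuid: str, subvolumes: dict, base_options: str = "rw,relatime") -> list[str]:
    """Render one fstab line per subvolume of a single Btrfs filesystem.

    `subvolumes` maps subvolume name -> mount point. `base_options` is a
    placeholder; pick whatever options the installation actually needs.
    """
    lines = []
    # Sorting by mount point puts a parent path before anything nested under it.
    for name, mountpoint in sorted(subvolumes.items(), key=lambda kv: kv[1]):
        options = f"{base_options},subvol=/{name}"
        lines.append(f"UUID={uuid}\t{mountpoint}\tbtrfs\t{options}\t0 0")
    return lines
```

Called with `{"@": "/", "@home": "/home"}`, this yields a line for / with `subvol=/@` followed by one for /home with `subvol=/@home`.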

Torxed commented 2 years ago

@KhalilSantana Thank you! We'll stick to /etc/fstab for now :) And I think we're on to a solution regarding why the mountpoints aren't mounted properly. Stay tuned! : )

Torxed commented 2 years ago

I've created a draft PR that hopefully solves this. I suspect wllacer would solve this in 5 minutes. But I'll give it a go :) Good learning experience!

wllacer commented 2 years ago

@wllacer Sorry for pinging, but I'm trying to figure out the last issue before release. And I think it's because

https://github.com/archlinux/archinstall/blob/becb4239599c0fbb59315dc3beaacd8dc826d0c2/archinstall/lib/disk/btrfs.py#L12 is never called after creating the "mount structure" here:

https://github.com/archlinux/archinstall/blob/becb4239599c0fbb59315dc3beaacd8dc826d0c2/archinstall/lib/disk/btrfs.py#L177-L178

Is this a correct assumption? Note that I'm working with v2.3.1-dev here and not master where mount_subvolume is deprecated.

Yes, I never needed to call it, as I create standard partition dictionary entries in *managebtrfs... for each mountable subvolume since day one. That is (I believe) where I mounted/unmounted the root partition that @KhalilSantana misses. On my machine I never got (with the standard layout) more than the root subvolume mounted, not even with the original code. I just assumed mount would silently "forget" them as they were already reachable. If you give me a few minutes, I'll do some tests and report back.

Torxed commented 2 years ago

Thank you @wllacer. In my PR I've started to tinker with it. And I'm starting to understand the logic a bit more. I assumed manage_btrfs_subvolumes returned a flat structure of all the mountpoints that we needed to deal with, but it returns essentially {"/": {..., "subvolume" : {...}}.

Since the variable was called mountpoints I assumed it would contain all the mountpoints needed to be setup in a flat structure. I think that's what's missing tbh. If we can get manage_btrfs_subvolumes to instruct the following calls which mountpoints to setup, that would be it : )

I could have this done within the hour probably if you're busy. And the PR is almost done.
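The "flat structure" idea can be sketched as follows. The dictionary shape and the function name `flatten_mountpoints` are assumptions for illustration only; this is not what manage_btrfs_subvolumes actually returns.

```python
def flatten_mountpoints(partition: dict) -> list[tuple]:
    """Turn one partition entry into a flat list of (mountpoint, options).

    Assumed shape: {"mountpoint": ..., "btrfs": {"subvolumes": {name: mountpoint}}}.
    """
    subvolumes = partition.get("btrfs", {}).get("subvolumes", {})
    if not subvolumes:
        # Plain partition: mount it as-is, if it has a mount point at all.
        mountpoint = partition.get("mountpoint")
        return [(mountpoint, None)] if mountpoint else []
    # With subvolumes, every mount comes from a subvolume, never the partition.
    flat = [(mp, f"subvol={name}") for name, mp in subvolumes.items()]
    return sorted(flat)
```

The later mount step would then only need to walk this list in order, without knowing anything about nesting.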

wllacer commented 2 years ago

Got the problem. In btrfs.manage_btrfs_subvolumes https://github.com/archlinux/archinstall/blob/becb4239599c0fbb59315dc3beaacd8dc826d0c2/archinstall/lib/disk/btrfs.py#L134-L136 look at the generated JSON:

           },
            {
                "btrfs": {
                    "subvolumes": {
                        "@": "/",
                        "@.snapshots": "/.snapshots",
                        "@home": "/home",
                        "@log": "/var/log",
                        "@pkg": "/var/cache/pacman/pkg"
                    }
                },
                "encrypted": false,
                "filesystem": {
                    "format": "btrfs"
                },
                "format": true,
                "mountpoint": "/",        #here is the culprit
                "size": "100%",
                "start": "513MiB",
                "type": "primary"
            }

I never touched the part where the entry is generated (my bad) in user_guides and, as the partition entry has a mountpoint, all the procedures to create subvolumes don't work. I think I documented it somewhere, but never got to the code. The fastest patch should be at https://github.com/archlinux/archinstall/blob/becb4239599c0fbb59315dc3beaacd8dc826d0c2/archinstall/lib/disk/user_guides.py#L36-L44 line 42, something like

     mountpoint = '/' if not using_subvolumes else None

Can you do it? Otherwise I have to make a patch for master and then you have to port it back.
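To show the intent of that one-liner in isolation, here is a stand-alone sketch. The function `root_partition_entry` and the reduced set of fields are invented for this example and do not mirror the real user_guides.py code.

```python
def root_partition_entry(using_subvolumes: bool) -> dict:
    """Build a root partition entry; names and fields are illustrative."""
    entry = {
        "type": "primary",
        "filesystem": {"format": "btrfs"},
        # With subvolumes, "/" comes from the "@" subvolume instead.
        "mountpoint": "/" if not using_subvolumes else None,
    }
    if using_subvolumes:
        entry["btrfs"] = {"subvolumes": {"@": "/", "@home": "/home"}}
    return entry
```

The point is that "/" is claimed exactly once: either by the partition itself or by the "@" subvolume, never both.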

Torxed commented 2 years ago

I can try it : ) I think however we'll still have a bunch of issues after this fix. For instance, we should ensure that the subvolumes get mounted in the correct places.

I can't see how that would work with the current code that is in my PR.

Here's the current state:

Formatting /dev/loop0p2 -> btrfs
Mounting Partition(path=/dev/loop0p2, size=19.5, PARTUUID=b84cb72e-70f1-4f0b-8192-593b797992a7, fs=btrfs) to /mnt/archinstall/
Creating a subvolume on /mnt/archinstall/@
Creating a subvolume on /mnt/archinstall/@.snapshots
Creating a subvolume on /mnt/archinstall/@home
Creating a subvolume on /mnt/archinstall/@log
Creating a subvolume on /mnt/archinstall/@pkg
Mounting Partition(path=/dev/loop0p2, size=19.5, PARTUUID=b84cb72e-70f1-4f0b-8192-593b797992a7, fs=btrfs) as / to /mnt/archinstall/ using options None
Mounting Partition(path=/dev/loop0p2, size=19.5, PARTUUID=b84cb72e-70f1-4f0b-8192-593b797992a7, fs=btrfs) to /mnt/archinstall/
Getting mount information for device path /mnt/archinstall/
Mounting Partition(path=/dev/loop0p2, size=19.5, PARTUUID=b84cb72e-70f1-4f0b-8192-593b797992a7, fs=btrfs, mounted=/mnt/archinstall/) as /.snapshots to /mnt/archinstall/.snapshots using options subvolid=2

wllacer commented 2 years ago

I got it fine

└─/mnt/archinstall                /dev/loop0p2[/@]     btrfs      rw,relatime,ssd,space_cache=v2,subvolid=256,subvol=/@
  ├─/mnt/archinstall/boot         /dev/loop0p1         vfat       rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro
  ├─/mnt/archinstall/.snapshots   /dev/loop0p2[/@.snapshots]
  │                                                    btrfs      rw,relatime,ssd,space_cache=v2,subvolid=260,subvol=/@.snapshots
  ├─/mnt/archinstall/home         /dev/loop0p2[/@home] btrfs      rw,relatime,ssd,space_cache=v2,subvolid=257,subvol=/@home
  ├─/mnt/archinstall/var/cache/pacman/pkg
  │                               /dev/loop0p2[/@pkg]  btrfs      rw,relatime,ssd,space_cache=v2,subvolid=259,subvol=/@pkg
  └─/mnt/archinstall/var/log      /dev/loop0p2[/@log]  btrfs      rw,relatime,ssd,space_cache=v2,subvolid=258,subvol=/@log

PR #906 for master

wllacer commented 2 years ago

.snapshots should not be mounted according to the SUSE people, but it is just fine ;-)

Torxed commented 2 years ago

Weird, it didn't work for me (Edit: just realized because I'm testing against v2.3.1-dev) But then again, I was messing with the code so much.

Where do we call mount -t btrfs -o subvolume=@/ /mnt/archinstall and all the other ones?
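For reference, the mount option is spelled `subvol=` (or `subvolid=`). Below is a sketch of the sequence of commands being asked about, built as strings rather than executed. `mount_commands` is a hypothetical helper. It assumes the temporary top-level mount used for creating the subvolumes has already been unmounted from the target.

```python
import shlex


def mount_commands(device: str, target: str, subvolumes: dict) -> list[str]:
    """Build mkdir + mount commands for each subvolume, parents before children.

    `subvolumes` maps subvolume name -> mount point inside the new system.
    """
    commands = []
    # Sorting by mount point ensures "/" is mounted before "/home" and so on.
    for name, mountpoint in sorted(subvolumes.items(), key=lambda kv: kv[1]):
        where = (target.rstrip("/") + mountpoint.rstrip("/")) or "/"
        # The directory must exist inside the already mounted parent.
        commands.append(shlex.join(["mkdir", "-p", where]))
        commands.append(shlex.join(
            ["mount", "-t", "btrfs", "-o", f"subvol={name}", device, where]
        ))
    return commands
```

Mounting "@" first matters: the directories for /home, /var/log and the rest have to be created inside "@", not in the top-level subvolume.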

Torxed commented 2 years ago

I'll try to fix this in v2.3.1-dev as well, but since the logic is backwards compatible and not as heavily modified to support the complex nature of btrfs, it can get tricky to do so.

Currently I got:

Formatting /dev/loop0p2 -> btrfs
Mounting Partition(path=/dev/loop0p2, size=19.5, PARTUUID=8180918a-a9b4-4b35-8cb2-0f84c0e47d0c, fs=btrfs) to /mnt/archinstall/
Creating a subvolume on /mnt/archinstall/@
Creating a subvolume on /mnt/archinstall/@.snapshots
Creating a subvolume on /mnt/archinstall/@home
Creating a subvolume on /mnt/archinstall/@log
Creating a subvolume on /mnt/archinstall/@pkg
Mounting Partition(path=/dev/loop0p2, size=19.5, PARTUUID=8180918a-a9b4-4b35-8cb2-0f84c0e47d0c, fs=btrfs) as / to /mnt/archinstall/ using options None
Mounting Partition(path=/dev/loop0p2, size=19.5, PARTUUID=8180918a-a9b4-4b35-8cb2-0f84c0e47d0c, fs=btrfs) to /mnt/archinstall/
Getting mount information for device path /mnt/archinstall/
Mounting Partition(path=/dev/loop0p2, size=19.5, PARTUUID=8180918a-a9b4-4b35-8cb2-0f84c0e47d0c, fs=btrfs, mounted=/mnt/archinstall/) as /.snapshots to /mnt/archinstall/.snapshots using options subvol=@/.snapshots
Getting mount information for device path /mnt/archinstall/.snapshots
Target /mnt/archinstall/.snapshots never got mounted properly (unable to get mount information using findmnt).