archlinux / archinstall

Arch Linux installer - guided, templates etc.
GNU General Public License v3.0

Make btrfs mount with `noatime` instead of `relatime` in fstab #582

Open TheEvilSkeleton opened 3 years ago

TheEvilSkeleton commented 3 years ago

Using relatime with btrfs is a poor fit because of the nature of CoW, and it is only practical in edge cases. Using atime with btrfs can lead to performance regressions, excessive storage use and more.

I highly recommend reading through the LWN article Atime and btrfs: a bad combination? and the GitHub comment https://github.com/kdave/btrfs-progs/issues/377#issuecomment-862920890 as they explain why using atime with btrfs is impractical for an average user.

Instead, we should be using noatime in the fstab.
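To make the request concrete, here is a minimal Python sketch of what "use noatime in the fstab" could mean for btrfs entries. It is not archinstall code: the function name `use_noatime` is made up for illustration, and it assumes the usual six-column fstab layout with the mount options in the fourth column.

```python
def use_noatime(fstab_text: str) -> str:
    """Return fstab text in which every btrfs entry mounts with noatime.

    Hypothetical helper for illustration only, not part of archinstall.
    """
    atime_options = {"atime", "noatime", "relatime", "strictatime"}
    result = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Leave comments, blank lines and non-btrfs entries untouched.
        if line.lstrip().startswith("#") or len(fields) < 4 or fields[2] != "btrfs":
            result.append(line)
            continue
        # Drop any existing atime-related option, then append noatime.
        options = [opt for opt in fields[3].split(",") if opt not in atime_options]
        fields[3] = ",".join(options + ["noatime"])
        # Note: rewritten entries are re-joined with tabs, so the original
        # column spacing of those lines is not preserved.
        result.append("\t".join(fields))
    trailing_newline = "\n" if fstab_text.endswith("\n") else ""
    return "\n".join(result) + trailing_newline
```

Entries for other filesystems (for example a vfat /boot) keep whatever options they already had.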

Torxed commented 3 years ago

I don't mind changing this if it's what btrfs gurus agree is best for everyone. But the article is from 2012 and there's no mention about noatime in the Btrfs (Arch Wiki), so I'm torn. We follow the default parameters as closely as we can, which means relatime will probably stay. I'm guessing the "bad idea" refers to excessive amounts of I/O when accessing a lot of files, which is a performance-related topic, so one could argue that it's post configuration, not something we should dabble with here. We do go against the norm sometimes on purpose, and in this case that would probably be if we detected spinning disks; then this option might be the only really sane way forward for those users.

If we'd like to support snapper and mutt, we're also required to maintain some form of atime on our file history, otherwise they might break, although some of these tools will work with noatime as well. See #93 for progress.

Some users, if not most, probably expect atime to be modified whenever the file is accessed; relatime will partially allow for this expected behavior to occur without being that insane with read operations. As I understand it, noatime stops atime from being updated at all, increasing performance, but you lose the ability to tell when a file was last accessed, which I wouldn't want to happen. Either BTRFS is not what you should be using, or it's a post configuration you should be doing in order to maximize performance knowing the "risks" involved.

I personally think this is a cost vs true data and that relatime should stay. I don't mind getting a new hard drive when the I/O gets dodgy in favor of being able to monitor atime on my file-system "accurately" or to add it to those systems I know have intensive I/O reads on unique files per day (read: machines that have "Anti Virus" software on them).

Ps. I'm allergic to adding even more user questions, but we could add a Do you want to optimize your filesystem? when BTRFS is detected and spinning disk is detected. But that's my two cents on the topic
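For reference, a rough sketch of how "BTRFS is detected and spinning disk is detected" could be checked from Python. The kernel exposes a per-device `queue/rotational` flag in sysfs (1 for spinning disks, 0 otherwise); the function names and the `sysfs_root` parameter are assumptions made for this example, not existing archinstall API.

```python
from pathlib import Path


def is_rotational(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the kernel reports `device` (e.g. "sda") as a spinning disk."""
    flag = Path(sysfs_root) / device / "queue" / "rotational"
    try:
        return flag.read_text().strip() == "1"
    except OSError:
        # Unknown or missing device: treat it as non-rotational.
        return False


def should_offer_btrfs_tuning(filesystem: str, device: str,
                              sysfs_root: str = "/sys/block") -> bool:
    """Only ask the extra question when both conditions from the comment hold."""
    return filesystem == "btrfs" and is_rotational(device, sysfs_root)
```

The device name is the whole disk (such as `sda`), not a partition, since the flag lives on the disk's queue directory.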

TheEvilSkeleton commented 3 years ago

I opened an issue in the btrfs-progs repo: https://github.com/kdave/btrfs-progs/issues/377 so we can have a clear answer.

As for the Arch wiki, I think it needs to be updated. I checked in the btrfs wiki, and two sections talk about it:

  1. https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)#NOTES_ON_GENERIC_MOUNT_OPTIONS
  2. https://btrfs.wiki.kernel.org/index.php/FAQ#Why_I_experience_poor_performance_during_file_access_on_filesystem.3F

I'm okay to update it as long as we have a clear answer to this topic.

Re Ps. btrfs is really configurable, and for that, I think there should be more questions, like transparent compression (which Fedora recently enabled by default), discard=async (asynchronous TRIM) for SSDs, which is better than discard according to the Arch wiki, etc.

polarathene commented 3 years ago

I'm guessing the "bad idea" refers to excessive amounts of I/O when accessing a lot of files

Not really... from the article:

the real problem has to do with the interaction between atime, copy-on-write, and snapshots. Alexander posted an example where a recursive grep caused 2.2GB of free space to disappear. That is a surprising result for what is meant to be a read-only operation.

On other filesystems, that's not really a concern AFAIK? Note that this only applies the first time on the most recent snapshot: upon creation it shares extents with prior snapshot(s), and it's only when an atime update changes the metadata that the CoW behaviour kicks in to make new copies, from what I understand.

If the system is making more frequent snapshots however, that issue could become a noticeable problem. It's not difficult to run into scenarios where many files get accessed, either as a developer or a more casual user.

That said, as the btrfs-progs issue mentions, the issue can still happen with actions that perform writes on many files too (chmod -R and git clean are given as examples). noatime would not prevent those from affecting the user in a similar way, but the scope may be smaller and less frequent (no random unexpected bulk update from relatime 24 hours later after a big read, which is said to risk negatively impacting responsiveness depending on IO pressure).
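The "24 hours later" behaviour comes from how relatime decides whether a read should update atime. A small Python model of that rule, as I understand it from mount(8), is below; the function name is made up and this is a sketch of the decision only, not kernel code.

```python
DAY_SECONDS = 24 * 60 * 60


def relatime_updates_atime(atime: float, mtime: float, ctime: float, now: float) -> bool:
    """Model of the relatime rule: should this read update atime?

    atime is updated when it is not newer than mtime or ctime,
    or when the stored atime is at least 24 hours old.
    """
    if mtime >= atime or ctime >= atime:
        return True
    return now - atime >= DAY_SECONDS
```

So a recursive read over a tree that has not been touched for a day still results in one atime write per file, which is the bulk update described above.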

On the performance side however, relatime was noted as delivering only ~25% of the expected I/O perf when transferring files from a BTRFS disk to an external USB disk (formatted with XFS). noatime resolved the perf issue. Is the access time of a file important enough to justify that trade-off? Personally I only care about modification time, not when it was accessed, especially on a personal system.


Some users, if not most, probably expect atime to be modified whenever the file is accessed; relatime will partially allow for this expected behavior to occur without being that insane with read operations.

Are you sure about that? If anything, considering the few examples that ever arise when atime is relevant such as mutt, you'd think that users of such may be more niche and it should be those users that should know better to enforce their preference of strictatime or relatime (possibly with lazytime) to meet their expectations? For everyone else it's not really providing much benefit and otherwise seems to contribute to the system negatively?


there's no mention about noatime in the Btrfs (Arch Wiki), so I'm torn.

There is this section which does warn against atime updates impacting performance and disk usage, advising to reduce it (relatime may be seen as a reduction, but noatime would also be a valid reduction).

The BTRFS wiki also mentions this:

Further, there's the well-known issue that when having many snapshots and noatime is not in place, then accessing will cause atime updates, which is followed by a lot of metadata writes that could potentially eat up quite some space. For that reason it may make sense to have snapshots subvols marked with a noatime property per default, again offering a switch not to do so. Whether it makes sense for ro-snapshots depends on whether ro-snapshots have atime updates at all.


If we'd like to support snapper and mutt, we're also required to maintain some form of atime on our file history

Do you have a source for the snapper requirement of atime? It wasn't in your linked issue at all, nor is it in the ArchWiki Snapper page, very little came up at a glance through a search engine query as well.

Either BTRFS is not what you should be using, or it's a post configuration you should be doing in order to maximize performance knowing the "risks" involved.

One could argue that should apply to those that would want atime updated? Linux defaults to what it does to try to avoid breaking compatibility AFAIK, even when that default may not make much sense to retain as the default for performance and disk usage concerns.


Regarding other mount options like discard=async, Fedora working group discusses some of those and their suitability.

Speaking of Fedora, I haven't checked Fedora Workstation or openSUSE defaults. If either of those default to noatime as a mount option for their default BTRFS setups, that'd surely hold some weight?

Torxed commented 3 years ago

These are valid opinions, and I would just like to clarify once more that I'm not against adding this change, but we do strive to keep things as close to defaults as possible in order to avoid maintenance hell. Regarding Fedora, openSUSE and anyone else - this project doesn't really look at those projects. We tend to follow the Arch wiki as our bible, along with any defaults in the PKGBUILDs. But we do occasionally gather inspiration from other sources of course; this is natural in any development process.

I think the way forward for this is to present the user with one additional question after selecting BTRFS as the filesystem of choice and the question would be optional and default to no, and would sound something like:

Would you like to optimize BTRFS performance and change the default parameters? (N/y): 
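A minimal sketch of such an optional question in Python, defaulting to "no" when the user just presses Enter. The function name and the injectable `read` parameter are assumptions for illustration (the latter only makes the example testable); this is not the installer's actual menu code.

```python
def ask_optimize_btrfs(read=input) -> bool:
    """Ask the optional question; anything other than y/yes counts as no."""
    answer = read(
        "Would you like to optimize BTRFS performance "
        "and change the default parameters? (N/y): "
    )
    return answer.strip().lower() in ("y", "yes")
```

Because an empty answer maps to no, the "next next next install" flow keeps the default parameters.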

This is probably where I would like to add BTRFS subvolumes and other things as well. As I think most inexperienced users might find all this "a bit too much" to handle in one dump if they're trying this out for the first time. The flip side of that is that most Arch users are tech savvy and interested in new cool things - and it might save them data when getting help from a professional. To me this boils down to tradeoffs and should we follow the default parameters in all cases, some or not at all.

Because I still think the "next next next install" logic that we currently support for speedy, minimal installs should adhere to the default configurations as much as possible. If and when the defaults change we'll also change. Monitoring those changes is hard with a project like this - so feedback from the community is important.

Btw, good read on the issue link @TheEvilSkeleton!

polarathene commented 3 years ago

As I think most inexperienced users might find all this "a bit too much" to handle in one dump if they're trying this out for the first time.

Aren't these users the kind that are less likely to have niche software that requires atime, and more likely to accidentally run into the problems it can cause on BTRFS?

I agree that UX for that type of audience should keep the install streamlined and simple. These are users that aren't necessarily as likely to want upstream defaults, as they are to be the kind that want sane defaults without having to invest time learning about an unknown amount of gotchas, especially ones like this which are more likely to be experienced and a result of not knowing any better.

The flip side of that is that most Arch users are tech savvy and interested in new cool things

I fall into that category, and while I do have preference to Arch for sticking to defaults so I don't have another layer of things to be aware of (especially when troubleshooting), I also can't say I've been fond of winding up with bad defaults and being forced to learn about something minor that I didn't have time for (to properly grok) until it's become an actual problem and interrupted my task at hand unexpectedly.

I ended up switching to Manjaro early on after enough of those until I got more confident over time working my way through the ArchWiki and other resources. Manjaro strays a bit too much for my liking, but it was nice to put some trust in a community/maintainers that knew what changes from defaults would generally be favorable to the users.

I could then look into why those choices were made afterwards for my own benefit, without worrying so much about stuff going wrong (not that Manjaro was immune to that, especially with AUR usage).


To me this boils down to tradeoffs and should we follow the default parameters in all cases, some or not at all.

I'm not presently a user of archinstall, though I have heard of it in a few places. Forgive me if this suggestion is poor from not having given the project a proper look..

Is this not something an ENV var or config can set as a profile (or part of one)? Vanilla Arch defaults or Community defaults, with the latter providing some overrides that can be documented as to why they were chosen? (just provide some URL to a markdown document in this repo that lists each one that gets adopted)

Because I still think the "next next next install" logic that we currently support for speedy, minimal installs should adhere to the default configurations as much as possible.

Another case where the vanilla vs community defaults keeps both parties happy? For me especially, having a clear, version-controlled document to reference that lists which config defaults differ from upstream, and ideally the reason why, is quite valuable. Most concern about steering away from defaults is likely due to either expectations or not understanding a decision well enough to feel comfortable with it.

Some may also expect defaults because they apply their own changes post-setup. I know that with Manjaro its customizations can also be unwanted, so you have to undo them, and their changes can be more frequent than the reliably slower-changing defaults of a vanilla/upstream install.

I see at a glance that the project has JSON configs and python profiles (though tailored to different purpose), perhaps it's just a matter of expanding the JSON config support and providing a community override for some defaults that can be adopted? (others could then tweak that as needed if they prefer a slightly different variation)
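As a rough illustration of the vanilla vs community defaults idea, the Python sketch below picks mount options based on an environment variable. Everything here is hypothetical: the variable name `ARCHINSTALL_DEFAULTS`, the two dictionaries and the function are invented for the example and do not exist in archinstall.

```python
import os

# Hypothetical tables: what the installer writes today vs. a community override set.
UPSTREAM_DEFAULTS = {"btrfs": ["relatime"]}
COMMUNITY_OVERRIDES = {"btrfs": ["noatime"]}


def mount_options(filesystem: str, environ=None) -> list:
    """Return mount options for `filesystem` under the selected defaults profile."""
    environ = os.environ if environ is None else environ
    # "vanilla" keeps upstream behaviour; "community" layers the overrides on top.
    profile = environ.get("ARCHINSTALL_DEFAULTS", "vanilla")
    options = dict(UPSTREAM_DEFAULTS)
    if profile == "community":
        options.update(COMMUNITY_OVERRIDES)
    return list(options.get(filesystem, []))
```

Each entry in the override table could then point at the markdown document explaining why it was adopted.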

polarathene commented 3 years ago

One more resource, again a Fedora discussion (but specifically regarding defaulting to noatime).

It was generally in favor, but then Lennart of systemd chimed in with some weight, among some other feedback for why it's probably not wise for their demographic.

As I'm interested in snapshots and not really seeing any major concerns with noatime despite reasons for keeping atime that that link addresses; I guess leaving the default as-is and taking explicit action to change it is best for now.. unless more demand for the change pours in from users of archinstall preferring that community defaults approach.

Torxed commented 3 years ago

The overrides are a good idea. Even with sub-menus being hidden behind a Would you like to optimize BTRFS performance and change the default parameters? (N/y): prompt, you can still override the values in that menu without ever entering it by simply passing (pseudo flag at the moment) --btrfs-subvolumes, and it will set up subvolumes without asking about it. I guess we could add a --optimize-btrfs-mounts, and even if the user opts out of other btrfs optimizations in the sub-menu, the mount optimization could be added but subvolumes skipped, for instance. The JSON config supports a similar "override", which is nice. Python profiles that users specify can always override things, but we could/will add a convenient API for doing so. Currently you would have to use SysCommand() or file handles to modify things in place.
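A sketch of how the two flags could behave independently, using Python's standard argparse. Both flags are described above as pseudo flags, so this is only an assumption of how they might be wired up; `parse_args` and `btrfs_mount_options` are names invented for the example.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the two pseudo flags discussed above (not real archinstall options)."""
    parser = argparse.ArgumentParser(description="btrfs override sketch")
    parser.add_argument("--btrfs-subvolumes", action="store_true",
                        help="set up subvolumes without asking")
    parser.add_argument("--optimize-btrfs-mounts", action="store_true",
                        help="mount btrfs with noatime instead of relatime")
    return parser.parse_args(argv)


def btrfs_mount_options(args: argparse.Namespace) -> list:
    """The mount optimization is independent of the subvolume choice."""
    return ["noatime"] if args.optimize_btrfs_mounts else ["relatime"]
```

With this split, passing only `--optimize-btrfs-mounts` changes the mount options while subvolumes stay skipped.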

So the ENV approach would be the least invasive, together with a detect-and-ask approach for btrfs specifically.


I would also suggest submitting a ticket upstream (Arch) to change the defaults to more sane ones if this is something that should be a broader fix, not just for archinstall. As the argument seems to be in favor of this and very little feedback given against it (maybe those against is just not interested in the debate, who knows).

polarathene commented 3 years ago

TL;DR


Original Message

> I would also suggest submitting a ticket upstream (Arch) to change the defaults to more sane ones if this is something that should be a broader fix, not just for archinstall.

From what I understand, kernel refusing to change defaults has the advice to distros to override them.. and perhaps some bigger distros do this, but I'm also thinking that isn't something Arch would change and more expected of upstream Arch to not tamper with? archinstall though... just seems like choosing more sane defaults would be more appropriate here. As mentioned though, `systemd-tmpfiles` may be more of a valid use case to want atime.

> As the argument seems to be in favor of this and very little feedback given against it (maybe those against is just not interested in the debate, who knows).

I don't follow? This issue is a few days old with two users expressing preference to `noatime` and providing references to support it. Is it common for opposition to get involved that early on in the projects issues? I've tried to avoid bias and mention arguments in favor of atime where possible with my input. Were you also weighing in the input from the other linked resources we've provided?

You would need enough users running into the problem first (_which assumes archinstall provides an easy and popular enough flow for enough users to be affected_), the demographic may raise issues about it, but since they're also quite possibly tech savvy, some may already take additional effort to get `noatime` sorted out regardless?

---

With `noatime`, anyone that cares about atime will probably notice sooner, compared to a user unknowingly wasting diskspace with snapshots on metadata with atime updates? If stuff like `systemd-tmpfiles` does result in a bad experience from a lack of atime updates (_Lennart was even encouraging `strictatime` be used instead of `relatime`.._), then perhaps that can be a positive thing? :sweat_smile:

At least it doesn't sound like it'd be as problematic, and easier to resolve? (_changing the mount option in fstab vs cleaning up N snapshots which may have other changes to separate from the atime ones_)

Either way, archinstall users will chime in with any issues that they care about enough and the default can favor whatever makes the most sense for the community :)

dylanmtaylor commented 3 years ago

Would you like to optimize BTRFS performance and change the default parameters? (N/y): 

This sounds like something we want to keep out of the installer. It'll probably lead to user confusion. They can always tune params after installation and we ought to try to ship some sort of sane default. I'm not knowledgeable enough about these options to be too opinionated.

TheEvilSkeleton commented 3 years ago

Would you like to optimize BTRFS performance and change the default parameters? (N/y): 

This sounds like something we want to keep out of the installer. It'll probably lead to user confusion. They can always tune params after installation and we ought to try to ship some sort of sane default. I'm not knowledgeable enough about these options to be too opinionated.

The whole situation of btrfs is confusing because there are a lot of mixed opinions. So many people say that btrfs is slow, yet there are many improvements to be made, e.g. noatime, transparent compression, discard=async, etc. Many people say that btrfs is unstable, yet Google, Facebook and many other companies have been using btrfs on their servers for many years, etc.

btrfs, for now, only works well if properly configured. openSUSE for example is likely to be the best distribution for btrfs because everything is pre-configured, like snapshots. If we keep it stock, then it's only a worsened ext4 because the defaults don't take advantage of btrfs' features while also using an atime where it is only useful for edge cases but has far bigger disadvantages, in my opinion.

I don't know whether this will bring more confusion to the user or not. I think having an option to optimize btrfs is a great idea, but again, I don't know if it'll make it easier for the user or lead to more confusion. As said before, the whole situation is confusing because of the mixed opinions from the Linux and BSD communities, the lack of information, and the absurd amount of misinformation (like Phoronix benchmarks of btrfs vs other FSes), which has caused a lot of confusion and wrong conclusions.


I would also suggest submitting a ticket upstream (Arch) to change the defaults to more sane ones if this is something that should be a broader fix, not just for archinstall. [...]

Done! https://bugs.archlinux.org/task/71305

polarathene commented 3 years ago

The whole situation of btrfs is confusing because there are a lot of mixed opinions. So many people say that btrfs is slow

This is also due to historical bias where BTRFS was notably slower in performance and had numerous issues that have since been addressed. Some users have negative experiences and that sticks with them when discussing preferences.

Many people say that btrfs is unstable, yet Google, Facebook and many other companies have been using btrfs on their servers for many years, etc.

Beyond actual issues in the past, you do need to consider the skillset and investment of time that make a difference in ensuring a positive experience. Server needs and experience can also differ from desktop.

Many users may have more positive experiences when sticking with the defaults provided (eg by Fedora or openSUSE at install time), while some users would use non-default features and run into unfortunate bugs (I think compression paired with another feature at one point was causing some data corruption).

I know in my own experience years ago I had problems with openSUSE defaults on an SSD (bug was related to SSDs) and my use of Docker (which had another bug with BTRFS at the time itself). This isn't surprising in Linux though, I've had an LTS kernel with a non-default disk I/O scheduler (BFQ) cause a daily kernel panic when used with XFS, that was reported early by multiple users but not resolved for many months.

BTRFS has had plenty of gotchas, some which are just from being different from the more common EXT4 and XFS filesystems, or how LVM and RAID work with their similar features that BTRFS has built-in but does differently. In my experience, users can neglect to consider the differences and complain when things don't work out like they expect.


If we keep it stock, then it's only a worsened ext4 because the defaults don't take advantage of btrfs' features while also using an atime where it is only useful for edge cases but has far bigger disadvantages, in my opinion.

If you're not making much use of subvols and snapshots, you're not likely to have an issue with atime updates. Fedora apparently hasn't gotten to that point yet and expressed that their simpler setup doesn't really have much of a problem in practice with relatime.

They are interested in adopting more of the BTRFS features, including making better use of subvols and snapshots, so that opinion may change in future. For now though they opted against noatime by default.


I don't know whether this will bring more confusion to the user or not. I think having an option to optimize btrfs is a great idea, but again, I don't know if it'll make it easier for the user or lead to more confusion.

The user either understands what's going on or places trust in any optimizations because they don't have time or interest to know any better.

It doesn't need to be confusing. Provide what the demographic would expect (vanilla upstream defaults, or sane community advised ones), the other can be opt-in in whatever manner makes the most sense for the demographic (contextual prompt, CLI flag, config setting).

Any mention of the community defaults can just reference a markdown document on this repo that explains any changes made for those that are interested. Fedora has separate issues on Pagure specifically to discuss and detail such decisions.


the lack of information, and the absurd amount of misinformation (like Phoronix benchmarks of btrfs vs other FSes), which has caused a lot of confusion and wrong conclusions.

That's a user problem. If you just look at metrics without context, you aren't getting useful insights to form an opinion on. Unless you're aware that Phoronix sticks to defaults of whatever is being tested, regardless of if that's good or not, and that's how you want to evaluate a decision without any other factors taken into account.

polarathene commented 3 years ago

From https://github.com/kdave/btrfs-progs/issues/377#issuecomment-864837682 a lengthy discussion concluded that on BTRFS noatime is most preferable to use by default. I've copy/pasted the same linked summary below.


I will try to tersely summarize the problem and advice for someone making a decision:

Is atime a bad idea with BTRFS? Should I prefer noatime?

noatime is often a better choice.

This is only relevant when a file is read and the metadata is shared with at least one snapshot. Updating the atime in metadata then results in requiring disk space for a new copy of metadata to accommodate the change, while keeping the old atime for any snapshots that still reference it.

With enough files read or snapshots over time, it can become a noticeable amount of disk space wasted. Using noatime can avoid this. Although if the affected snapshots are disposable / short-lived, you can keep the default relatime and just delete older snapshots when you need the space.

It is easier to switch from noatime to relatime/strictatime (avoid lazytime as it risks data consistency guarantees) than it is to switch the other way around (due to snapshots "polluted" with metadata containing only unwanted atime updates).
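The summary above can be illustrated with a toy Python model of a single file: a snapshot makes the file's metadata shared, and the first atime-updating read after that forces one copy. This is a deliberately simplified sketch (it ignores relatime's 24-hour rule and real on-disk metadata sizes), and the function name and event strings are invented for the example.

```python
def extra_metadata_copies(events, update_atime_on_read: bool) -> int:
    """Count metadata copies forced by CoW for one file in a toy model.

    `events` is a sequence of "snapshot" and "read" strings.
    """
    shared_with_snapshot = False
    copies = 0
    for event in events:
        if event == "snapshot":
            # A new snapshot shares the file's current metadata.
            shared_with_snapshot = True
        elif event == "read" and update_atime_on_read and shared_with_snapshot:
            # First atime update after a snapshot: CoW writes a new copy,
            # and the live file no longer shares metadata with the snapshot.
            copies += 1
            shared_with_snapshot = False
    return copies
```

For the sequence snapshot, read, read, snapshot, read this gives two copies when reads update atime and none with noatime; repeated reads between snapshots add nothing further.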

More verbose version

### The _atime_ + _snapshot_ disk usage problem

- Metadata for files on BTRFS can be shared across multiple snapshots efficiently when it's identical.
- Reading a file can trigger an atime update write to the metadata on disk.
- Modifying the metadata loses the benefit of sharing with snapshots, CoW causes new writes to update the metadata, using up extra disk space.
- If the file metadata to be updated is not shared with any snapshot(s), no extra disk space is needed to write the update.
- The disk space can be recovered when all snapshots referencing the same copy of metadata are deleted.
- The problem is the old metadata is tied to any other content in your snapshots that may be of value to retain, there is no way to clean up space by removing only the redundant metadata in snapshots.
- Thus the problem compounds if the frequency of atime updates affects many snapshots over time (as in each snapshot has different metadata for a file due to different atimes).

### `relatime`, `strictatime`, `lazytime` vs `noatime`

- `relatime` minimizes atime updates (_and thus their precision_) which is beneficial with frequent use of snapshots.
- If snapshot intervals are daily or greater, `strictatime` can provide more atime precision at the expense of writing to disk with each file read.
- Many atime updates for the same file(s) does not compound the disk usage increase if there is no new snapshots taken. Only the first atime update for a file incurs the penalty from no longer being able to share the same data with a snapshot(s).
- `lazytime` behaviour does not affect writes on BTRFS, and does not provide much benefit over `relatime` for reads. Additionally it can risk consistency guarantees that BTRFS is intended to provide for filesystem state.

`noatime` avoids the problem since no atime updates occur. Some software like `systemd-tmpfiles` may not function as optimally/accurately, but otherwise most users aren't likely to notice.

Torxed commented 3 years ago

From kdave/btrfs-progs#377 (comment) a lengthy discussion concluded that on BTRFS noatime is most preferable to use by default. I've copy/pasted the same linked summary below.

The only concern I have, is why isn't this the default then?

I'm not trying to be a stopper with this question, I'm genuinely just interested why this sane default isn't the default - everywhere. Does it solely boil down to polluted opinions and misinformation over time that's grown to be a fear?

polarathene commented 3 years ago

The only concern I have, is why isn't this the default then?

Kernel devs won't change any defaults afaik because of some backwards compatibility thing. Maybe when we have a 6.x kernel I guess?

That's what I recall being cited as the reason, even back in the old 2012 article, before the 4.x kernel. Perhaps they're keeping it consistent for all filesystems, I dunno. They delegate the defaults to be overridden by distros, but Arch is known to stick to upstream defaults as much as possible afaik?

While an often cited example for atime usage is Mutt (which I think doesn't depend on it to function correctly anymore?), there's also Lennart of systemd encouraging atime/relatime for systemd-tmpfiles, how important his concerns are sort of depends on the user, but he probably holds some weight. We also have to keep in mind that the kernel has various users with notable ones being server and desktop, it may not be considered that important to have noatime for servers by default.

Does it solely boil down to polluted opinions and misinformation over time that's grown to be a fear?

Probably just can't make a default satisfy everyone and no one with that much weight is advocating for noatime, the BTRFS linked discussion didn't have an official BTRFS maintainer chime in AFAIK.

Torxed commented 3 years ago

Good explanation and sounds valid. The "Arch is known to stick to upstream defaults" is a value that ticks my boxes and something we try to adhere to. I'm not closing this, but I will park it for a bit to make sure if we consider implementing this - it has to be done right! I have a sneaking suspicion that people (just like with LUKS), will start asking "why do you do this?". This thread will serve as a response to those. We could add a notification at the end that the users should read up on optimizations in the meantime, since this can be done post install.

BTRFS subvolumes for instance have to be done during setup because you can't really muck around with the filesystem once everything is in place (you can, but it's a lot more hassle than changing a mount option for the most part).

tl;dr: I will revisit this in a bit, need to think about how this would be implemented properly if we do implement it.

uhthomas commented 1 year ago

fwiw the arch wiki does have a section on reducing access time metadata updates. It would definitely be sensible to use noatime as a default, especially for SSDs where writes are finite.

https://wiki.archlinux.org/title/Btrfs#Reducing_access_time_metadata_updates

edit: Sorry, taking a closer look I do see this mentioned earlier in the thread. I guess this comment can be considered support for the request.

C0rn3j commented 6 months ago

I understand trying to keep to upstream defaults and very much agree with the philosophy, but trying to keep upstream KERNEL defaults is not always a good idea, since the kernel works with the assumption that user space should NEVER be broken, leading to some poor defaults because of niche use cases (such as this one), and it makes sense to change those to values the majority would benefit from.

Arch Linux modifies kernel defaults where reasonable too if the kernel maintainers won't change it after a proposal.
See the recent vm.max_map_count increase.

The linked thread above that @TheEvilSkeleton created has many new comments with examples where relatime volumes suffer, and the linked comment from 2021 now has a 2024 updated one with the same conclusion - default to noatime unless you have a specific reason not to.

@Torxed noatime by default without user prompting for archinstall, please.