poelzi opened 2 years ago
Hmm, so according to the flag documentation this is recommended for desktop usage:
"Select this if you are building a kernel for a desktop system."
https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re152.html
Yes. I'm not sure how much my Nvidia card also plays a role in this mess of a PC, but switching to a different preemption setting just feels so much smoother. We should at least provide a Linux kernel derivative built with desktop settings, and either warn the user if ZFS is enabled with the default kernel, or document how to switch kernels.
Using the rt kernel is unfortunately not always an option. The Nvidia driver doesn't like it, and the open driver is just not good enough on HiDPI multi-monitor setups. When I need super low latencies, I use cpuset cgroups to isolate one core (with hyperthreading disabled on it) and move the audio thread there. This is good enough ;)
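For reference, the isolation described above can be sketched with cgroup v2 cpusets. This is a hypothetical configuration procedure, not a tested recipe: it assumes cgroup v2 is mounted at `/sys/fs/cgroup`, root privileges, and the core number and PID are placeholders.

```shell
# Enable the cpuset controller for child cgroups (cgroup v2).
echo "+cpuset" > /sys/fs/cgroup/cgroup.subtree_control

# Create a cgroup holding only CPU 3 (hypothetical core choice).
mkdir /sys/fs/cgroup/audio
echo 3 > /sys/fs/cgroup/audio/cpuset.cpus

# Turn it into a root partition so the scheduler keeps other
# tasks off that core entirely.
echo root > /sys/fs/cgroup/audio/cpuset.cpus.partition

# Move the audio process (PID is a placeholder, e.g. jackd's PID).
echo 12345 > /sys/fs/cgroup/audio/cgroup.procs
```

Disabling the sibling hyperthread (via `/sys/devices/system/cpu/cpuN/online`) further reduces jitter on the isolated core, as the commenter notes.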
@Mindavi https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re153.html is the better option
I am also super interested in having a responsive system with ZFS. When my system gets above a certain level of load, the amount of short lags and sound stuttering (especially while recording) increases heavily. Before Nix I was on an Arch install, and I hadn't noticed anything similar there. I have a 2700X in this machine, which should handle my workloads very easily. Sadly I haven't yet taken a deep dive into the issue. What I can say is that when I tried it, the Zen kernel produced less stuttering, or at least I noticed less of it (but as said, I haven't done benchmarks or anything to measure it).
But at some point I switched back, as I currently don't understand how to keep the ZFS modules and the Zen kernel in sync, and the zfs module has a nice variable for pointing to compatible kernel versions, which is useful for newbies like me. :)
Ubuntu's kernel, which officially supports ZFS, is compiled with the same options:
```
$ uname -a
Linux ubuntu 5.13.0-39-generic #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ awk '/^#/ { next } /PREEMPT/ { print }' /boot/config-5.13.0-39-generic
CONFIG_PREEMPT_VOLUNTARY=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
```
Are you sure it's not something else on your system?
Of course this is anecdotal, but I have not had latency problems running ZFS on a desktop (with sufficient RAM!). Then again, I'm not doing realtime audio either.
For me personally, I would not bet on it being exactly that setting. I noticed sound stuttering in situations of higher IO load combined with higher memory pressure (e.g. ~30% free memory on a system with 32 GB), which I did not have on my old installation. That's just an observation on my side, but it sounds similar to what the OP described on the upstream ticket.
For me, though, the combination of memory pressure AND IO load is needed for it to happen, which differs a bit from the OP's description, or rather adds an additional layer which may be the root cause.
Sadly I haven't yet had the time to compile a kernel with the suggested settings to check whether that also "fixes" my issue, or whether it's something different, but a system which freaks out on IO while still having "just" 10 GB of RAM free sounds like a bad experience to me... :)
You can try the Liquorix, Xanmod or Zen kernel patch sets to test; they are all available in nixpkgs. Note that ZFS or NVIDIA support may lag a bit behind them.
I run ZFS on some of my machines, mainly with Xanmod or Zen, and I do not see any lag, except during heavy IO operations on some old HDDs.
> You can try Liquorix, Xanmod or Zen kernel patch sets to test, they are all available on nixpkgs, note that support from ZFS or NVIDIA may lag a bit.
As mentioned in my very first post, I did try Zen at some point, and it gave a much better experience. But I switched back to stock, as I find `config.boot.zfs.package.latestCompatibleLinuxPackages` very handy for ensuring a compatible kernel is installed, and sadly that is not Zen.
How do you ensure that the ZFS modules match the kernel version, or is that something I don't have to care much about? As I am using ZFS on root, it's crucial that it works in the end.
About the user experience under IO: when scrubbing kicks in (which is of course high IO, so some lag is expected), the system is not even usable any longer, as it's mostly unresponsive. The ZFS pool is hosted on a Samsung SSD 860 EVO, which is not exactly an old HDD.
Not sure if it's really the same "issue" poelzi described, as this is nowhere near RT scenarios, and I don't want to take over the issue with my "problems", but short feedback on how you properly match the kernel to ZFS would be appreciated.
edit: I now understand how the kernelPackages are tied to the kernel in NixOS, so the question is cleared up.
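For anyone else wondering, the attribute mentioned above is used from `configuration.nix` roughly like this (a sketch; note that this attribute was deprecated in later nixpkgs releases in favour of the default kernel/ZFS pairing):

```nix
# Sketch: pin the kernel to the newest release that the chosen
# ZFS package declares compatible.
{ config, ... }:
{
  boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;
}
```

With this in place, `nixos-rebuild` picks a kernel for which the ZFS module is known to build, so kernel and module cannot drift apart.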
Scheduling under load is a difficult problem to solve.
Can you isolate where the block latency is coming from? `iostat -x` and `zpool iostat -vl` are good debugging tools to identify whether the latency is coming from the kernel or the device.
I somehow forgot about this issue and was talking about it again today. I nailed it down quite clearly to write IO by sending ZFS datasets around my network. When the affected PC is the sender, everything works fine; as soon as it is the receiver, the sound sometimes begins to stutter.
The iostats look like the following:
```
Every 2.0s: zpool iostat -vl pointalpha: Tue Feb 14 18:09:27 2023
capacity operations bandwidth total_wait disk_wait syncq_wait asyncq_wait scrub trim
pool alloc free read write read write read write read write read write read write wait wait
--------------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
rpool 461G 467G 28 82 535K 4.89M 1ms 163ms 384us 754us 197us 1s 4ms 12ms 2ms -
ata-Samsung_SSD_860_EVO_1TB_S3Z9NB0K403903D-part2 461G 467G 28 82 535K 4.89M 1ms 163ms 384us 754us 197us 1s 4ms 12ms 2ms -
--------------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
```

```
Every 2.0s: iostat -x pointalpha: Tue Feb 14 18:11:02 2023
Linux 6.1.7-xanmod1 (pointalpha) 02/14/23 _x86_64_ (16 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
         19.00  0.01    3.80    0.19   0.00 77.01
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sda 30.66 540.40 0.06 0.20 0.45 17.63 84.26 5212.12 0.79 0.93 0.57 61.86 0.00 0.00 0.00 0.00 0.00 0.00 1.21 2.89 0.07 3.56
sr0 0.00 0.00 0.00 0.00 6.00 2.22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
zd0 0.01 0.25 0.00 0.00 0.13 27.62 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
zd16 0.01 0.18 0.00 0.00 0.08 24.87 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
```
I am absolutely no expert at reading this, but I see a high `f_await` in `iostat` and a high `syncq_wait` in `zpool iostat`.
When another machine is the receiver (where I sadly cannot test the behavior, as it's a server and I don't know how to verify it there), those two numbers are a lot lower.
But I am honestly not sure how to interpret the numbers. Judging from `%util`, the device should be chilling.
> Every 2.0s: zpool iostat -vl pointalpha: Tue Feb 14 18:09:27 2023
>
> capacity operations bandwidth total_wait disk_wait syncq_wait asyncq_wait scrub trim
> pool alloc free read write read write read write read write read write read write wait wait
> --------------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
> rpool 461G 467G 28 82 535K 4.89M 1ms 163ms 384us 754us 197us 1s 4ms 12ms 2ms -
> ata-Samsung_SSD_860_EVO_1TB_S3Z9NB0K403903D-part2 461G 467G 28 82 535K 4.89M 1ms 163ms 384us 754us 197us 1s 4ms 12ms 2ms -
> --------------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
>
163 milliseconds to do the write, when the actual drive latency is ~1 millisecond. That's clearly CPU bound. On a non-preemptive kernel all of this runs in high-priority ZFS IO threads doing a lot of compression/encryption/checksumming. I wonder, what are the recordsize, compression algorithm, and encryption settings for the system? There might not be enough `cond_resched()` calls in one or more of the corresponding code paths, or the ZFS kernel thread priority is too high for the audio thread to be able to preempt it.
Before switching to a preemptive kernel I was playing with ZFS module parameters like the following, with inconsistent levels of success:
```
spl.spl_taskq_thread_bind=0
spl.spl_taskq_thread_priority=0
```
These should be set at module load, I think. Binding the threads may be especially bad for audio, as usually only one CPU core handles audio interrupts, and if ZFS occupies that core and doesn't preempt in time, it will cause stuttering.
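On NixOS, module parameters like these can be passed at load time via modprobe configuration. A sketch (the parameter names are taken from the comment above; the option shown is standard NixOS, but the effect of these values is the commenter's experiment, not a verified fix):

```nix
# Sketch: set the spl taskq parameters before the module loads.
{
  boot.extraModprobeConfig = ''
    options spl spl_taskq_thread_bind=0 spl_taskq_thread_priority=0
  '';
}
```

After a rebuild and reboot, the active values can be checked under `/sys/module/spl/parameters/`.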
> 163 milliseconds to do the write, when ~1 millisecond the actual drive latency. That's clearly CPU bound. On non-preemptive kernel all of this are running in high priority zfs io threads doing a lot of compression/encryption/checksumming. I wonder, what is the recordsize, compression algorithm, encryption settings for the system?
```
$ zfs get recordsize,compression,encryption rpool
NAME   PROPERTY     VALUE   SOURCE
rpool  recordsize   128K    default
rpool  compression  zstd    local
rpool  encryption   off     default
```
Everything is kept at the default, besides using zstd for compression.
> Before switching to preemptive kernel I was playing with zfs module parameters like the following with unconsistent level of success:
>
> `spl.spl_taskq_thread_bind=0 spl.spl_taskq_thread_priority=0`
>
> This should be set on the module load I think. Binding the threads may be especially bad for audio as usually only one cpu core handles audio interrupts and if zfs occupies that core and doesn't preempt in time it will cause stuttering.
I'll try out whether that results in a better experience.
edit:
Most of my bad UX has been resolved by setting:

```nix
services.udev.extraRules = ''
  ACTION=="add|change", KERNEL=="sd[a-z]*[0-9]*|mmcblk[0-9]*p[0-9]*|nvme[0-9]*n[0-9]*p[0-9]*", ENV{ID_FS_TYPE}=="zfs_member", ATTR{../queue/scheduler}="none"
'';
```
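To verify that the rule took effect, the active scheduler is the bracketed entry in the device queue's `scheduler` file. A small sketch (the `sample` string below is a hypothetical copy of that file's contents; on a real system read `/sys/block/sda/queue/scheduler` instead):

```shell
# The kernel marks the active I/O scheduler with square brackets,
# e.g. "[none] mq-deadline kyber bfq". Extract the active one:
sample='[none] mq-deadline kyber bfq'
echo "$sample" | grep -o '\[[^]]*\]' | tr -d '[]'
# prints "none"
```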
Which I found here:
EDIT 2: The udev change is in NixOS since 23.11.
@poelzi Have you tried the proposed solution? Can we close the issue?
> @poelzi Have you tried the proposed solution? Can we close the issue?
I tried it before it was merged, and even after, and it didn't make a difference for me.
My principal problem is using atuin with a ZFS root, where the shell hangs while atuin does an insert into SQLite. It has already been referenced above.
The realtime patch at the top made the most difference, but I still see it from time to time.
This is actually a ZFS bug causing `ftruncate` hangups (affecting everything that uses SQLite as a database, not just atuin), as noted in https://github.com/atuinsh/atuin/issues/952#issuecomment-1537884120, which links to https://github.com/openzfs/zfs/issues/14290.
There are 2 "fixes" so far:
* keep the SQLite databases on tmpfs and synchronize them (with litestream?) to persistent storage, as described in https://github.com/atuinsh/atuin/issues/952#issuecomment-1645436046
* disable sync on the dataset holding the SQLite database: https://github.com/atuinsh/atuin/issues/952#issuecomment-1783676117

I'm experiencing the same issue, though (as far as I can tell) not only with atuin, but also Firefox, Konsole, KDE Plasma, etc. For reference, I have compression, deduplication, and encryption all disabled. Neither setting `autotrim=on` nor adding `boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;` to my configuration seems to have made any difference. I would be happy to create a new dataset with `sync=disabled` for specific applications if I could isolate them, but at this point it seems the issue is system-wide.
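For illustration, the per-application dataset idea could look like the following. This is a sketch with hypothetical dataset names and mountpoints; note that `sync=disabled` trades crash-consistency of recent writes for latency, so it should only hold data you can afford to lose:

```shell
# Create a dedicated dataset for latency-sensitive SQLite databases
# with synchronous writes disabled (names are placeholders):
zfs create -o sync=disabled -o mountpoint=/var/lib/sqlite-nosync rpool/sqlite-nosync

# Or disable sync on an existing dataset in place:
zfs set sync=disabled rpool/home

# Verify the property:
zfs get sync rpool/sqlite-nosync
```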
Some other things I have tried (unsuccessfully) -- I'm interested in hearing whether any of these worked for others, and what other options I should consider:

* `boot.kernelParams = [ "elevator=none" ];`
* setting `sync=disabled` for all datasets and rebooting
* the `udev` modifications mentioned by Shawn8901
* `boot.kernelPackages = pkgs.linuxPackages_zen;`
* the `spl_taskq_thread_bind` and `spl_taskq_thread_priority` module parameters

At this point I am beginning to doubt my problem is with ZFS itself (or the desktop environment, for that matter), though I'm not sure where else I should be looking.
> * the `udev` modifications mentioned by Shawn8901
FYI, in case someone else comes across the udev changes: they are applied by default since https://github.com/NixOS/nixpkgs/pull/250308 (https://github.com/NixOS/nixpkgs/issues/169457#issuecomment-1705486693), which should be in NixOS stable since 23.11. So for that part there should be no need for manual changes.
Thanks, I wasn't aware of that. Rather embarrassingly, the main issue for me turned out to be a power-saving setting my laptop had automatically enabled without me noticing. Opening files/applications still lags sometimes, but it usually resolves itself after the first time (so I assume this is caching-related).
> Opening files/applications still lags sometimes, but it usually resolves itself after the first time (so I assume this is caching-related).
That could be https://discourse.nixos.org/t/plasma-emojier-too-slow-episode-iv/40130, so it might be fixed in Plasma 6.
I've also run into this. It turned out I had battery saver on.
Describe the bug
ZFS on a desktop system with the default kernel, which is compiled with PREEMPT_VOLUNTARY, causes terrible lag, short hangs, and very bad realtime behaviour. This is easy to see with jackd and mixxx, for example.
If the kernel is compiled with these changes, the system behaves much better:
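For readers who want to experiment, a NixOS sketch for rebuilding the default kernel with full preemption follows. The exact flags the reporter changed are not listed in this thread, so `PREEMPT` here is an assumption based on the discussion; `lib.mkForce` is used because the NixOS base config already pins `PREEMPT_VOLUNTARY`, and a full kernel rebuild is required:

```nix
# Sketch: switch the stock NixOS kernel from voluntary to full
# preemption (assumed relevant flags; triggers a kernel rebuild).
{ lib, ... }:
{
  boot.kernelPatches = [{
    name = "full-preempt";
    patch = null;
    extraStructuredConfig = {
      PREEMPT = lib.mkForce lib.kernel.yes;
      PREEMPT_VOLUNTARY = lib.mkForce lib.kernel.no;
    };
  }];
}
```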
Steps To Reproduce
Steps to reproduce the behavior:
Expected behavior
Behaviour more similar to other filesystems
Additional context
Upstream ticket: https://github.com/openzfs/zfs/issues/13128
Notify maintainers
@wizeman @hmenke @jcumming @jonringer @fpletz @globin