fedora-silverblue / issue-tracker

Fedora Silverblue issue tracker
https://fedoraproject.org/atomic-desktops/silverblue/

Poor performance with dm-crypt on SSD (was Application not responding under heavy load) #388

Open francoism90 opened 1 year ago

francoism90 commented 1 year ago

This issue tracker is intended only for Silverblue specific issues. We would like to ask you to try to reproduce the issue on a relevant Fedora Workstation release. If you will be able to reproduce there, then please report it in Red Hat Bugzilla (see How to file a bug) or in upstream (preferred for GNOME projects) and not in this issue tracker.

Describe the bug When extracting an archive (unrar) or downloading to the root FS, the system seems to freeze. The root FS seems to need some time to catch up, and in the meantime other applications, such as browsers that aren't doing anything, get blamed as not responding.

To Reproduce Please describe the steps needed to reproduce the bug:

  1. Use Steam and download a game
  2. Freezes, 'Application X not responding'.

Expected behavior No freezes, it shouldn't halt the system operations

Screenshots N/A

OS version:

fedora:fedora/37/x86_64/silverblue
                  Version: 37.20221122.0 (2022-11-22T00:46:03Z)
               BaseCommit: 92af90adf61a498a244ac64f5efa98354c770f6a83ec19d579c875a4c53dde7c
             GPGSignature: Valid signature by ACB5EE4E831C74BB7C168D27F55AD3FB5323552A
      RemovedBasePackages: firefox firefox-langpacks 106.0.4-1.fc37
          LayeredPackages: containerd.io dnf-plugins-core docker-ce docker-ce-cli docker-compose-plugin firewall-config fzf gnome-tweaks
                           gstreamer1-plugin-openh264 libvirt lm_sensors mozilla-fira-mono-fonts openssl virt-manager zsh
                           zsh-autosuggestions zsh-syntax-highlighting


francoism90 commented 1 year ago

Writing to other disks that aren't part of the root works fine. It's rpm-ostree that is really slow.

My hardware: AMD Ryzen 3600, 32GB DDR4, Samsung 980 NVMe

I can run Fedora Workstation without any problems.

travier commented 1 year ago

Not sure what you want us to do here. When the rootfs is under heavy IO usage, everything gets delayed.

francoism90 commented 1 year ago

@travier True, however the impact of writing/unpacking on the rootfs when using Fedora Silverblue causes issues. I don't have the same issues when writing on the rootfs on Fedora WS or using any other distro.

It seems something is blocking and this causes the rootfs (ostree?) to eventually freeze.

If you want, I can provide debugging information.

travier commented 1 year ago

ostree/rpm-ostree just uses a regular filesystem and does not do any operation in the background (unless GNOME Software is preparing an update). What you're seeing here might be coming from Btrfs being used by default or something else.

francoism90 commented 1 year ago

@travier Thanks for your reply. :)

Hmm, is there any way I could check what the cause is? It's a bit weird not having this issue on other distros. Now that I think about it, they were using ext4/XFS. Could Btrfs be the reason? I do have compression enabled (the default options provided by the FS).

travier commented 1 year ago

I'll close this issue as I don't think it's a bug in Silverblue. Please ask your Btrfs questions on https://ask.fedoraproject.org/ and folks will help you there.

francoism90 commented 1 year ago

@travier I would like this to be reopened. I'm running Btrfs on another drive, which isn't root, and all works perfectly. However, when installing/downloading (e.g. Steam) to the root image, I get out-of-memory issues and everything just crashes.

My memory is fine, my NVMe firmware is up to date, etc. It must be rpm-ostree or something else in Fedora Silverblue causing this issue.

francoism90 commented 1 year ago

[image: resource usage graph]

My CPU goes up like crazy, which isn't the case when writing the same thing/game to another drive.

travier commented 1 year ago

What makes you think that this is Fedora Silverblue specific and that this would not happen in Fedora Workstation? This might be a kernel or systemd-oom bug.

travier commented 1 year ago

Missing from your graph is the disk I/O pressure.
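One way to put a number on disk I/O pressure is the kernel's pressure stall information (PSI). The sketch below is an illustration rather than something posted in this thread: it assumes the running kernel exposes PSI at /proc/pressure/io, and the function name io_pressure_avg10 is hypothetical.

```shell
#!/bin/sh
# Sketch: print the 10-second "some" I/O pressure average from PSI-formatted input.
# Assumption: the kernel exposes pressure stall information at /proc/pressure/io.
io_pressure_avg10() {
    # Input lines look like:
    #   some avg10=1.23 avg60=0.50 avg300=0.10 total=12345
    # Keep only the "some" line and strip the "avg10=" prefix from its second field.
    awk '$1 == "some" { sub(/^avg10=/, "", $2); print $2 }'
}

# On a live system with PSI enabled:
#   io_pressure_avg10 < /proc/pressure/io
```

A value that stays high while the download runs would indicate that tasks are stalling on I/O rather than on CPU or memory.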

francoism90 commented 1 year ago

@travier Because I don't have this on Fedora Workstation. The disk I/O is the same as the network, but I'll provide a graph later.

I tried disabling systemd-oomd, but the slowness remains.

travier commented 1 year ago

Is your Fedora Workstation installation using the same filesystem, with the same options? The original version used for the installation matters here.

francoism90 commented 1 year ago

@travier I think I've found a solution. Well, not me, but others did. :)

First I need to apologize, since I did the test wrong when using Fedora Workstation. I was not using the same setup: encrypted vs non-encrypted.
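For anyone comparing two installations the same way, a quick check of whether the root filesystem sits on dm-crypt can rule out this kind of mismatch. This is a minimal sketch, not a command from the thread: the function name is_on_dmcrypt is hypothetical, and it assumes findmnt and lsblk from util-linux are available.

```shell
#!/bin/sh
# Sketch: succeed if a list of block device types (one per line) contains "crypt".
# The function only parses text, so it is safe to run anywhere.
is_on_dmcrypt() {
    awk '$1 == "crypt" { found = 1 } END { exit !found }'
}

# On a live system:
#   findmnt -v omits the [/subvol] suffix that Btrfs adds to SOURCE,
#   lsblk -s walks from the filesystem device down to the underlying disk.
#   lsblk -s -n -o TYPE "$(findmnt -v -n -o SOURCE /)" | is_on_dmcrypt \
#       && echo "root is on dm-crypt" || echo "root is not on dm-crypt"
```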

Anyway, the following solutions seem to restore full performance:
https://wiki.archlinux.org/title/Dm-crypt/Specialties#Disable_workqueue_for_increased_solid_state_drive_(SSD)_performance
https://www.reddit.com/r/linux/comments/zkyzmh/if_your_system_is_installed_on_dmcrypt_and/

Is there any reason why it's the default? It seems I'm not the only one having this issue.

travier commented 1 year ago

Nice find! This is something that would need to be fixed at some level but not sure where exactly. Marking as enhancement.

miabbott commented 1 year ago

> This is something that would need to be fixed at some level but not sure where exactly.

Seems like we would want this to be fixed in anaconda during the storage setup/configuration phase.

Alternately, we could ship a systemd service that could apply this change post-install?

travier commented 1 year ago

Likely best fixed in storage setup in Anaconda indeed.

travier commented 1 year ago

I've submitted https://bugzilla.redhat.com/show_bug.cgi?id=2154817

iaacornus commented 1 year ago

The zen kernel also disables the workqueue by default. You can also disable it manually with:

sudo cryptsetup --perf-no_read_workqueue --perf-no_write_workqueue --persistent refresh <name>

where <name> is the luks-* name you'll see when you run lsblk -f. You'll also want to verify first that the partition is encrypted, with sudo cryptsetup isLuks /dev/<DEV> && echo SUCCESS. Cloudflare also has some interesting findings about this: https://blog.cloudflare.com/speeding-up-linux-disk-encryption/

I disabled it manually on my setup, no detrimental side effects observed as of writing.
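To confirm that the change took effect, the active device-mapper table can be inspected for the two flags. This is a rough sketch under stated assumptions: has_no_workqueue_flags is a hypothetical helper, and it assumes the flags appear in `dmsetup table` output as no_read_workqueue and no_write_workqueue.

```shell
#!/bin/sh
# Sketch: succeed only if a dm-crypt table line carries both no-workqueue flags.
has_no_workqueue_flags() {
    # Reads the output of `dmsetup table <name>` on stdin.
    table=$(cat)
    case "$table" in
        *no_read_workqueue*) ;;
        *) return 1 ;;
    esac
    case "$table" in
        *no_write_workqueue*) return 0 ;;
        *) return 1 ;;
    esac
}

# On a live system (as root; <name> is the luks-* mapping from lsblk -f):
#   dmsetup table <name> | has_no_workqueue_flags && echo "workqueues bypassed"
```

Because the command above uses --persistent, the flags should also survive a reboot, so the same check can be repeated after restarting.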