ValveSoftware / steam-for-linux

Issue tracking for the Steam for Linux beta client

horribly high system/io load #6073

Open nonchip opened 5 years ago

nonchip commented 5 years ago

Your system information

Please describe your issue in as much detail as possible:

whenever steam downloads/updates a game and accesses the disk (as seen both in steam's graph and htop's "DISK R/W" column) it hogs it so bad everything freezes up from iowait. loadavg shoots up to above 100.

since it got worse after i got a better hdd (with ssd hybrid cache) I suspect it to be hogging some sort of I/O controller or kernel thread (now that it can actually use that high bandwidth throughput), blocking others (swap would be the most obvious victim, but it's even making my mouse cursor laggy and i'm pretty sure X doesn't get swapped out while i use it, so probably "general I/O" or kernel load)

Steps for reproducing this issue:

  1. download something in steam
  2. watch the io load or try watching a youtube vid, or even moving the mouse around (anything that needs the slightest amount of I/O time)
  3. go crazy because your system is totally unresponsive

in case you need any logs/etc, please do tell
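
One way to put numbers on the stall, rather than eyeballing htop, is the kernel's pressure-stall information (PSI). The sketch below is not part of the original report. It assumes a kernel with PSI enabled (4.20 or newer, built with CONFIG_PSI), and the function names `parse_io_pressure` and `read_io_pressure` are made up for illustration.

```python
def parse_io_pressure(text):
    """Parse the contents of /proc/pressure/io.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    Returns e.g. {"some": {"avg10": 0.0, ...}, "full": {...}}.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # Every remaining field is a key=value pair.
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_io_pressure(path="/proc/pressure/io"):
    """Read and parse I/O pressure; raises OSError if PSI is unavailable."""
    with open(path) as fh:
        return parse_io_pressure(fh.read())
```

Calling `read_io_pressure()` once before and once during a download gives comparable figures: `some` is the share of time at least one task was stalled on I/O, `full` the share of time all non-idle tasks were stalled at once.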

aconscious commented 4 years ago

I'm also experiencing this issue.

Your system information

My setup also runs a LUKS encrypted LVM setup on SSDs which I thought might be related, since steam both has to decrypt and/or decompress the files and then lets the kernel encrypt them when writing.

I have been able to slightly mitigate the issue by throttling the download speed to 3 MB/s, but the system is still severely crippled by the downloads.

Other things I have tinkered with:

Notably, this only seems to happen when the high I/O is generated by the Steam client while downloading/installing games.

smirgol commented 3 years ago

I've recently started having a very similar issue. I don't know exactly when it started, but it can't have been long ago, a couple of weeks maybe. Sometimes, but not always, when I start a game and it begins "Processing Vulkan shaders", it gets stuck. I/O is at 100% and nothing happens for up to 30 minutes or even longer. CPU load is normal; it's just the disk I/O. My system doesn't become unresponsive, though; the game just won't start until Steam has finished whatever it is doing.

I have absolutely no idea what's going on and why it takes that long. At some point, which can really take a long time, it finishes, compiles the shaders and everything is good - until next time. I think it also doesn't happen all the time when shaders need to be rebuilt and sometimes it also happens when it's updating a game.

Is there any way I can debug what is going on? I find it very difficult to get a hold on the detailed I/O information. All that I can see in iotop is: steam -nominidumps -nobreakpad [CJobMgr::m_Work]

Steam system information

Edit: I've found a similar bug report and did what was described here. I get tons of these lines from the process mentioned above:

 CJobMgr::m_Work-32296 [020] .... 46424.210060: ext4_mark_inode_dirty: dev 8,49 ino 231761254 caller ext4_truncate+0x1ee/0x460
 CJobMgr::m_Work-32296 [020] .... 46424.210061: ext4_mark_inode_dirty: dev 8,49 ino 231761254 caller ext4_evict_inode+0x34f/0x570
 CJobMgr::m_Work-32296 [020] .... 46424.234431: ext4_mark_inode_dirty: dev 8,49 ino 231765243 caller ext4_dirty_inode+0x64/0x80
 CJobMgr::m_Work-32296 [020] .... 46424.234462: ext4_mark_inode_dirty: dev 8,49 ino 231735703 caller ext4_unlink+0x2c7/0x380
 CJobMgr::m_Work-32296 [020] .... 46424.234464: ext4_mark_inode_dirty: dev 8,49 ino 231765243 caller ext4_unlink+0x336/0x380
 CJobMgr::m_Work-32296 [020] .... 46424.234470: ext4_mark_inode_dirty: dev 8,49 ino 231765243 caller ext4_evict_inode+0x2e3/0x570
 CJobMgr::m_Work-32296 [020] .... 46424.234471: ext4_mark_inode_dirty: dev 8,49 ino 231765243 caller ext4_ext_truncate+0x2e/0xb0
 CJobMgr::m_Work-32296 [020] .... 46424.234476: ext4_mark_inode_dirty: dev 8,49 ino 231765243 caller ext4_dirty_inode+0x64/0x80
 CJobMgr::m_Work-32296 [020] .... 46424.234477: ext4_mark_inode_dirty: dev 8,49 ino 231765243 caller __ext4_ext_dirty.isra.0+0x74/0x90
 CJobMgr::m_Work-32296 [020] .... 46424.234478: ext4_mark_inode_dirty: dev 8,49 ino 231765243 caller __ext4_ext_dirty.isra.0+0x74/0x90
 CJobMgr::m_Work-32296 [020] .... 46424.234479: ext4_mark_inode_dirty: dev 8,49 ino 231765243 caller ext4_truncate+0x1ee/0x460
 CJobMgr::m_Work-32296 [020] .... 46424.234480: ext4_mark_inode_dirty: dev 8,49 ino 231765243 caller ext4_evict_inode+0x34f/0x570
 CJobMgr::m_Work-32296 [020] .... 46424.234774: ext4_mark_inode_dirty: dev 8,49 ino 231752887 caller ext4_dirty_inode+0x64/0x80
 CJobMgr::m_Work-32296 [020] .... 46424.234791: ext4_mark_inode_dirty: dev 8,49 ino 231735703 caller ext4_unlink+0x2c7/0x380
 CJobMgr::m_Work-32296 [020] .... 46424.234793: ext4_mark_inode_dirty: dev 8,49 ino 231752887 caller ext4_unlink+0x336/0x380
 CJobMgr::m_Work-32296 [020] .... 46424.234797: ext4_mark_inode_dirty: dev 8,49 ino 231752887 caller ext4_evict_inode+0x2e3/0x570
 CJobMgr::m_Work-32296 [020] .... 46424.234798: ext4_mark_inode_dirty: dev 8,49 ino 231752887 caller ext4_ext_truncate+0x2e/0xb0
 CJobMgr::m_Work-32296 [020] .... 46424.234803: ext4_mark_inode_dirty: dev 8,49 ino 231752887 caller ext4_dirty_inode+0x64/0x80
 CJobMgr::m_Work-32296 [020] .... 46424.234804: ext4_mark_inode_dirty: dev 8,49 ino 231752887 caller __ext4_ext_dirty.isra.0+0x74/0x90
 CJobMgr::m_Work-32296 [020] .... 46424.234805: ext4_mark_inode_dirty: dev 8,49 ino 231752887 caller __ext4_ext_dirty.isra.0+0x74/0x90

Not sure what to do with that information, though. Another thing worth mentioning is that the amount of data written is actually very low, below 1 M/s. Yet it is blocking. I've patched my kernel to work with fsync; might that be the issue?
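
One way to make sense of a trace like the one above is to count events per calling function and per inode. The following is only a sketch: it assumes the lines have exactly the layout shown in the excerpt, and `LINE_RE` and `summarize` are illustrative names rather than part of any tool.

```python
import re
from collections import Counter

# Matches lines shaped like the ext4_mark_inode_dirty excerpt:
#   <task>-<pid> [cpu] flags timestamp: ext4_mark_inode_dirty: dev M,m ino N caller func+0x.../0x...
LINE_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>\d+\.\d+):\s+"
    r"ext4_mark_inode_dirty:\s+dev\s+(?P<dev>\d+,\d+)\s+ino\s+(?P<ino>\d+)\s+"
    r"caller\s+(?P<caller>[^\s+]+)\+"
)


def summarize(lines):
    """Return (events per caller, events per (dev, inode)) as Counters."""
    by_caller = Counter()
    by_inode = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip headers and unrelated events
        by_caller[match["caller"]] += 1
        by_inode[(match["dev"], int(match["ino"]))] += 1
    return by_caller, by_inode
```

Feeding it the saved trace (for example `summarize(open("trace.txt"))`, where the file name is hypothetical) and printing `by_caller.most_common(5)` shows which kernel paths dominate. In the excerpt the callers are mostly `ext4_unlink`, `ext4_evict_inode` and `ext4_truncate`, which, judging by the function names alone, points at many files being deleted rather than at bulk writes. That would at least be consistent with the low write throughput.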

pfwb commented 3 years ago

Same issue here. Although I don't lose responsiveness (since my steam library resides in another HDD), I'm also seeing that high I/O when launching a game.

h1z1 commented 3 years ago

CJobMgr::m_Work-32296 [020] .... 46424.210060: ext4_mark_inode_dirty: dev 8,49 ino 231761254 caller ext4_truncate+0x1ee/0x460

That would be a kernel function. Is system tracing enabled?

grep . /sys/kernel/debug/tracing/{tracing_on,events/ext4/ext4_mark_inode_dirty/enable}

Check syslog or /proc/kmsg

One of the things Steam tries to do is enable it. The client used to complain if it couldn't. Problem isn't necessarily with tracing itself but the level of tracing Valve enables is .. silly. It will slam the hell out of syslog and was a basis for them requiring root since you can't enable it as a user (nor should it ever outside a DEVEL environment).

tl;dr - Steam writing it out to a log can get amplified very quickly. The write itself can cause another event which snowballs into a loop. It gets worse when you add on anything else logging klog like syslogd or worse journald.

DanielGaaA commented 1 year ago

Lately this was getting really bad. Freezes while finalizing a download could cause Firefox to crash (before, it was just a desktop freeze for a few seconds). I have a slow QLC NVMe drive that I thought was TLC when buying... Thanks, Crucial, for sending reviewers TLC drives while selling it as a QLC drive. Don't buy the Crucial P2.

Anyway all my problems went away when I switched to "Kyber" I/O scheduler.
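
For anyone wanting to check which scheduler a disk is currently using before trying this: the kernel lists the available schedulers in `/sys/block/<device>/queue/scheduler`, with the active one in square brackets. A minimal sketch for reading that file follows; the helper names are illustrative, and the device name is whatever your disk is called (for example `sda` or `nvme0n1`).

```python
from pathlib import Path


def parse_scheduler(text):
    """Parse a queue/scheduler line such as 'mq-deadline kyber [bfq] none'.

    Returns (active, available); active is None if nothing is bracketed.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # strip the brackets marking the active one
            active = name
        available.append(name)
    return active, available


def read_scheduler(device):
    """Read the scheduler line for a block device, e.g. 'sda'."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler(path.read_text())
```

Switching is done by writing the scheduler name into the same file as root. That only works if the name appears in the available list (Kyber may be built as a module that has to be loaded first), and the setting does not persist across reboots unless it is applied again, for example through a udev rule.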

nonchip commented 1 year ago

yup we're there again, especially right after starting up* the client it catapults my load (on a ryzen5 with ssd!) up to >10 consistently, by deciding to upgrade a bunch of wineprefixes for no reason while apparently unpacking a throttled download with more than a thread per core. *(which of course happens after boot usually, when things are still calming down anyway, and sometimes gets so slow it prevents my desktop from actually loading up)

my problem definitely wasn't kernel tracing or extfs related, given i didn't use extfs since opening this, and steam never runs as root, who gave you that idea @h1z1 :P

also thanks for derailing the issue @smirgol @pablow1422 @h1z1 but can you fix your shader caching in your shader caching ticket maybe?