Closed skmuttlu closed 8 years ago
This isn't a fio issue (it's a kernel bug), nor is it fio asking for a memory allocation that fails. There's nothing that fio can or should do to prevent this; it's something that should be fixed on the kernel side.
Hi,
Let me start with a little background, kept brief. Some of the random write test cases with a block size split have been failing intermittently, and increasingly often, because fio hangs. I have been observing this for the last few days but couldn't reproduce it reliably. The hang is annoying because each affected test fails once fio is stuck.
Why was fio hung (in a deadlock state)? fio would be looking for a contiguous memory allocation, but I don't see any related syslog messages saying what size of pages was requested. A potential reason is that extreme memory fragmentation and/or an XFS memory pressure issue led fio into the deadlock state. Some specific test cases use tailored fio workloads to induce an extremely fragmented filesystem, and reads and writes on that filesystem induce high memory pressure as well. One of these factors might have caused extreme memory fragmentation and a failed SLAB memory allocation.
Link to xfs issue: https://bugzilla.kernel.org/show_bug.cgi?id=73831
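One way to check the fragmentation hypothesis while a test is stuck is to look at /proc/buddyinfo, which lists, per memory zone, how many free blocks exist at each order (order k is a run of 2^k contiguous pages). The following is only a rough sketch, not something fio or the kernel provides: the helper names are made up for illustration, and the sample line in the comment is invented, not taken from the affected machine.

```python
def parse_buddyinfo(text):
    """Parse /proc/buddyinfo text into {(node, zone): [free block count per order]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone   Normal   4   3   2   1 ..."
        if len(parts) < 5 or parts[0] != "Node":
            continue
        node = int(parts[1].rstrip(","))
        zone = parts[3]
        zones[(node, zone)] = [int(n) for n in parts[4:]]
    return zones


def free_pages_at_or_above(counts, order):
    """Count free pages that sit in contiguous blocks of at least the given order."""
    return sum(n << k for k, n in enumerate(counts) if k >= order)


def read_buddyinfo(path="/proc/buddyinfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_buddyinfo(f.read())
```

If the counts in the higher-order columns are at or near zero for a zone while the low-order columns are large, free memory exists but is fragmented, which would be consistent with a large contiguous allocation failing.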
My tests run each job with a defined runtime (runtime=1200) and kill any fio job that exceeds the 1200 s runtime. I have been observing this issue very often for a couple of months on the XFS filesystem: fio does not honor the defined runtime because of the hung/deadlock state (due to the SLAB memory allocation deadlock), and sometimes killing (pkill -9) the fio processes that overran in this state leaves a defunct process (/corrupted state).
Here is substantial evidence of this, where the memory allocation was hung...
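The run-then-kill behavior described above can be sketched roughly as follows. This is not the actual test harness: `run_with_deadline`, its parameters, and the status strings are hypothetical names chosen for illustration. The point of the sketch is the second wait: a process blocked inside the kernel may not exit right after SIGKILL, so the wrapper reports that case instead of blocking forever.

```python
import subprocess


def run_with_deadline(cmd, deadline_s, grace_s=5.0):
    """Run cmd and SIGKILL it if it is still running after deadline_s seconds.

    Returns "finished" if the command exited on its own, "killed" if it
    exited after SIGKILL, or "stuck" if it still had not exited grace_s
    seconds after SIGKILL.
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=deadline_s)
        return "finished"
    except subprocess.TimeoutExpired:
        # On POSIX, Popen.kill() sends SIGKILL, the same signal as pkill -9.
        proc.kill()
    try:
        proc.wait(timeout=grace_s)
        return "killed"
    except subprocess.TimeoutExpired:
        # SIGKILL was sent but the process has not exited yet.
        return "stuck"
```

A call such as `run_with_deadline(["fio", "randwrite.fio"], deadline_s=1200)` would match the setup described here, where `randwrite.fio` stands in for whatever job file the test uses. A "stuck" result is the case worth logging, since it marks a process that did not go away even after `pkill -9`.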