steveyen opened this issue 7 years ago (status: Open)
Capturing more thoughts from email channels...
Related: Sarath mentions that he used O_SYNC (synchronous I/O) writes to address a similar problem...
... we faced a similar fsync problem and finally ended up using O_SYNC / synchronous I/O. The primary problem is that the OS would pile up a large amount of to-be-written data in the buffer cache (many GBs, depending on the free memory available), and when fsync() was called, it would try to flush those many GBs to the SSD. The SSD has limited bandwidth, so it takes a long time to drain the OS buffer. Calling fsync() at smaller intervals, or using synchronous I/O, helps write uniformly to the SSD at maximum utilization of the SSD's bandwidth. Otherwise, we may observe the SSD sitting idle for a while and then a sudden burst of writes from the large buffer flush, resulting in higher write latency as well.
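For illustration, here is a minimal Go sketch of the O_SYNC approach Sarath describes. This is not moss code; the file name, sizes, and permissions are made up, and the O_SYNC flag shown is the Linux constant. The point is only that with synchronous I/O each write reaches stable storage as it happens instead of piling up in the page cache.

```go
package main

import (
	"log"
	"os"
	"syscall"
)

func main() {
	// O_SYNC makes each write block until the data is on stable storage,
	// so dirty data never accumulates in the OS buffer cache.
	// (syscall.O_SYNC is the Linux flag; other platforms may differ.)
	f, err := os.OpenFile("data.moss",
		os.O_CREATE|os.O_WRONLY|syscall.O_SYNC, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Each Write() now implies a synchronous flush to the device.
	if _, err := f.Write(make([]byte, 1<<20)); err != nil {
		log.Fatal(err)
	}
}
```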
And @mschoch adds a warning... beware of trading away throughput to buy the improved latencies, and vice versa.
More interesting links / info about this from Dave Rigby...
From my experiments, the fundamental issue we were hitting is that Linux doesn't currently have any QoS at the block-queue layer for reads vs. writes from the same process - note this is well "below" the filesystem / block-cache layer.
Therefore, what can happen is that a large number of writes are issued by the fscache / filesystem, and subsequent reads get stuck behind them.
Note there have been some changes in the Linux kernel to address this problem directly - see LWN: "Toward less-annoying background writeback", which appeared in Linux 4.10 (https://kernelnewbies.org/Linux_4.10#head-f6ecae920c0660b7f4bcee913f2c71a859dcc184); however, afaik none of the main distros are on 4.10 yet. As such, the periodic fsync() is a somewhat "ghetto" solution in userspace.
Created MB-25977 to follow up on this.
Occasionally fsync()'ing during compaction (e.g., after every 16 MB written) might have a performance impact -- see if an optional parameter for this might help moss? (A rough sketch of the idea follows below.)
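As a rough sketch of what such an optional parameter might look like, here is a hedged Go example. It is not moss's actual compaction path: the syncEvery knob and copyWithPeriodicSync helper are hypothetical names for illustration. The writer calls Sync() once per syncEvery bytes written, so the final fsync() never has to drain gigabytes of dirty pages at once.

```go
package compaction

import (
	"io"
	"os"
)

// syncEvery is a hypothetical knob; an optional moss parameter could
// expose something similar. 16 MB matches the example in the comment above.
const syncEvery = 16 << 20

// copyWithPeriodicSync copies src to dst, calling Sync() after every
// syncEvery bytes written, so the OS never accumulates a huge backlog of
// dirty pages that a single final fsync() would have to flush at once.
func copyWithPeriodicSync(dst *os.File, src io.Reader) (int64, error) {
	buf := make([]byte, 1<<20) // 1 MB copy buffer
	var total, sinceSync int64
	for {
		n, rerr := src.Read(buf)
		if n > 0 {
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return total, werr
			}
			total += int64(n)
			sinceSync += int64(n)
			if sinceSync >= syncEvery {
				if serr := dst.Sync(); serr != nil {
					return total, serr
				}
				sinceSync = 0
			}
		}
		if rerr == io.EOF {
			break
		}
		if rerr != nil {
			return total, rerr
		}
	}
	// Final sync to cover the tail of the copy.
	return total, dst.Sync()
}
```

Whether 16 MB (or any fixed interval) is the right value is exactly the throughput-vs-latency tradeoff @mschoch warns about above, which is an argument for making it an optional, tunable parameter rather than a hard-coded default.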