btrfs / btrfs-todo

An issues-only repo to organize our TODO items

Perf regression from 5.0 #2

Closed: josefbacik closed this 3 years ago

josefbacik commented 4 years ago

Nikolay has dug into a perf regression that happened in 5.0. He bisected it down to the removal of the end-transaction throttling that used to commit the transaction, which got folded into the ticketing stuff. So we went from committing the transaction thousands of times a second to committing far less often, and our write speed went down.

The conclusion we've come to is that we were basically short-circuiting the ENOSPC flushing stuff by committing all the time, so we weren't getting stuck waiting for ENOSPC flushing directly. Now we are, so while the ENOSPC flushing is doing a lot less work, we're waiting on it more often. We need to bring back the pre-flushing stuff: my changes essentially got rid of pre-flushing, since we now stop flushing as soon as there are no tickets.
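To make the tradeoff concrete, here is a minimal userspace sketch of the two strategies. This is not btrfs code; every name in it (the space_info fields, commit_transaction, the ticket handling) is invented for illustration. The old path commits eagerly so a reservation rarely blocks, while the ticketed path only flushes while a reservation is actually waiting:

```c
/* toy_enospc.c - userspace sketch, NOT btrfs code; all names invented. */
#include <stdio.h>

struct space_info { long free; };
struct ticket { long bytes; int granted; };

/* pretend a transaction commit returns pinned space to the free pool */
static void commit_transaction(struct space_info *si)
{
	si->free += 1024;
}

/* pre-5.0 style: end-transaction throttling commits eagerly, so the
 * reservation almost never blocks on the ENOSPC flushing machinery */
static int reserve_old(struct space_info *si, long bytes)
{
	if (si->free < bytes)
		commit_transaction(si);	/* the old throttle point */
	if (si->free < bytes)
		return -1;
	si->free -= bytes;
	return 0;
}

/* ticketed flushing: only makes progress while a ticket is queued,
 * so the reserving thread eats the whole flush latency itself */
static void flush_for_ticket(struct space_info *si, struct ticket *t)
{
	while (si->free < t->bytes)
		commit_transaction(si);
	si->free -= t->bytes;
	t->granted = 1;
}

/* 5.0 style: no pre-flushing; queue a ticket and wait on it */
static int reserve_new(struct space_info *si, long bytes)
{
	struct ticket t = { .bytes = bytes, .granted = 0 };

	flush_for_ticket(si, &t);
	return t.granted ? 0 : -1;
}

int main(void)
{
	struct space_info si = { .free = 512 };
	int rc;

	rc = reserve_old(&si, 1024);
	printf("old: rc=%d free=%ld\n", rc, si.free);
	rc = reserve_new(&si, 1024);
	printf("new: rc=%d free=%ld\n", rc, si.free);
	return 0;
}
```

The point is visible in reserve_new(): all of the flush latency lands on the reserving thread, which matches the "waiting on it more often" behavior described above.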

lorddoskias commented 4 years ago

The command is:

./fio --direct=0 --ioengine=sync --thread --directory=/abuild --invalidate=1 --group_reporting=1 --runtime=300 --fallocate=none --ramp_time=10 --name=RandomWrites-async-64512-4k-4 --new_group --rw=randwrite --size=16128m --numjobs=4 --bs=4k --fsync_on_close=0 --end_fsync=0 --filename_format=FioWorkloads.\$jobnum

The important bit here is that the total workload size should be at least 2x the RAM size, so in this case we have 16g * 4 jobs = 64g for a machine with 64g of RAM. In my tests I managed to get away with --size=2G --numjobs=4 on a machine with 8g of RAM.
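For picking the numbers, here is a trivial calculator (hypothetical, nothing fio-specific; the 2x multiple is just the rule of thumb stated above) that chooses --size so that size * numjobs covers the target multiple of RAM:

```c
/* sizing.c - toy calculator for the sizing rule of thumb above; the
 * values and the 2x multiple are assumptions taken from this comment. */
#include <stdio.h>

int main(void)
{
	long ram_mib = 8 * 1024;	/* machine RAM in MiB, e.g. 8g */
	long numjobs = 4;
	long multiple = 2;		/* total workload >= multiple * RAM */
	long per_job_mib = multiple * ram_mib / numjobs;

	printf("fio --size=%ldm --numjobs=%ld (total %ldm on %ldm RAM)\n",
	       per_job_mib, numjobs, per_job_mib * numjobs, ram_mib);
	return 0;
}
```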

lorddoskias commented 3 years ago

Just re-ran the test case for 4.19/5.10 and latest misc-next, which includes your series, and the results look as follows:

misc-next (with Josef's fix):
  WRITE: bw=26.9MiB/s (28.2MB/s), 26.9MiB/s-26.9MiB/s (28.2MB/s-28.2MB/s), io=8192MiB (8590MB), run=304515-304515msec
v5.10:
  WRITE: bw=20.1MiB/s (21.1MB/s), 20.1MiB/s-20.1MiB/s (21.1MB/s-21.1MB/s), io=8192MiB (8590MB), run=407373-407373msec
v4.19:
  WRITE: bw=24.7MiB/s (25.9MB/s), 24.7MiB/s-24.7MiB/s (25.9MB/s-25.9MB/s), io=6152MiB (6451MB), run=249205-249205msec

misc-next (26.9MiB/s) now beats both v5.10 (20.1MiB/s) and the pre-regression v4.19 (24.7MiB/s), so I'd say that particular performance problem has been fixed. The fix will likely land in v5.12.