Open petermattis opened 4 years ago
Here is a 10m run with pacing enabled:
____optype__elapsed_____ops(total)___ops/sec(cum)__keys/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
update_100 600.0s 295399650 492331.3 492331.4 0.5 0.2 1.6 2.6 33.6
And a 10m run with pacing disabled:
____optype__elapsed_____ops(total)___ops/sec(cum)__keys/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
update_100 600.0s 243737974 406229.5 406229.9 0.6 0.3 1.8 6.3 104.9
Ah, pacing causes problems with a workload composed of reads and writes. Here is ycsb A
which is 50% reads and 50% updates. Old is with pacing disabled, new is with pacing enabled:
name old ops/sec new ops/sec delta
ycsb/A 847k ± 1% 633k ± 1% -25.27% (p=0.000 n=10+9)
I have a strong suspicion the problem is that pacing increases read amplification. I added a bit of instrumentation to Iterator
so that it can report read amplification. With pacing disabled the average read amplification across iterators was 5.8. With pacing enabled the average read amplification was 8.7. This would also explain why pacing speeds up write-only workloads: we're allowing the read amplification to increase which is generally good for write performance.
With pacing disabled the average read amplification across iterators was 5.8. With pacing enabled the average read amplification was 8.7. This would also explain why pacing speeds up write-only workloads: we’re allowing the read amplification to increase which is generally good for write performance.
This makes me more convinced that any pacing that does not take into account whether we have an actual CPU or IO resource bottleneck is probably making unnecessary tradeoffs (related to #1329). In this example:
Is this still worthwhile as an issue separate from #1329?
@jbowens I would say yes - it makes sense as a sub-issue of that. This one is more general about testing the implications of enabling pacing on its own, while that one is about potentially using pacing in IO-constrained situations (assuming it is beneficial in prioritizing user-writes or other compactions). I can see #1329 wrapping up without much in the way of using pacing, in which case this could stand as a remaining bit of exploration/work.
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to Pebble!
Flush/compaction pacing was disabled in https://github.com/cockroachdb/pebble/pull/431 because it had an apparent effect on throughput. Some recent experiments with
pebble bench ycsb
show a throughput benefit to pacing (old is with pacing disabled, new is with pacing enabled).Interestingly, compaction pacing was very quickly disabled in these tests because compactions were not keeping up. The above tests were for only 1m in duration. Longer runs need to be performed. Also need to look at scenarios involving reads and writes.
A lot has changed since mid-December when #431 was merged, so perhaps the true culprit was not pacing. Or perhaps we've fixed other bottlenecks so that pacing is no longer a problem.
Jira issue: PEBBLE-175