If we run a benchmark with high throughput (in our case TPC-B with 24 threads using Zapps), most evicted pages are written out in an older state, so that single-page recovery is always required when the page is fetched again. See this output from a benchmark run:
This goes on and on once the buffer is full and eviction becomes frequent.
The reason for this is probably the cleaner behavior: it marks a flushed page as clean without checking whether the page changed after its copy was taken into the write buffer. This is easy to check by comparing the page LSN of the frame with the page LSN of the copy.
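As a rough illustration of the check described above, here is a minimal sketch. All names (`Frame`, `WriteBufferEntry`, `copy_for_flush`, `mark_clean_if_unchanged`) are hypothetical stand-ins, not the actual buffer pool or cleaner code. It is single-threaded and ignores latching entirely, so it shows only the LSN comparison itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified types; not the real buffer pool structures.
using lsn_t = std::uint64_t;

struct Frame {
    lsn_t page_lsn = 0;  // LSN of the last update applied to the page
    bool dirty = false;  // true if the in-memory page is newer than the disk copy
};

struct WriteBufferEntry {
    std::size_t frame_idx;  // frame the copy was taken from
    lsn_t copied_lsn;       // page LSN at the moment the copy was taken
};

// Take a copy of a page for flushing and remember the LSN it had at that time.
inline WriteBufferEntry copy_for_flush(const std::vector<Frame>& frames,
                                       std::size_t idx) {
    assert(idx < frames.size());
    return WriteBufferEntry{idx, frames[idx].page_lsn};
}

// After the write completes, clear the dirty flag only if the page has not
// been updated since the copy was taken. Returns true if it was marked clean.
inline bool mark_clean_if_unchanged(std::vector<Frame>& frames,
                                    const WriteBufferEntry& entry) {
    Frame& frame = frames[entry.frame_idx];
    if (frame.page_lsn != entry.copied_lsn) {
        return false;  // page changed meanwhile; it must stay dirty
    }
    frame.dirty = false;
    return true;
}
```

With this check, a page that was updated between the copy and the end of the write stays dirty and is picked up again by a later cleaner pass, instead of being evicted in a stale state.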
This is an old bug that I already knew about, but this is the first time I have seen it in practice. Just adding an LSN comparison is probably not enough, since the cleaner has suspiciously little concurrency control. For instance, all accesses to the array of frames are completely unsynchronized. This should be looked into in more detail, and we probably want to redesign the cleaner completely.