caetanosauer / zero

Fork of the Shore-MT storage manager used by the research project Instant Recovery

Buffer eviction writes stale pages / cleaner synchronization #10

Closed caetanosauer closed 9 years ago

caetanosauer commented 9 years ago

If we run a benchmark with high throughput (in our case TPC-B with 24 threads using Zapps), most evicted pages are written out in an outdated state, so single-page recovery is always required when the page is fetched again. See this output from a benchmark run:

[7ffc157fa700] bf_tree.cpp (2443) Stale Child LSN found! Invoking Single-Page-Recovery.. parent=p(1.404), child pid=98, EMLSN=1.31581560 LSN=1.31315344
[7ffc157fa700] bf_tree.cpp (2443) Stale Child LSN found! Invoking Single-Page-Recovery.. parent=p(1.404), child pid=232, EMLSN=1.31660248 LSN=1.31403424
[7ffc157fa700] bf_tree.cpp (2443) Stale Child LSN found! Invoking Single-Page-Recovery.. parent=p(1.405), child pid=266, EMLSN=1.31675416 LSN=1.31325952
[7ffc157fa700] bf_tree.cpp (2443) Stale Child LSN found! Invoking Single-Page-Recovery.. parent=p(1.404), child pid=246, EMLSN=1.31794056 LSN=1.3132840
...

This goes on and on once the buffer is full and eviction becomes frequent.
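For reference, the check behind these messages can be sketched roughly as follows. This is only an illustration, not the code in bf_tree.cpp: the `Lsn` struct and `needs_single_page_recovery` are hypothetical names, and the sketch assumes that LSNs print as `file.offset` and that a child counts as stale when its page LSN is lower than the EMLSN recorded in the parent, which is what every line of the output above shows.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical LSN type: log file number plus offset, mirroring the
// "1.31581560" notation in the log output. Not the real Shore-MT type.
struct Lsn {
    std::uint32_t file;
    std::uint64_t offset;
};

// Order LSNs by file number first, then by offset within the file.
inline bool operator<(const Lsn& a, const Lsn& b) {
    return std::tie(a.file, a.offset) < std::tie(b.file, b.offset);
}

// Assumed rule: the fetched page image is stale if its own LSN is older
// than the EMLSN the parent expects for that child.
inline bool needs_single_page_recovery(const Lsn& emlsn, const Lsn& page_lsn) {
    return page_lsn < emlsn;
}
```

With the values from the first log line, `needs_single_page_recovery(Lsn{1, 31581560}, Lsn{1, 31315344})` is true, so the page has to be brought up to date before it can be used.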

The likely cause is the cleaner, which marks a flushed page as clean without checking whether the page changed after its copy was taken into the write buffer. Such a change is easy to detect using the page LSN field.
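A minimal sketch of that LSN comparison, under stated assumptions: `Frame`, `WriteBufferEntry`, `copy_for_flush`, and `mark_clean_if_unchanged` are hypothetical names invented for this illustration and do not correspond to the actual cleaner code. The idea is simply to remember the page LSN at copy time and to clear the dirty flag only if the LSN is still the same after the write.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified buffer frame; not the real Shore-MT structure.
struct Frame {
    std::uint64_t page_lsn = 0;  // LSN of the last update applied to the page
    bool dirty = false;
};

// Bookkeeping for one page copied into the cleaner's write buffer.
struct WriteBufferEntry {
    std::size_t frame_idx;     // which frame the copy came from
    std::uint64_t copied_lsn;  // page LSN at the moment of the copy
};

// Step 1: take the copy and remember the LSN it was taken at.
inline WriteBufferEntry copy_for_flush(const std::vector<Frame>& frames,
                                       std::size_t idx) {
    return WriteBufferEntry{idx, frames[idx].page_lsn};
}

// Step 2 (after the write completed): mark the frame clean only if the
// page was not updated since the copy. Returns true if it was marked clean.
inline bool mark_clean_if_unchanged(std::vector<Frame>& frames,
                                    const WriteBufferEntry& e) {
    Frame& f = frames[e.frame_idx];
    if (f.page_lsn != e.copied_lsn) {
        return false;  // newer updates exist; the frame must stay dirty
    }
    f.dirty = false;
    return true;
}
```

The sketch is single-threaded on purpose. It shows only the comparison itself and does nothing to protect the frame against concurrent updates between the check and the flag change.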

This is an old bug that I already knew about, but this is the first time I have seen it in practice. Just adding an LSN comparison is probably not enough, since the cleaner has suspiciously little concurrency control. For instance, all accesses to the array of frames are completely unsynchronized. This should be looked into in more detail, and we probably want to redesign the cleaner completely.

caetanosauer commented 9 years ago

Fixed in fa3f2e6