canonical / dqlite

Embeddable, replicated and fault-tolerant SQL engine.
https://dqlite.io

Start instrumenting existing code with state machines #678

Closed cole-miller closed 1 month ago

cole-miller commented 3 months ago

This PR introduces several state machines to track and instrument the behavior of existing dqlite code. Specifically, we instrument:

- the `leader__exec` request on the leader
- the `raft_apply` request
- in-memory log entries
- the `UvAppend` and `UvTruncate` requests in the uv I/O backend
- the `appendFollower` request on the follower

There are lots more things we could instrument but I think this is a good starting point: it gives us enough visibility to follow a write transaction over its whole lifecycle and down to the disk I/O level, on both the follower and the leader. (A big missing piece is linking the histories across nodes; that's nontrivial because our raft messages don't include any kind of ID or room for extensibility, although we could fake something in the I/O fixture.)

The tracking of in-memory log entries is the trickiest part of this PR. I was initially uncertain whether to attach SMs to individual log entries at all, but I found this served as a convenient "hub" to connect other state machines together (e.g. bridging the raft_apply and UvAppend state machines), plus it gives a foothold for tracking how long it takes to apply each entry.
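To make the per-entry "hub" idea concrete, here is a minimal, self-contained sketch of the lifecycle such an entry SM tracks. This is a toy, not the actual API from src/lib/sm.c: the enum, struct, and function names below are invented for illustration, and the real per-entry machine in this PR enforces its transitions through the sm module instead.

```c
/*
 * Toy illustration only (not dqlite's sm API): a per-entry state machine
 * that bridges the apply path and the disk-write path.
 */
#include <assert.h>
#include <stdio.h>

enum entry_state {
	ENTRY_PENDING,   /* created by the apply request, not yet on disk */
	ENTRY_PERSISTED, /* the disk append for this entry completed */
	ENTRY_APPLIED    /* committed and applied to the FSM */
};

struct entry_sm {
	enum entry_state state;
};

/* Only allow the forward transitions of the toy lifecycle. */
static void entry_sm_move(struct entry_sm *m, enum entry_state next)
{
	switch (m->state) {
	case ENTRY_PENDING:
		assert(next == ENTRY_PERSISTED);
		break;
	case ENTRY_PERSISTED:
		assert(next == ENTRY_APPLIED);
		break;
	default:
		assert(0);
	}
	m->state = next;
}

int main(void)
{
	struct entry_sm sm = { ENTRY_PENDING };
	entry_sm_move(&sm, ENTRY_PERSISTED); /* disk write callback fired */
	entry_sm_move(&sm, ENTRY_APPLIED);   /* commit index advanced */
	printf("final state: %d\n", sm.state);
	return 0;
}
```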

I also added state machines for the append and truncate requests in the raft I/O fixture (raft/fixture.c). This was necessary because the code in raft/replication.c now assumes that the I/O backend makes an SM available to call sm_relate with. The SMs in the fixture are not copies of the ones in the uv I/O backend, but simpler ones.
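As a rough sketch of that contract: the I/O backend (whether the real uv one or the fixture) exposes an SM for its request, and replication.c links its own SM to it so the two show up as parent and child in a trace. The types and function names below are invented for the example; the real sm_relate lives in src/lib/sm.c and its signature may differ.

```c
/* Illustrative sketch only; not the real src/lib/sm.c interface. */
#include <stdio.h>

struct toy_sm {
	const char *name;
	int state;
	const struct toy_sm *parent; /* set when two SMs are related */
};

static void toy_sm_init(struct toy_sm *m, const char *name)
{
	m->name = name;
	m->state = 0;
	m->parent = NULL;
}

/* Record the parent/child link so traces can nest child spans. */
static void toy_sm_relate(const struct toy_sm *parent, struct toy_sm *child)
{
	child->parent = parent;
}

int main(void)
{
	struct toy_sm append_follower; /* replication-side request SM */
	struct toy_sm fixture_append;  /* SM exposed by the I/O backend */

	toy_sm_init(&append_follower, "appendFollower");
	toy_sm_init(&fixture_append, "fixture-append");

	/* replication.c asks the backend for its SM and relates them. */
	toy_sm_relate(&append_follower, &fixture_append);

	printf("%s -> parent %s\n", fixture_append.name,
	       fixture_append.parent->name);
	return 0;
}
```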

Finally, I made a few fixes and additions to the sm code.

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 81.85185% with 49 lines in your changes missing coverage. Please review.

Project coverage is 81.07%. Comparing base (4e328c9) to head (85e494d). Report is 31 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/raft/replication.c | 65.06% | 22 Missing and 7 partials :warning: |
| src/lib/sm.c | 57.14% | 0 Missing and 6 partials :warning: |
| src/raft/client.c | 76.19% | 4 Missing and 1 partial :warning: |
| src/raft/log.c | 92.00% | 1 Missing and 3 partials :warning: |
| src/raft/fixture.c | 93.18% | 0 Missing and 3 partials :warning: |
| src/raft/uv_truncate.c | 95.23% | 1 Missing :warning: |
| test/raft/lib/heap.c | 0.00% | 0 Missing and 1 partial :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #678      +/-   ##
==========================================
+ Coverage   74.21%   81.07%    +6.86%
==========================================
  Files         195      197        +2
  Lines       27738    29164     +1426
  Branches     2794     4066     +1272
==========================================
+ Hits        20585    23644     +3059
+ Misses       4827     3875      -952
+ Partials     2326     1645      -681
```


cole-miller commented 3 months ago

Pictures!

Leader's view of executing a write transaction in a one-node cluster (./integration-test cluster/restart):

[image: leader-exec]

From top to bottom, we have the leader__exec request, the raft_apply request, the new log entry, and the UvAppend request. You can see how long it took to execute the transaction initially, how long it took to write it to disk, and how long it took to apply it once committed.

Follower's view of handling an AppendEntries (./integration-test cluster/dataOnNewNode):

[image: append-follower]

At the top is the main appendFollower request. Below that are 10 log entries, and at the bottom is the disk write for the entries.

Finally, a different view of AppendEntries handling on the follower's side. This one uses the fixture (./raft-core-integration-test replication/recvRollbackConfigurationToInitial), so the I/O related state machines are simpler.

[image: follower-truncate]

Here, the follower truncated an entry from its log because of the AppendEntries. You see the appendFollower request at the top, then the truncated entry, then the disk truncate, then the two entries that were included in the AppendEntries, then the disk append. You can see that the truncate and append operations are concurrent: replicationApply doesn't wait for the former to finish before kicking off the latter (something I hadn't noticed before putting together this PR). If we used the real I/O backend and instrumented a bit more, we'd see that UvTruncate sets a UvBarrier, which blocks the UvAppend from running until the truncate has finished.
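To make that ordering concrete, here is a toy model of the barrier behaviour just described. It is not the real UvBarrier/UvAppend code; every name below is made up, and the sketch only captures the "queue the append until the truncate's completion callback fires" shape.

```c
/* Toy model of a barrier gating an append behind an in-flight truncate. */
#include <stdbool.h>
#include <stdio.h>

struct toy_io {
	bool truncate_in_flight; /* acts as the barrier */
	bool append_queued;
};

static void toy_submit_truncate(struct toy_io *io)
{
	io->truncate_in_flight = true;
}

static void toy_submit_append(struct toy_io *io)
{
	if (io->truncate_in_flight) {
		/* Barrier is up: defer the append. */
		io->append_queued = true;
		return;
	}
	printf("append runs immediately\n");
}

static void toy_truncate_done(struct toy_io *io)
{
	io->truncate_in_flight = false;
	if (io->append_queued) {
		io->append_queued = false;
		printf("barrier lifted, queued append runs now\n");
	}
}

int main(void)
{
	struct toy_io io = { false, false };
	toy_submit_truncate(&io); /* kick off the truncate... */
	toy_submit_append(&io);   /* ...and the append, without waiting */
	toy_truncate_done(&io);   /* the append only hits the disk here */
	return 0;
}
```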