anishathalye / porcupine

A fast linearizability checker written in Go 🔎
https://anishathalye.com/testing-distributed-systems-for-linearizability/
MIT License
926 stars 52 forks source link

RWLock'd map deemed not linearizable #11

Closed thomasjungblut closed 1 year ago

thomasjungblut commented 1 year ago

Hey @anishathalye,

Thanks for the library, we're making pretty heavy use of this now in etcd and I spent the day to play around with it and check a toy database of mine.

While doing that, I came across an odd issue with sync.RWMutex which guards the three main operations (Get with ReadLock and Put+Delete with a WriteLock).

From sync.RWMutex:

A RWMutex is a reader/writer mutual exclusion lock. The lock can be held by an arbitrary number of readers or a single writer. The zero value for a RWMutex is an unlocked mutex. If a goroutine holds a RWMutex for reading and another goroutine might call Lock, no goroutine should expect to be able to acquire a read lock until the initial read lock is released. In particular, this prohibits recursive read locking. This is to ensure that the lock eventually becomes available; a blocked Lock call excludes new readers from acquiring the lock.

Given this is implemented correctly in Go (I would assume so), this should give us linearizability over concurrent map operations. A single threaded happy path test runs correctly, the multi-goroutine one sadly returns Illegal.

The full code can be found in this commit: https://github.com/thomasjungblut/porcupine/commit/d01d21699b74166d39f001b781767aa812fbf366

And should be reproducible with a simple go test on the rwlock package.

I would love to attach a better visualization of this issue, however there's some issue with the JS in this particular case: image

The more complex case from my toy database looks like this:

image

I'm rather curious as to why the "Get" was allowed to run during the "Put" and even the "Put" can overlap with another. BTW I also suspect a clock resolution / backward movement issue with time.Now here...

EDIT: this also happens with a simple sync.Mutex.

Let me know what you think, especially if I screwed up the model somehow.

anishathalye commented 1 year ago

The issue is with the model. (re-attaching your original model.go here in case anyone wants to refer to it in the future).

(hidden, click to expand) ```go package rwlock import ( "fmt" "github.com/anishathalye/porcupine" "reflect" ) const ( GetOp = iota PutOp = iota DelOp = iota ) type Input struct { Operation uint8 Key int Val int } type Output struct { Key int Val int Found bool } type State struct { state map[int]int } var Model = porcupine.Model{ Init: func() interface{} { return State{ state: map[int]int{}, } }, Step: func(state interface{}, input interface{}, output interface{}) (bool, interface{}) { s := state.(State) i := input.(Input) o := output.(Output) stateVal, found := s.state[i.Key] switch i.Operation { case GetOp: if !o.Found { return !found, s } else if stateVal == o.Val { return true, s } break case PutOp: s.state[i.Key] = i.Val return true, s case DelOp: delete(s.state, i.Key) return true, s } return false, s }, Equal: func(a, b interface{}) bool { return reflect.DeepEqual(a, b) }, DescribeOperation: func(input interface{}, output interface{}) string { i := input.(Input) o := output.(Output) opName := "" switch i.Operation { case GetOp: opName = "Get" break case PutOp: opName = "Put" break case DelOp: opName = "Del" break } return fmt.Sprintf("%s(%d) -> %d", opName, i.Key, o.Val) }, DescribeState: func(state interface{}) string { s := state.(State) return fmt.Sprintf("%v", s.state) }, } ```

The problem is that Model.Step violates the contract that Step should be a pure function. (This isn't explained clearly in the docs, so I'll work on improving the documentation on this.)

Fixing that:

diff --git i/rwlock/model.go w/rwlock/model.go
index 29ca178..6eba91c 100644
--- i/rwlock/model.go
+++ w/rwlock/model.go
@@ -28,6 +28,15 @@ type State struct {
    state map[int]int
 }

+func (st State) Clone() State {
+   var st2 State
+   st2.state = make(map[int]int)
+   for k, v := range st.state {
+       st2.state[k] = v
+   }
+   return st2
+}
+
 var Model = porcupine.Model{
    Init: func() interface{} {
        return State{
@@ -35,7 +44,7 @@ var Model = porcupine.Model{
        }
    },
    Step: func(state interface{}, input interface{}, output interface{}) (bool, interface{}) {
-       s := state.(State)
+       s := state.(State).Clone()
        i := input.(Input)
        o := output.(Output)

Now, it seems to pass:

$ go test -v
=== RUN   TestHappyPath
--- PASS: TestHappyPath (0.00s)
=== RUN   TestMultiGoroutines
--- PASS: TestMultiGoroutines (3.34s)
PASS
ok      github.com/anishathalye/porcupine/rwlock        3.470s

I'm guessing that the borked visualization was due to the buggy Model.Step, but if you're still seeing buggy visualizations, please let me know, I would like to debug it.

anishathalye commented 1 year ago

Also, btw, you'll get much better perf by partitioning the linearizability check by key, and then having the (per-key) model be just a register (i.e., a single value), like this: https://github.com/anishathalye/porcupine/blob/7cbc8342681863b08e96432402d0588fc1defa88/porcupine_test.go#L1141

anishathalye commented 1 year ago

I updated the documentation for Model; happy to hear any improvements you can suggest to the wording.

thomasjungblut commented 1 year ago

Thanks @anishathalye, that did the whole trick.

it cannot mutate the given state.

Thanks for updating the documentation, I already discovered the partitioning trick as well!

Would you be interested if I move porcupine to use generics? I reckon with a proper interface that can be "Clonable" the library can take care of creating a new state every time before passing to the step function? Also would remove the clutter of casting the types every time.

tjungblu commented 1 year ago

I've done a first attempt at moving to generics and adding the state cloning in #12, let me know what you think.

anishathalye commented 1 year ago

Great, glad your issue was resolved! #12 looks interesting, will share my thoughts in the comments for that PR.