anishathalye / porcupine

A fast linearizability checker written in Go 🔎
https://anishathalye.com/testing-distributed-systems-for-linearizability/
MIT License
958 stars 53 forks source link

Should return sequence that violates linearizability #4

Closed toddlipcon closed 4 years ago

toddlipcon commented 6 years ago

When Knossos fails, it gives output showing the sequence of events that violated linearizability. Porcupine seems to just return "false" from the checker. It would be great if it would show the underlying linearizability violation in a manner similar to Knossos.

anishathalye commented 6 years ago

It would be great to return more output to the user that helps in debugging.

What does it mean to show the sequence of events that violate linearizability?

Linearizability is a property of a history. Linearizability is in NP, so if a history is linearizable, there's a short witness that can be checked efficiently (the linearization). However, linearizability is not known to be in co-NP -- so we don't know of a short witness that can be used to verify that there exists no linearization.

Given a non-linearizable history, there are usually many linearizable prefixes of the history such that any extension of that prefix demonstrates an inconsistency. When a history is not linearizable, Knossos returns some linearizable prefix with this property. It's unclear if doing this is helpful for debugging a system, because it doesn't necessarily correspond with "what happened"; it's simply a linearizable prefix of the history. We could even return the longest possible linearizable prefix, but similarly, it's not necessarily "what's happened", and it's not clear if it'll be helpful to know.

anishathalye commented 6 years ago

We're currently using Porcupine in testing for linearizability in 6.824 lab assignments, and we briefly considered showing the longest possible linearizable prefix, thinking it may be useful as a debugging tool. After some thought, we came to the conclusion that this may not be too helpful, and may even cause harm, if the linearizable prefix is interpreted as "truth" and it's assumed that something went wrong after the operations in the linearizable prefix (because this isn't necessarily true).

I could be convinced to add this feature, but this is why I haven't done it as of now.

I think it would be great to think more about what might be useful information from a linearizability checker in tracking down a bug in an implementation, and then implement new functionality based on that.

andrewjstone commented 6 years ago

I've thought a good amount about this, and discussed it with Kyle a while back. I think it's useful to provide the first failing history as knossos does, but also to show all concurrent operations along with the failing op. That way you show that the failing history is not necessarily the actual history, but still give enough information to figure out what possible permutations may have caused the actual failure.

On Tue, May 15, 2018 at 10:09 PM, Anish Athalye notifications@github.com wrote:

We're currently using Porcupine in testing for linearizability in 6.824 https://pdos.csail.mit.edu/6.824/index.html lab assignments, and we briefly considered showing the longest possible linearizable prefix, thinking it may be useful as a debugging tool. After some thought, we came to the conclusion that this may not be too helpful, and may even cause harm, if the linearizable prefix is interpreted as "truth" and it's assumed that something went wrong after the operations in the linearizable prefix (because this isn't necessarily true).

I could be convinced to add this feature, but this is why I haven't done it as of now.

I think it would be great to think more about what might be useful information from a linearizability checker in tracking down a bug in an implementation, and then implement new functionality based on that.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/anishathalye/porcupine/issues/4#issuecomment-389372091, or mute the thread https://github.com/notifications/unsubscribe-auth/AAcf-SnaCQquSGRFbE2DXjjJikoo6i8Qks5ty4pygaJpZM4UAe62 .

toddlipcon commented 6 years ago

Right, the "failing history" isn't the root cause, but it at least helps you understand where you went wrong. Particularly in the case that your model may have some issue (with more complex models).

I resorted to manually binary-searching for the longest linearizable prefix for my own use case and found the results useful to see where the first unexpected result was in the history. In my first test case I dont even have any concurrent operations so it's quite clearly a "wrong step" (perhaps due to a bug in my model)

fenollp commented 6 years ago

The counter example shown in the first few figures at https://www.anishathalye.com/2017/06/04/testing-distributed-systems-for-linearizability/ by the author of both the post and this project seems picked in a thoughtful way. Maybe the same can be applied in general? Seems like the technique is to find the minimal history that makes this tool return false. Once that’s in, it’d be nice to have an enum of the reason why the history is not linearizable (stale read, ...).

anishathalye commented 4 years ago

I wrote some code that returns failing history prefixes, and I also wrote some visualization code. It's in the visualization branch. Still a WIP; feedback appreciated.

Linearizable prefixes

As mentioned in this comment above, it's not known if it's possible to produce a short proof witness for non-linearizability (NP ?= co-NP). Still, as several people pointed out above, it can still be useful to return some information to aid in debugging.

I updated the code to keep track of partial linearizations, saving them at backtracking points. With this, we get, for each history element, the longest linearization that we found that includes the given history element. This doesn't necessarily tell us "what actually happened", but it can often help pinpoint the underlying issue.

Visualization

With the set of linearizable prefixes available, it's possible to visualize the entire history along with the linearizable prefixes found. I wrote some code that produces a HTML page with diagrams that look like this:

example

Interactive version available here: https://s.anish.io/porcupine-vis-example.html

The diagram shows the history of operations, with descriptions of the operations (arguments/return values), and it shows the longest-length linearizable prefix in each partition. Other linearizable prefixes are shown in a translucent color. The x-scale is not proportional to time: instead, time in the diagram is relative (it preserves ordering but not scale). I initially had the x-positions correspond to real time, but the diagrams became unreadable (some operations are really fast, and others are really slow, which results in huge discrepancies in sizes).

A linearizable prefix is shown as a sequence of black bars, with lines connecting them. After the last element that's included in the linearizable prefix, red bars show possible (but illegal) next operations (which would correspond to operations that the search process tried to include in the linearization but which could not have executed at that point).

Hovering over a history element hides elements in other partitions, and it shows the longest-length linearizable prefix that includes that history element (if any). A tooltip shows the previous/next state of the state machine, along with the actual call/return times. Clicking on a history element makes this sticky, making it easier to scroll left/right while keeping a given partition active. Clicking on the same element or the background clears the selection; clicking on another element changes the election.

Larger histories

The example above is synthetic. Here's what the output looks like for a linearizable history from a key-value server:

Screen Shot 2019-12-30 at 3 32 10 PM

(interactive version)

And here's what it looks like when the key-value server is deliberately broken (in this case, by allowing stale reads):

Screen Shot 2019-12-30 at 3 28 25 PM

(interactive version)

I think it's useful with these realistic histories. E.g. in the failing example, I think it's possible for a developer to guess that the issue is a stale read, and then go about debugging that.

Example

To see the API for the new functionality, see visualization_test.go.

Next steps

I hope to merge this code eventually, but it's not quite ready yet. Here are some things that remain to be done:

anishathalye commented 4 years ago

CC @siddontang in case you have any insights on this

anishathalye commented 4 years ago

I made a bunch of small improvements and bugfixes to the visualization code, and I got rid of all external dependencies (including the LP solver). I think it's good enough to use now (merged in a6d4108f7bb39858cae1028c31846bb68596c6ee); we're going to give this as a tool to 6.824 students in the hopes that it might help with debugging (and demystify the "history is not linearizable" result from the tests, with previously had no extra context).