anishathalye / porcupine

A fast linearizability checker written in Go 🔎
https://anishathalye.com/testing-distributed-systems-for-linearizability/
MIT License

Allow adding operations that are not part of any linearization #17

Closed serathius closed 6 months ago

serathius commented 6 months ago

Fix https://github.com/anishathalye/porcupine/issues/14

I don't have enough experience with the library to properly design an interface, so I'm open to suggestions.

I only added a method for adding Operations, not Events. The reason is that events have no notion of time; they infer time from their order. For this use case we want to fill the gaps in the linearization visualization without changing the linearization itself, so we need to use time as the reference point.
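To illustrate the distinction, here is a minimal sketch of how times can be inferred from event order. The `Event` and `Operation` types below are simplified, hypothetical stand-ins defined only for this example, and `operationsFromEvents` is an assumed helper, not porcupine's actual conversion code.

```go
package main

import "fmt"

// EventKind distinguishes the start and the end of an operation.
type EventKind int

const (
	CallEvent EventKind = iota
	ReturnEvent
)

// Event is a hypothetical, simplified event: it has no timestamp,
// only a position in the history slice.
type Event struct {
	ClientId int
	Kind     EventKind
	Id       int // matches a call event with its return event
}

// Operation is a hypothetical, simplified operation with explicit times.
type Operation struct {
	ClientId int
	Call     int64
	Return   int64
}

// operationsFromEvents derives call/return "times" from the index of
// each event in the slice, which is the only ordering events carry.
func operationsFromEvents(events []Event) []Operation {
	callAt := make(map[int]int64)
	var ops []Operation
	for i, e := range events {
		switch e.Kind {
		case CallEvent:
			callAt[e.Id] = int64(i)
		case ReturnEvent:
			call, ok := callAt[e.Id]
			if !ok {
				continue // return without a matching call: skip it
			}
			ops = append(ops, Operation{ClientId: e.ClientId, Call: call, Return: int64(i)})
		}
	}
	return ops
}

func main() {
	events := []Event{
		{ClientId: 0, Kind: CallEvent, Id: 0},
		{ClientId: 1, Kind: CallEvent, Id: 1},
		{ClientId: 0, Kind: ReturnEvent, Id: 0},
		{ClientId: 1, Kind: ReturnEvent, Id: 1},
	}
	for _, op := range operationsFromEvents(events) {
		fmt.Printf("client %d: call=%d return=%d\n", op.ClientId, op.Call, op.Return)
	}
}
```

Because the derived times are just slice positions, an extra event added afterwards has no meaningful position relative to the existing ones, whereas an extra operation with real timestamps can be placed on the same time axis.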

[image: Selection_014]

serathius commented 6 months ago

cc @anishathalye

serathius commented 6 months ago

I created a draft integration into the etcd robustness tests, so we can see how it looks:

[image]

The etcd robustness tests patch the operation history before passing it to porcupine.

This is done to improve the reliability of the test by reducing the chance of the linearization check hitting exponential complexity. We remove operations that don't impact linearization, such as failed read requests and failed writes that we know were not persisted.

Adding those operations back into the visualization would allow us to fill the gaps and make the picture complete. This makes the visualization more understandable.
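As a rough sketch of that patching step, the history can be split into operations that go to the checker and operations kept only for display. All types and names here (`Request`, `Response`, `KnownNotPersisted`, `splitHistory`) are hypothetical and chosen for illustration; they are not taken from the etcd robustness tests or from porcupine.

```go
package main

import "fmt"

// Request and Response are hypothetical stand-ins for a test's own types.
type Request struct {
	IsRead bool
	Key    string
}

type Response struct {
	Failed            bool
	KnownNotPersisted bool // only meaningful for failed writes
}

// Operation is a hypothetical, simplified operation record.
type Operation struct {
	ClientId int
	Input    Request
	Output   Response
	Call     int64
	Return   int64
}

// affectsLinearization reports whether an operation has to be checked.
// Failed reads never matter; failed writes matter unless we know they
// were not persisted.
func affectsLinearization(op Operation) bool {
	if !op.Output.Failed {
		return true
	}
	if op.Input.IsRead {
		return false
	}
	return !op.Output.KnownNotPersisted
}

// splitHistory separates operations for the checker from operations
// that are only useful for filling gaps in the visualization.
func splitHistory(history []Operation) (checked, extra []Operation) {
	for _, op := range history {
		if affectsLinearization(op) {
			checked = append(checked, op)
		} else {
			extra = append(extra, op)
		}
	}
	return checked, extra
}

func main() {
	history := []Operation{
		{ClientId: 0, Input: Request{Key: "a"}, Call: 0, Return: 10},
		{ClientId: 1, Input: Request{IsRead: true, Key: "a"}, Output: Response{Failed: true}, Call: 5, Return: 15},
		{ClientId: 2, Input: Request{Key: "b"}, Output: Response{Failed: true, KnownNotPersisted: true}, Call: 12, Return: 20},
		{ClientId: 3, Input: Request{Key: "c"}, Output: Response{Failed: true}, Call: 18, Return: 30},
	}
	checked, extra := splitHistory(history)
	fmt.Printf("checked=%d extra=%d\n", len(checked), len(extra))
}
```

The failed write from client 3 stays in the checked set because its outcome is unknown: it may or may not have been persisted.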

serathius commented 6 months ago

ping @anishathalye

anishathalye commented 6 months ago

I will take a look this weekend. Quick feedback for now: I would like the API to support adding "extra information" more broadly, not just operations (which go through the code that converts operations to strings).

serathius commented 6 months ago

Sounds good. I was thinking about adding an option to add groups of operations with custom colors, a legend, and the ability to hide a group.

In the etcd robustness tests we have at least 3 types of operations: normal linearizable requests; failed linearizable reads, which we don't want to pass to the checker because they just reduce performance without impacting linearization; and serializable operations, which are requests about historical state where we check that the state is available in the history window but don't validate the response contents. It would be nice to show each type in a different color to easily distinguish them.

For failed reads it would also be nice to be able to hide them. When testing the PR I found that there are long periods in which all requests fail. This is because in the etcd robustness tests we inject a failpoint into etcd that can take the process down. During this time the API is not responsive, and we only send read requests (again for performance, to avoid linearizing too many concurrent failing writes) until the API recovers. It would be nice to just have an option to hide failed requests.

One additional feature on my mind is showing non-operational information. For etcd robustness we run the test in 3 phases:
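A minimal sketch of what such grouping could look like on the caller's side, assuming a hypothetical `Group` type with a color and a hide flag. Nothing below is porcupine's API; the category names and color values are made up for illustration.

```go
package main

import "fmt"

// Category is a hypothetical label for the kinds of operations
// described above.
type Category string

const (
	Linearizable Category = "linearizable"
	FailedRead   Category = "failed read"
	Serializable Category = "serializable"
)

// Operation is a hypothetical, simplified operation record.
type Operation struct {
	ClientId int
	Category Category
	Call     int64
	Return   int64
}

// Group bundles operations of one category with display settings.
type Group struct {
	Category   Category
	Color      string
	Hidden     bool
	Operations []Operation
}

// groupOperations buckets operations by category in a fixed order so
// that a legend built from the result is stable between runs.
func groupOperations(ops []Operation, colors map[Category]string, hidden map[Category]bool) []Group {
	byCategory := make(map[Category][]Operation)
	for _, op := range ops {
		byCategory[op.Category] = append(byCategory[op.Category], op)
	}
	var groups []Group
	for _, c := range []Category{Linearizable, FailedRead, Serializable} {
		if len(byCategory[c]) == 0 {
			continue
		}
		groups = append(groups, Group{Category: c, Color: colors[c], Hidden: hidden[c], Operations: byCategory[c]})
	}
	return groups
}

// visibleOperations returns the operations of all groups that are not hidden.
func visibleOperations(groups []Group) []Operation {
	var out []Operation
	for _, g := range groups {
		if !g.Hidden {
			out = append(out, g.Operations...)
		}
	}
	return out
}

func main() {
	ops := []Operation{
		{ClientId: 0, Category: Linearizable, Call: 0, Return: 5},
		{ClientId: 1, Category: FailedRead, Call: 2, Return: 8},
		{ClientId: 1, Category: FailedRead, Call: 9, Return: 14},
		{ClientId: 2, Category: Serializable, Call: 3, Return: 6},
	}
	colors := map[Category]string{Linearizable: "green", FailedRead: "gray", Serializable: "blue"}
	hidden := map[Category]bool{FailedRead: true}
	groups := groupOperations(ops, colors, hidden)
	for _, g := range groups {
		fmt.Printf("%s (%s): %d operations, hidden=%v\n", g.Category, g.Color, len(g.Operations), g.Hidden)
	}
	fmt.Println("visible operations:", len(visibleOperations(groups)))
}
```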

  1. setup to establish enough QPS
  2. inject failpoint during which we break etcd in some way
  3. recovery after failpoint injection

The most interesting part is usually the failpoint injection and the beginning of recovery. In a 3-node cluster, a failpoint injection doesn't necessarily take the whole cluster down, so it's interesting to see whether a node going down causes invalid behavior to occur. And when the node recovers, it's interesting to see whether it correctly rejoins the cluster. It would be nice to have those phases highlighted in the visualization.
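The phases described above can be modeled as named time ranges on the same clock as the operations. This is only a sketch: the `Phase` type, the `phaseAt` helper, and the timestamps are hypothetical and not part of porcupine or the etcd tests.

```go
package main

import "fmt"

// Phase is a hypothetical named time range covering [Start, End).
type Phase struct {
	Name  string
	Start int64
	End   int64
}

// phaseAt returns the name of the phase that contains time t, if any.
func phaseAt(phases []Phase, t int64) (string, bool) {
	for _, p := range phases {
		if t >= p.Start && t < p.End {
			return p.Name, true
		}
	}
	return "", false
}

func main() {
	// Made-up timestamps, in the same unit as operation call times.
	phases := []Phase{
		{Name: "setup", Start: 0, End: 100},
		{Name: "failpoint injection", Start: 100, End: 200},
		{Name: "recovery", Start: 200, End: 300},
	}
	for _, call := range []int64{50, 150, 250, 350} {
		if name, ok := phaseAt(phases, call); ok {
			fmt.Printf("operation called at %d falls in phase %q\n", call, name)
		} else {
			fmt.Printf("operation called at %d is outside all phases\n", call)
		}
	}
}
```

Keeping phases as plain time ranges means they can be drawn as background bands without touching the operations themselves.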

anishathalye commented 6 months ago

I think https://github.com/anishathalye/porcupine/pull/18 addresses all of those needs, except the ability to hide annotations in the UI. Would appreciate any feedback you have on that PR!

serathius commented 6 months ago

Closing in favor of https://github.com/anishathalye/porcupine/pull/18