Open nurturenature opened 2 years ago
P.S. A good way to get a representative feel for what happens during inter-DC partitioning:
# run test multiple times regardless of valid? true/false
lein run test-all --topology dcs --workload g-set --nemesis partition --test-count 10
Most will be invalid. Take a quick look at the test summary pages: latency-raw.png to see partition timing/duration and any failed transactions (red/orange), results.edn for total :ok adds missing from final reads, and jepsen.log for the general feel.
Test failures do seem to group into several patterns:
Partitioning a cluster of data centers running AntidoteDB can cause g-set adds acknowledged as :ok to not be fully replicated or, in some cases, to appear on other nodes only to be absent from the final read.
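To make the anomaly concrete, here is a minimal Python sketch of the kind of check involved: collect every add acknowledged as :ok, then report which of those elements are absent from each node's final read. The op shape (dicts with "type", "f", "value", "node", "final") is a simplified, hypothetical stand-in, not Jepsen's actual history format or the test's real checker, and the history values are purely illustrative.

```python
def missing_from_final_reads(history):
    """Map each node to the :ok-acknowledged adds absent from its final read.

    `history` is a list of dicts in a hypothetical, simplified shape
    (an assumption for illustration, not Jepsen's real history format).
    """
    # Only adds the database acknowledged as ok are expected to survive.
    acked = {op["value"] for op in history
             if op["f"] == "add" and op["type"] == "ok"}

    missing = {}
    for op in history:
        is_final_read = (op["f"] == "read" and op["type"] == "ok"
                         and op.get("final", False))
        if is_final_read:
            absent = acked - set(op["value"])
            if absent:
                missing[op["node"]] = sorted(absent)
    return missing


# Illustrative history: 136 was acknowledged but never reached n3.
history = [
    {"type": "ok", "f": "add", "value": 135, "node": "n1"},
    {"type": "ok", "f": "add", "value": 136, "node": "n1"},
    {"type": "fail", "f": "add", "value": 137, "node": "n2"},
    {"type": "ok", "f": "read", "value": [135, 136], "node": "n1", "final": True},
    {"type": "ok", "f": "read", "value": [135], "node": "n3", "final": True},
]

print(missing_from_final_reads(history))  # {'n3': [136]}
```

A failed add (137 above) is deliberately ignored: only elements the database confirmed as written count as lost when a final read omits them.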
Details of the Jepsen test: https://github.com/nurturenature/fuzz_dist/blob/main/doc/antidotedb.md
Jepsen environment configured for AntidoteDB: https://github.com/nurturenature/jepsen-docker-workaround
Test commands:
The best way to initially interact with the test results is through the web server as described in jepsen-docker-workaround.
Here's a sample workflow tracing an anomaly:
results.edn
history.txt, scroll to the bottom, and see that 136 is only present on the original node
Now let's look at an AntidoteDB log file for a node:
timeline.html can also be used:
But missing from final read by worker 3:
Please ask if there are any questions or desired changes to the test, environment, etc.