ci: run jepsen tests with --max-offset=experimental-clockless

cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.

https://www.cockroachlabs.com

ci: run jepsen tests with --max-offset=experimental-clockless #16935

Closed tbg closed 6 years ago

tbg commented 7 years ago

See #16867; we need to exercise that code.

bdarnell commented 6 years ago

I'm doing a manual run now. The tests so far have passed, but they're so slow that they're not actually doing much. The first image is for a regular jepsen run, and the second is for clockless mode:

[Image: rate]

[Image: rate-clockless]

Read throughput is reduced by about half, but write throughput drops to near zero. (There would be more info in the logs, but we don't preserve those from "successful" runs, so we'll need to repeat the run.)

bdarnell commented 6 years ago

All of the tests in the manual run passed, but many of them performed so few operations that I don't think the tests can be considered useful. We need to improve the performance of clockless mode before these runs tell us anything. (My guess is this means fixing some pathological retry behaviors, but I haven't looked closely at it.)

tbg commented 6 years ago

Likely similar issues will appear with any simple load generator (kv?).

bdarnell commented 6 years ago

Not necessarily - jepsen has high contention, unlike most of our other load generators. But we should definitely start by looking at kv before jepsen.

jordanlewis commented 6 years ago

Would kv with a small cycle-count be a similar workload?

bdarnell commented 6 years ago

Maybe - try it and see.

tbg commented 6 years ago

I've been trying to reproduce this with kv today and had no success, but I think it's because I didn't manage to provoke ReadWithinUncertaintyIntervalErrors in sufficient quantities (locally). Will give it another try tomorrow.

bdarnell commented 6 years ago

You could try the bank workload too - I think that is more likely to get into deadlock situations than kv with a short cycle-count (and this slowdown has been seen in the jepsen version of the bank workload).

tbg commented 6 years ago

We have decided not to pursue this for 2.0.

petermattis commented 6 years ago

We've decided not to pursue the experimental-clockless mode.