Closed aphyr closed 5 years ago
The reads no longer time out, but the test never finishes. It keeps logging:
2019-02-01 23:22:16,808{GMT} INFO [jepsen worker 1] jepsen.dgraph.set: Forcing conflict by deleting ...
Looks like these conflicts don't happen, and the workers just keep looping over them forever. This might be logic based on old Dgraph behavior that no longer occurs.
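For illustration only, here is a minimal Python sketch of how a "force a conflict" loop could be bounded so that a worker gives up instead of retrying forever. This is not the Jepsen test's actual code (which is Clojure); `force_conflict`, `attempt`, `max_attempts`, `deadline_s`, and `ConflictNeverObserved` are all hypothetical names, and the limits are arbitrary.

```python
import time


class ConflictNeverObserved(Exception):
    """Raised when the expected conflict does not occur within the budget."""


def force_conflict(attempt, max_attempts=10, deadline_s=30.0, clock=time.monotonic):
    """Call attempt() until it reports a conflict, or give up.

    `attempt` is a hypothetical callable that returns True once the
    conflict has been observed. Returns the number of attempts made.
    """
    start = clock()
    tries = 0
    # Stop on whichever limit is hit first: attempt count or elapsed time.
    while tries < max_attempts and clock() - start < deadline_s:
        tries += 1
        if attempt():
            return tries
    raise ConflictNeverObserved(f"no conflict after {tries} attempts")
```

With a bound like this, a conflict that never materializes would surface as an explicit failure in the test rather than as an endless stream of log lines.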
Filed an issue: https://github.com/jepsen-io/jepsen/issues/307
Haven't seen this happen for a while. Closing.
On the build @manishrjain submitted for testing on Friday, May 18, 2018, single-record set tests can stall indefinitely during reads. Although the cluster is stable, all nodes are running, the network is fully connected, and all test-initiated predicate migrations appear to have completed, every read request times out. This condition can last for at least an hour.
For instance, see 20180522T125649.000-0500.zip, where in the middle of the final reads, transactions just... start timing out.
Although each process goes on to retry the read process, no subsequent query ever returns:
About ten seconds after operations start timing out, Zero on n1 logs:
And n4 logs corresponding predicate moves:
No other node logs anything after 11:04. Is it possible that an automatic predicate migration started, then got stuck somehow? The 20-minute intervals between migrations suggest that it's retrying the migration, at least, but timing out every request looks... odd.
You can reproduce this with Jepsen b0b458d32e43c072f257b75ea786431ea0d0c7a5 by running: