canonical / dqlite

Embeddable, replicated and fault-tolerant SQL engine.
https://dqlite.io

Jepsen: assertion failure in vfs.c #541

Closed cole-miller closed 10 months ago

cole-miller commented 10 months ago

This Jepsen job:

https://github.com/canonical/jepsen.dqlite/actions/runs/6950544166/job/18910920901

Tripped this assertion:

https://github.com/canonical/dqlite/blob/09108b890e26965558f9117aa554513a4956be3b/src/vfs.c#L2343

freeekanayaka commented 10 months ago

This is most likely because, while the process is paused, the kernel still accumulates incoming data in the TCP buffers of open connections. When the process resumes, the uv loop might consume that data immediately, without giving the monitor_cb function in server.c the chance to run as soon as leadership is lost (because monitor_cb is a uv_prepare_t handle callback, and prepare callbacks only run once per loop iteration, right before the loop polls for I/O).

Firing monitor_cb as soon as leadership is lost should fix it, similar to what was done in https://github.com/cowsql/cowsql/pull/12.
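
To make the ordering problem concrete, here is a toy model of a single loop iteration. It is only a sketch and is neither dqlite nor libuv code: `struct node`, `monitor`, `dispatch`, `run_iteration` and the `eager` flag are all hypothetical names invented for illustration. It assumes that the monitor callback is what brings the request-handling side's view of leadership in line with the actual raft state, and that the step-down and the buffered client requests are all delivered in the same poll phase.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical I/O events delivered during one poll phase. */
enum event { EV_STEP_DOWN, EV_CLIENT_REQUEST };

/* Hypothetical node state, not the real dqlite structures. */
struct node {
    bool raft_is_leader;    /* actual raft state */
    bool seen_as_leader;    /* view last refreshed by monitor() */
    int stale_requests;     /* requests handled with an outdated view */
};

/* Stand-in for monitor_cb: refresh the view from the raft state. */
static void monitor(struct node *n)
{
    n->seen_as_leader = n->raft_is_leader;
}

/* Handle one I/O event. With eager == true the monitor is fired
 * immediately when leadership is lost (the proposed fix). */
static void dispatch(struct node *n, enum event ev, bool eager)
{
    switch (ev) {
    case EV_STEP_DOWN:
        n->raft_is_leader = false;
        if (eager) {
            monitor(n);
        }
        break;
    case EV_CLIENT_REQUEST:
        if (n->seen_as_leader && !n->raft_is_leader) {
            n->stale_requests++;
        }
        break;
    }
}

/* One loop iteration: the prepare phase runs once, then the poll phase
 * drains every ready event before the next prepare phase can run.
 * Returns how many requests were handled with a stale leadership view. */
int run_iteration(const enum event *ready, size_t n_ready, bool eager)
{
    struct node n = {true, true, 0};
    size_t i;

    assert(ready != NULL || n_ready == 0);

    monitor(&n); /* prepare phase */
    for (i = 0; i < n_ready; i++) {
        dispatch(&n, ready[i], eager); /* poll phase */
    }
    return n.stale_requests;
}
```

Calling `run_iteration` with the events `EV_STEP_DOWN, EV_CLIENT_REQUEST, EV_CLIENT_REQUEST` and `eager` set to `false` counts both requests as stale, because the monitor does not run again until the next iteration. With `eager` set to `true` the count is zero, which is the effect the proposed fix is after.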