canonical / dqlite

Embeddable, replicated and fault-tolerant SQL engine.
https://dqlite.io
Other
3.83k stars 216 forks source link

Implement server-side role management #480

Closed cole-miller closed 1 year ago

cole-miller commented 1 year ago

This is still rough and most error handling is stubbed out, but I think it represents a workable approach to moving role management onto the server.

Signed-off-by: Cole Miller cole.miller@canonical.com

codecov[bot] commented 1 year ago

Codecov Report

Merging #480 (ef1599a) into master (19a7924) will increase coverage by 1.96%. The diff coverage is 67.14%.

@@            Coverage Diff             @@
##           master     #480      +/-   ##
==========================================
+ Coverage   59.01%   60.97%   +1.96%     
==========================================
  Files          33       34       +1     
  Lines        5929     6340     +411     
  Branches     1764     1884     +120     
==========================================
+ Hits         3499     3866     +367     
+ Misses       1363     1293      -70     
- Partials     1067     1181     +114     
Impacted Files Coverage Δ
src/transport.c 51.29% <50.00%> (+5.19%) :arrow_up:
src/roles.c 63.58% <63.58%> (ø)
src/server.c 55.55% <84.84%> (+3.84%) :arrow_up:
src/client/protocol.c 42.66% <100.00%> (+9.16%) :arrow_up:
src/config.c 92.00% <100.00%> (+0.69%) :arrow_up:

... and 2 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

cole-miller commented 1 year ago

I'm not entirely satisfied with this branch yet, but I think it's ready for some other eyes on it. Jepsen seems mostly happy, including the roles checker -- there is an "extra online spare" that I'm looking into, but it doesn't occur very often (about once per CI run) and I've also seen it sporadically with the go-dqlite client-side role management.

I've written some unit tests for the role assignment algorithm, but still want to add more of those, plus a couple of integration tests once I figure out how to set that up.

cole-miller commented 1 year ago

I'm wondering whether the assertion failure inside uv_close (resulting from an uninitialized handle) is a problem that only surfaces with specific versions of libuv. When I run the new integration test locally against libuv v1.44.3, it's not triggered. nope, it's my fault, failing to call uv_timer_init in all cases

freeekanayaka commented 1 year ago

I'm wondering whether the assertion failure inside uv_close (resulting from an uninitialized handle) is a problem that only surfaces with specific versions of libuv. When I run the new integration test locally against libuv v1.44.3, it's not triggered.

It's our code that is calling uv_close(), and that assertion seems to imply that the handle that we're passing to uv_close() is somehow screwed up (either uninitialized, as you point, or maybe with corrupted memory that was overwritten somewhere), so even if for some reason the problem doesn't surface in v1.44.3 it looks like a bug on our side.

cole-miller commented 1 year ago

Broken up into commits for easier review :)

cole-miller commented 1 year ago

I think there is still a pesky leak of allocations that should be cleaned up by RoleManagementDrain, but when I went to look at the GHA failure for the latest commit, it had turned into a success, hmm.

freeekanayaka commented 1 year ago

Broken up into commits for easier review :)

Much easier to review indeed!

cole-miller commented 1 year ago

Review comments addressed. I've left the commit removing #ifdef __APPLE__ in because: