Closed cole-miller closed 5 months ago
Attention: Patch coverage is 82.69231% with 9 lines in your changes missing coverage. Please review.
Project coverage is 77.42%. Comparing base (6633bd8) to head (294464b). Report is 29 commits behind head on master.
Files | Patch % | Lines |
---|---|---|
src/server.c | 73.52% | 7 Missing and 2 partials :warning: |
cc @tomponline
Thanks for this @cole-miller
Once this is merged, it'll be automatically integrated on the next build into LXD's latest/edge channel, and then tested through our CI as well as our daily snap CI (https://github.com/canonical/lxd-ci/actions/workflows/tests.yml?query=event%3Aschedule).
So we'll see if there are any breakages then.
Also, thanks for checking whether LXD checks for existing processes. I suppose it's possible for LXD's unix socket to have been closed before the DB is closed, which would allow a new LXD process to start and open the DB concurrently. I'll double check this on our side.
@cole-miller it seems that LXD does correctly shut down its local unix listener only after closing the DB:
https://github.com/canonical/lxd/blob/main/lxd/daemon.go#L1899-L1917
Thanks @tomponline, in that case it's unlikely that concurrent use of the directory is the issue for LXD. I should have a PR up soon that adds the additional instrumentation we've discussed, so if/when the problem recurs we can understand it better. In the meantime I'm inclined to land this change (pending review) since it eliminates a footgun for users of dqlite that don't have their own exclusiveness checking.
It was suggested that some of the corruption issues reported by LXD users might be due to two LXD daemon processes running concurrently, causing two dqlite instances to concurrently modify the common data directory. LXD already has some mutual exclusion logic to prevent two daemons from running at the same time:
https://github.com/canonical/lxd/blob/1514a400f11a82b90a5294b5c1f31cd2c6dd9311/lxd/endpoints/socket.go#L39
But even so it seems worth it to do some file locking to prevent this definitively on the dqlite side.
Note that this obviously won't work unless both the contending processes are running versions of dqlite that include the locking.
Signed-off-by: Cole Miller <cole.miller@canonical.com>
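
For readers unfamiliar with the technique, the kind of directory locking described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the lock file name `example-lock` and the functions `dir_lock_acquire` and `dir_lock_release` are hypothetical, and the choice of `flock(2)` over `fcntl(2)` record locks is an assumption made for brevity.

```c
/* Feature-test macro so that flock() and O_CLOEXEC are declared on glibc. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical lock file name, chosen for this sketch only. */
#define LOCK_FILE_NAME "example-lock"

/* Try to take an exclusive advisory lock on a lock file inside `dir`.
 *
 * Returns a file descriptor that holds the lock on success. Returns -1 with
 * errno set on failure; errno is EWOULDBLOCK when another open file
 * description already holds the lock. */
int dir_lock_acquire(const char *dir)
{
	char path[PATH_MAX];
	int fd;
	int n;

	assert(dir != NULL);

	n = snprintf(path, sizeof path, "%s/%s", dir, LOCK_FILE_NAME);
	if (n < 0 || (size_t)n >= sizeof path) {
		errno = ENAMETOOLONG;
		return -1;
	}

	/* Create the lock file if it does not exist yet. */
	fd = open(path, O_RDWR | O_CREAT | O_CLOEXEC, 0600);
	if (fd < 0) {
		return -1;
	}

	/* LOCK_NB makes the call fail immediately instead of blocking. */
	if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
		int saved = errno;
		close(fd);
		errno = saved;
		return -1;
	}

	return fd;
}

/* Release the lock. Closing the descriptor drops the flock, so the lock is
 * also released automatically if the holding process exits or crashes. */
void dir_lock_release(int fd)
{
	if (fd >= 0) {
		close(fd);
	}
}
```

A caller would acquire the lock before touching anything else in the data directory and hold the descriptor until shutdown. Because the lock is advisory, it only excludes other processes that also try to take it, which is why both contending processes need a version that includes the locking.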