Idea originally from @nichmoe, discussed together with @frlan
Nowadays, there are
- GitLab CI pipelines running batou deployments on e.g. a version update
- (multiple) AppOps team members working on the same projects
Thus, there's a certain risk that two batou deployments are running simultaneously and conflict with each other.
To prevent that, some kind of locking mechanism would be nice.
The basic feature request is: write a PID file into the home of the service user. When I start a deployment and there's a PID file with a PID that exists, batou should abort the deployment.
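To make the request concrete, here is a minimal sketch of such a PID-file check. It is not batou code: `acquire_lock`, `release_lock`, `pid_is_running`, `DeploymentLocked` and the lock file location are hypothetical names picked for illustration, and it assumes a POSIX system where signal 0 can be used to probe a PID.

```python
import os
from pathlib import Path


class DeploymentLocked(Exception):
    """Raised when another deployment appears to hold the lock (hypothetical)."""


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks the PID
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_lock(lock_path: Path) -> None:
    """Write our PID into lock_path, or abort if a live PID is recorded."""
    for _ in range(2):  # a second pass happens only after removing a stale lock
        try:
            # O_EXCL makes the creation fail if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                pid = int(lock_path.read_text().strip())
            except (ValueError, FileNotFoundError):
                pid = -1  # unreadable content is treated like a dead PID
            if pid_is_running(pid):
                raise DeploymentLocked(f"deployment already running (PID {pid})")
            lock_path.unlink(missing_ok=True)  # stale lock: remove and retry
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return
    raise DeploymentLocked("could not acquire lock")


def release_lock(lock_path: Path) -> None:
    """Remove the PID file after the deployment has finished."""
    lock_path.unlink(missing_ok=True)
```

A deployment would call `acquire_lock(Path.home() / ".batou.pid")` (file name assumed) before doing any work and `release_lock` in a `finally` block. The sketch has a small race window between creating the file and writing the PID, and PIDs can be reused by unrelated processes, so a real implementation would probably need something sturdier.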
A few more notes:
- There are some multi-deployment RGs where we have >1 service user and >1 batou deployment. These are OK to run simultaneously because
  - system-wide changes on NixOS are already locked by fc-manage
  - everything else should be separated, i.e. two different deployments shouldn't touch the same file in the first place[1]
- If batou fails in a way that it cannot clean up the PID file afterwards (e.g. connection loss), the next run can check whether the recorded PID actually still exists and treat the lock as stale if it doesn't.
- A command to override the lock (see borg break-lock for instance) may still be useful for exceptions.
- If such a lock exists, it may be interesting to also show who has started that deployment (i.e. "SSH login as service user?" -> probably the CI, SSH login as $human -> $human is currently deploying).
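The last two notes could look roughly like the sketch below. Everything in it is an assumption for illustration: the JSON layout, the function names `lock_metadata`, `describe_lock` and `break_lock`, and in particular the idea of using the `SUDO_USER` environment variable (set by sudo) to tell a human who switched to the service user apart from a direct login as the service user. How logins actually reach the service user may differ.

```python
import json
import os
import time
from pathlib import Path
from typing import Mapping


def lock_metadata(env: Mapping[str, str] = os.environ) -> dict:
    """Collect what to store in the lock file (field names are illustrative)."""
    return {
        "pid": os.getpid(),
        "user": env.get("USER", "unknown"),
        # Assumption: sudo sets SUDO_USER when a human switches to the
        # service user; it is absent for a direct login (e.g. from CI).
        "sudo_user": env.get("SUDO_USER"),
        "started": time.time(),
    }


def describe_lock(lock_path: Path) -> str:
    """Build a human-readable message about who holds the lock."""
    meta = json.loads(lock_path.read_text())
    who = meta.get("sudo_user") or f"{meta['user']} (direct login, probably CI)"
    return f"deployment running as PID {meta['pid']}, started by {who}"


def break_lock(lock_path: Path) -> bool:
    """Remove the lock unconditionally; return True if a lock was removed."""
    try:
        lock_path.unlink()
        return True
    except FileNotFoundError:
        return False
```

`describe_lock` would feed the error message shown when a deployment is refused, while `break_lock` would sit behind an explicit override command, similar in spirit to borg break-lock.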
[1] This still happens in at least one case (wpshared/varnish), but we consider that a bug (and there's a ticket to fix it by switching to Varnish multi-host).