I do wonder about the fallout of this extra failure mode on the Jenkins executors: the risk of leaving a service enabled, and the impact on a subsequent CI job scheduled on the same executor ...
Yep, I'm right there with you. And I just checked: the service is still there, "loaded" and "inactive (dead)", having exited successfully... I don't know whether that corresponds to "enabled" or "disabled"....
Nevertheless, I'm expecting/hoping that the `--force` flag on the `enable` command will overcome any resistance from this, so that subsequent jobs are unaffected. (My principal concern was that `systemctl` would restart the Pbench Server and keep it running; I saw this in my test environment, much to my bemusement! I'm not sure whether I'm happy or unhappy to find that it didn't in the CI... perhaps this is the result of the lack of the "linger" option?)
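(For concreteness, a minimal sketch of the forced enablement being described; the unit name `pbench-server` and the user-level `--user` scope are assumptions, not necessarily what the `deploy` script actually does:)

```sh
# Hypothetical cleanup-resistant enablement.  "enable --force" overwrites any
# symlinks left behind by a previous run that failed to disable the service,
# and "--now" also starts it immediately.
systemctl --user enable --now --force pbench-server.service
```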
> is it really worthwhile and worth that "contamination" risk?
I have the same concern.
What we really want is an automatic service disable at logout of the CI job session (and even that would be a problem if we ever enable parallel jobs on an executor).
I'm hoping that I can add something to the end of the Jenkins pipeline file to effect that (I don't think we're ever going to enable more than one Jenkins job on a single executor...). 🤞
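(A sketch of the kind of pipeline-teardown step that might work here, assuming the hypothetical `pbench-server` unit name; in a declarative Jenkinsfile this would live in a `post { always { ... } }` section so that it runs even when the job fails:)

```sh
# Assumed teardown commands for the end of the Jenkins pipeline.  The "|| true"
# suffixes keep cleanup failures from masking the job's real exit status.
systemctl --user stop pbench-server.service || true
systemctl --user disable pbench-server.service || true
systemctl --user reset-failed pbench-server.service 2>/dev/null || true
```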
Yeah; without "linger" the service should be killed once the user no longer has any active login session. That's OK so long as we can't have overlapping login sessions that we don't really want to be continuous. Since we have the executors single-streamed, that's probably not a problem unless someone with executor access leaves an interactive session logged in across the end of a CI job! (But if that happens ...)
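(For reference, "linger" is managed with `loginctl`; a sketch, assuming the executor runs jobs as a `jenkins` user:)

```sh
# Lingering keeps a user's systemd instance (and its services) running even
# with no active login sessions; without it, user services are stopped when
# the last session ends.
loginctl enable-linger jenkins                 # keep services alive across logouts
loginctl show-user jenkins --property=Linger   # inspect the current setting
loginctl disable-linger jenkins                # restore stop-at-logout behavior
```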
And while `--runtime` might serve as a "last-ditch blade guard", it's pretty far downstream, since the executors don't restart often. We'll be breaking CI jobs with obsolete containers claiming ports long before we get there. To the point where I don't see how it can be considered helpful at all; it just seems to complicate the model.
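(For reference, `--runtime` records the enablement under `/run` rather than in the persistent configuration, so it evaporates on reboot; but only on reboot, which is exactly the "far downstream" problem. A minimal sketch, with the unit name assumed:)

```sh
# Transient enablement: the symlinks land under /run and disappear when the
# machine restarts, so leftovers persist until the executor actually reboots.
systemctl --user enable --runtime pbench-server.service
systemctl --user is-enabled pbench-server.service   # reports "enabled-runtime"
```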
There's an interesting distinction between "stop" and "disable" ... without "linger" the service is supposed to shut down when there are no active user sessions, but I think the service remains registered and will restart automatically on the next login. This is probably what you observed. What we want is probably more like a non-linger `start` without an `enable`, which I think would mean that even if it stays "known" it wouldn't restart automatically on login.
Ultimately, for real deployment we want to `enable` a service, but for testing and CI I'm not sure we want to do more than `start` and `stop`. Which brings into question whether "consolidating" on using `systemd` even makes sense. They're not going to be the same, because in testing the contamination of retaining a service registration past the session is "evil", whereas for deployment it's "essential"...
> that's probably not a problem unless someone with executor access leaves an interactive session logged in across the end of a CI job!
Ha! That's an angle that I hadn't considered.... 😨
> while `--runtime` might serve as a "last-ditch blade guard", it's pretty far downstream since the executors don't restart often.
That's a fair point.
> We'll be breaking CI jobs with obsolete containers claiming ports long before we get there.
I don't think that's a problem: given that we're running the Pbench Server via the `pbench-server` service, to the extent that there is a conflict, it should show up at the service enablement... we shouldn't get far enough to start a second instance of the server, so there should never be port conflicts. (And I'm hoping that the use of `--force` on the `enable` command will help resolve any conflicts favorably.)
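(One way to make such a conflict visible early, sketched with the assumed unit name and `--user` scope, would be a pre-deployment check like:)

```sh
# Hypothetical pre-deployment sanity check: surface any leftover service
# registration before trying to (re-)enable the unit.
if systemctl --user is-active --quiet pbench-server.service; then
    echo "WARNING: a pbench-server service is already running" >&2
fi
if systemctl --user is-enabled --quiet pbench-server.service 2>/dev/null; then
    echo "NOTE: leftover enablement will be overwritten by 'enable --force'" >&2
fi
```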
> I think the service remains registered and will restart automatically on the next login.
Aside from the fact that I'm unclear on whether Jenkins jobs count as "logins", I just demonstrated that logging in interactively after a job exits does not cause the service to restart automatically.
> This is probably what you observed.
No... what I observed on my test machine was the `systemctl disable` command failing, followed by the script issuing `podman` commands to (successfully) kill the container, followed by the container miraculously reappearing (presumably having been restarted by `systemctl` when it noticed the container stopping). That is, everything seemed to be working as intended, even if it was not what I mistakenly expected at the time (i.e., I didn't realize that the `systemctl` command had failed; but, armed with that knowledge, the result is not surprising).
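(That reappearance is consistent with a `Restart=` policy in the unit: `systemd` sees the container's process die and launches a new one. A sketch of a teardown ordering that avoids it, with the unit and container names assumed:)

```sh
# Stopping through systemctl first tells systemd the exit is deliberate, so
# no Restart= policy kicks in before the container is removed.
systemctl --user stop pbench-server.service
systemctl --user disable pbench-server.service
podman rm --force pbench-server 2>/dev/null || true   # belt and suspenders
```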
> What we want is probably more like a non-linger `start` without an `enable`
Although the man page seems to indicate that one can use `start` without a prior `enable`, I couldn't figure out how to actually do it. (If you've got a clue, I would love to hear it, as I would prefer to skip the enable in the func-test case....)
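(For what it's worth, `systemctl start` should work on any unit file the manager can find, enabled or not; `enable` only creates the symlinks that make the unit start automatically. A sketch of the enable-free flow, with the path, unit name, and `--user` scope all assumed:)

```sh
# Starting a user service without ever enabling it: drop the unit file where
# the user manager looks, reload, and start.  Nothing here creates the
# "wants" symlinks, so the service won't come back on the next login.
cp pbench-server.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user start pbench-server.service
# ... run the functional tests ...
systemctl --user stop pbench-server.service
```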
OK, so the tests passed this time, which included successfully superseding the `systemctl` configuration left over from the previous run, and the output looks eminently reasonable. So, I think the present approach will work.

I'll see if I can refactor `deploy` and pry the `redeploy` actions out of it. I'm on the fence but unenthusiastic about creating an `undeploy` script, though... @dbutenhof, do you want to nudge me either way?
I'm entirely on the fence; unless someone nudges me one way or the other I'll be unable to decide whether I have any real interest in nudging you. I thought having both as scripts would be consistent, but I'm not really sure how or why we'd use an undeploy separately. (I do however like your idea of pulling the dashboard out into a "redeploy" subsidiary, because that would be handy. We could even put it into a service file to automatically redeploy on a regular basis... although a Jenkins or hosted GitHub CD action would probably be a better choice.)
OK, this is now ready for review.
Since we use `systemctl` to run the containerized Pbench Server in the Staging deployment (and since we'll do the same in Production), we should use the same mechanism when running the Server for functional test or development.

This PR modifies the `server/pbenchinacan/deploy` script to generate a Systemd unit configuration file and enable the service to start the canned Pbench Server. And it modifies the cleanup code in `jenkins/run-server-func-tests` to use `systemctl` commands to shut it down again.

This PR is split into multiple commits: the first is just a lint pass over the two modules; the others contain the actual changes. (They'll be squashed together for the actual merge.)
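(Purely as an illustration of the shape of such a unit: everything below, including the unit name, image reference, and port, is an assumption rather than the script's actual output. `podman generate systemd --new` can produce a more robust file of this general form.)

```sh
# Illustrative only: write a unit of roughly the shape the deploy script
# might generate, then make the user manager aware of it.
cat > ~/.config/systemd/user/pbench-server.service <<'EOF'
[Unit]
Description=Containerized Pbench Server (canned, illustrative)

[Service]
ExecStart=/usr/bin/podman run --rm --name pbench-server --publish 8080:8080 quay.io/example/pbench-server
ExecStop=/usr/bin/podman stop pbench-server
Restart=on-failure

[Install]
WantedBy=default.target
EOF
systemctl --user daemon-reload
```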
PBENCH-1116