adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

2024/05/09: Jenkins regular patching cycle - update to 2.440.3 #3548

sxa closed this issue 1 month ago

sxa commented 2 months ago

Process to follow: https://github.com/adoptium/infrastructure/blob/master/README.md#jenkins

Current version: 2.440.1
New version: 2.440.3
Jenkins event calendar

Previous patching issue: https://github.com/adoptium/infrastructure/issues/3376

Notes:

sxa commented 2 months ago

Noting that in the restart I did for https://github.com/adoptium/infrastructure/issues/3552 there were still quite a lot of exceptions in the log during startup, even though it has mostly been running clean during normal operations thanks to the work done under https://github.com/adoptium/infrastructure/issues/3514. We may wish to look at some of these more closely during the maintenance outage to see if any of them can be cleared up.
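One way to start on that is to count which exception classes show up between each "Beginning extraction from war file" line and the following "Jenkins is fully up and running" line. This is a hypothetical sketch (startup_exceptions is not an existing tool), assuming stack traces are written on their own lines starting with the fully qualified exception class, optionally prefixed by "Caused by:", which is how Java normally prints them.

```shell
# Hypothetical helper: count exception classes logged during each Jenkins
# startup window, most frequent first.
# Assumes stack trace header lines look like "java.io.IOException: ..." or
# "Caused by: java.lang.IllegalStateException: ...".
startup_exceptions() {
  awk '
    /Beginning.extraction/ { in_startup = 1; next }   # startup window opens
    /fully.up/             { in_startup = 0; next }   # startup window closes
    !in_startup            { next }                   # ignore normal operation
    {
      cls = $1
      if ($1 == "Caused" && $2 == "by:") cls = $3
      sub(/:$/, "", cls)                              # drop trailing colon
      # Only count tokens shaped like a qualified Java exception/error class
      if (cls ~ /^[a-z][A-Za-z0-9_]*(\.[A-Za-z0-9_]*)+(Exception|Error)$/)
        count[cls]++
    }
    END { for (c in count) printf "%d %s\n", count[c], c }
  ' "$1" | sort -rn
}
```

Running startup_exceptions jenkins.log would list the most frequent classes first, which may help decide which ones are worth chasing during the outage.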

As a point of reference, the restart cycle seemed to take around five minutes, although the log suggests the startup itself was quicker: about 2m49s and 2m38s from "Beginning extraction from war file" to "Jenkins is fully up and running" for the two starts below. The first was done by running service jenkins start on the command line; the second was a restart initiated from within Jenkins itself. Note that when stopping the service, it took (subjectively) about 2-3 minutes for the java process to exit even though the service reported it as stopped, so the clean shutdown adds some extra time.

# egrep 'Beginning.extraction|fully.up' jenkins.log
2024-05-05 10:25:45.355+0000 [id=1] INFO    winstone.Logger#logInternal: Beginning extraction from war file
2024-05-05 10:28:34.810+0000 [id=26]    INFO    hudson.lifecycle.Lifecycle#onReady: Jenkins is fully up and running
2024-05-05 10:50:10.238+0000 [id=1] INFO    winstone.Logger#logInternal: Beginning extraction from war file
2024-05-05 10:52:48.440+0000 [id=26]    INFO    hudson.lifecycle.Lifecycle#onReady: Jenkins is fully up and running
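To put numbers on startup times without reading timestamps by eye, a small helper could pair each "Beginning extraction" line with the next "fully up" line and print the elapsed seconds. This is a hypothetical sketch (the startup_durations name is made up), assuming the timestamp format in the excerpt above and that each startup finishes on the same day, since it compares only the time-of-day field.

```shell
# Hypothetical helper: report how long each Jenkins startup took, from
# "Beginning extraction from war file" to "Jenkins is fully up and running".
# Assumes lines start with "YYYY-MM-DD HH:MM:SS.mmm+0000" and that a startup
# never crosses midnight (only the time of day is compared).
startup_durations() {
  grep -E 'Beginning.extraction|fully.up' "$1" |
  awk '{
    hms = substr($2, 1, 8)          # "10:25:45.355+0000" -> "10:25:45"
    split(hms, t, ":")
    secs = t[1] * 3600 + t[2] * 60 + t[3]
    if ($0 ~ /Beginning.extraction/) {
      start = secs                  # remember when this startup began
      label = $1 " " hms
      have_start = 1
    } else if (have_start) {
      printf "%s took %ds\n", label, secs - start
      have_start = 0
    }
  }'
}
```

Against the four lines above, startup_durations jenkins.log reports 169s for the start at 10:25:45 and 158s for the one at 10:50:10.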
sxa commented 2 months ago

Note to self: I was recommended the --sessionEviction parameter alongside --sessionTimeout, which may reduce the number of times you have to refresh during the day. sessionEviction is an "idle timeout", so it kicks in when you're not actively using Jenkins. I believe it defaults to 30 minutes (less than most meetings ;-) ), so we should look at increasing it to 12 hours with --sessionEviction=43200 (12*60*60), or maybe a little less: 14400 would be four hours.

Note that we currently have --sessionTimeout=720 in /etc/default/jenkins, which sets the login validity to 12 hours, but that is separate from the idle timeout, which can log you out earlier. Also, confusingly, the sessionEviction parameter is in seconds, not minutes, so the value for 12 hours is 43200 rather than 720.
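Because the two options use different units, it may be less error-prone to derive both from a single number of hours. A minimal sketch; session_args is a hypothetical helper, and exactly where the resulting flags go in /etc/default/jenkins on our hosts is left as an assumption.

```shell
# Hypothetical helper: build both session options from one value in hours.
# --sessionTimeout is in minutes, --sessionEviction is in seconds.
session_args() {
  hours="$1"
  timeout_min=$((hours * 60))
  eviction_sec=$((hours * 60 * 60))
  printf '%s\n' "--sessionTimeout=${timeout_min} --sessionEviction=${eviction_sec}"
}
```

session_args 12 prints --sessionTimeout=720 --sessionEviction=43200, and session_args 4 gives --sessionEviction=14400 for the four-hour option.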

Ref: https://issues.jenkins.io/browse/JENKINS-51734

sxa commented 1 month ago

Plugins with breaking changes:

Other notes:

Messages of note in the logs:

Second restart, to update the Jenkins version, was at 0843 UTC.
Third restart to pick up the --sessionEviction parameter.
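To confirm that a restart actually picked up the new parameter, one option is to look at the running java process's command line. A hypothetical sketch: session_flags only filters a command-line string, and how that string is obtained (for example ps -o args= -p with the Jenkins java PID) is an assumption about the host setup.

```shell
# Hypothetical helper: list the --sessionTimeout/--sessionEviction options
# present in a process command line passed in as a single string.
session_flags() {
  printf '%s\n' "$1" | tr ' ' '\n' | grep -E '^--session(Timeout|Eviction)='
}
```

For example, session_flags "$(ps -o args= -p "$JENKINS_PID")", where JENKINS_PID is a placeholder for the Jenkins process ID; empty output would mean the flag did not make it onto the command line.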

Noting that after the second restart (a warm restart by Jenkins itself) we had these messages in the log:

2024-05-09 08:46:19.007+0000 [id=67] SEVERE h.i.i.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler#uncaughtException: A thread (SyncQueueListener/67) died unexpectedly due to an uncaught exception. This may leave your server corrupted and usually indicates a software bug.
java.lang.IllegalStateException: Jenkins.instance is missing. Read the documentation of Jenkins.getInstanceOrNull to see what you are doing wrong.

We have NOT updated the following plugins due to them still being listed as incompatible:

The SSH credentials incompatibility warning may have been introduced purely to make people confirm that they have read the notes about PuTTY keys, which should not affect us, so we may be able to push that update through without problems. We can look at that in the next cycle.

The following jobs were stopped because they were preventing Jenkins from doing its own restarts (likely because tagged pipelines were in progress). I will re-initiate them:

sxa commented 1 month ago

I'm going to close this off on the basis we'll do the next set of updates next month. I will note that there were a lot of exceptions on startup which it would be good to investigate and remediate if possible at some point.