sxa closed this issue 1 month ago
Noting that in the restart I did for https://github.com/adoptium/infrastructure/issues/3552 there were still quite a lot of exceptions in the log during startup despite it mostly running clean during normal operations with the work that's been done under https://github.com/adoptium/infrastructure/issues/3514. We may wish to look at some of these a little more closely during the maintenance outage to see if any of them can be cleared up.
As a point of reference, the restart cycle seemed to take around five minutes, although the log suggests it was a bit quicker. The first restart was done with `service jenkins start` on the command line; the second was a restart initiated from within jenkins itself. When stopping the service it took (subjectively) about 2-3 minutes for the java process to shut down even though the service indicated it had been stopped, so the clean shutdown takes a bit of extra time.
# egrep 'Beginning.extraction|fully.up' jenkins.log
2024-05-05 10:25:45.355+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file
2024-05-05 10:28:34.810+0000 [id=26] INFO hudson.lifecycle.Lifecycle#onReady: Jenkins is fully up and running
2024-05-05 10:50:10.238+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file
2024-05-05 10:52:48.440+0000 [id=26] INFO hudson.lifecycle.Lifecycle#onReady: Jenkins is fully up and running
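To put a number on "a bit quicker", the startup time can be worked out from the two marker lines above. This is a minimal sketch: the function names are made up for illustration, and it assumes every relevant line starts with a timestamp in the format shown in the log excerpt.

```python
from datetime import datetime

START_MARK = "Beginning extraction from war file"
READY_MARK = "Jenkins is fully up and running"

# The four lines returned by the egrep above, copied verbatim.
SAMPLE_LOG = """\
2024-05-05 10:25:45.355+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file
2024-05-05 10:28:34.810+0000 [id=26] INFO hudson.lifecycle.Lifecycle#onReady: Jenkins is fully up and running
2024-05-05 10:50:10.238+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file
2024-05-05 10:52:48.440+0000 [id=26] INFO hudson.lifecycle.Lifecycle#onReady: Jenkins is fully up and running
"""


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm+ZZZZ' timestamp of a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f%z")


def startup_durations(lines):
    """Return seconds elapsed between each start marker and the next ready marker."""
    durations = []
    started = None
    for line in lines:
        if START_MARK in line:
            started = parse_timestamp(line)
        elif READY_MARK in line and started is not None:
            durations.append((parse_timestamp(line) - started).total_seconds())
            started = None
    return durations


if __name__ == "__main__":
    for seconds in startup_durations(SAMPLE_LOG.splitlines()):
        print(f"startup took {seconds:.1f}s")
```

For the two restarts above this gives roughly 169 and 158 seconds, so both startups completed in just under three minutes once the war extraction had begun. The shutdown time is not captured by these markers.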
Note to self - I was recommended the `--sessionEviction` parameter alongside `sessionTimeout`, which may reduce the number of times you have to refresh during the day. sessionEviction is an "idle timeout", so it will affect you if you're not actively using jenkins. I believe it defaults to 30 minutes (less than most meetings ;-) ) so we should look at an increase to 12 hours with `--sessionEviction=43200` (`12*60*60`), or maybe a little less: 14400 would be four hours.
Note that we currently have `--sessionTimeout=720` in `/etc/default/jenkins`, which sets the login validity to 12 hours, but that is separate from the idle timeout, which can kick you out earlier. Also, confusingly, the `sessionEviction` parameter is in seconds, not minutes, so the value is not the same.
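Because the two parameters use different units, it is easy to get one of them wrong by a factor of 60. The sketch below works both values out from a duration in hours, using the units described in the notes above (minutes for `sessionTimeout`, seconds for `sessionEviction`). The helper names are hypothetical and are not part of Jenkins.

```python
def session_timeout_minutes(hours):
    """sessionTimeout is given in minutes (per the notes above)."""
    return int(hours * 60)


def session_eviction_seconds(hours):
    """sessionEviction is given in seconds (per the notes above)."""
    return int(hours * 60 * 60)


def jenkins_session_args(login_hours, idle_hours):
    """Build the two command-line arguments for the chosen durations."""
    return (
        f"--sessionTimeout={session_timeout_minutes(login_hours)} "
        f"--sessionEviction={session_eviction_seconds(idle_hours)}"
    )


if __name__ == "__main__":
    # 12 hours for both, matching the values discussed above
    print(jenkins_session_args(12, 12))
    # 12 hour login validity with a 4 hour idle timeout
    print(jenkins_session_args(12, 4))
```

With 12 hours for both this yields `--sessionTimeout=720 --sessionEviction=43200`, and a four hour idle timeout yields `--sessionEviction=14400`, matching the figures in the notes.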
Plugins with breaking changes:
Two new scopes are required when uploading files:
and no info on the update from 25 minutes ago
Other notes:
Remove usage of obsolete trilead-putty key ([#200](https://github.com/jenkinsci/ssh-credentials-plugin/issues/200)) [@olamy](https://github.com/olamy)
The config filename changed (from thinBackup.xml to org.jvnet.hudson.plugins.thinbackup.ThinBackupPluginImpl.xml) and will be automatically converted after the first start with the new version. The new name complies with the newer naming convention, which was automatically adjusted because of the internal renovation. More details can be found: [#125](https://github.com/jenkinsci/thin-backup-plugin/pull/125)
Fail the entire build when a requested credential cannot be found ([#144](https://github.com/jenkinsci/ssh-agent-plugin/issues/144)) [@nattofriends](https://github.com/nattofriends)
Messages of note in the logs:
2024-05-09 08:35:31.342+0000 [id=54] INFO jenkins.model.RunIdMigrator#migrate: Migrating build records in /home/jenkins/.jenkins/jobs/Test_openjdk17_j9_sanity.functional_x86_64_windows_xl/builds
2024-05-09 08:35:37.965+0000 [id=56] WARNING o.j.p.m.AuthorizationContainer#add: Processing a permission assignment in the legacy format (without explicit TYPE prefix): hudson.model.Run.Update:AdoptOpenJDK*build-triage
java.io.FileNotFoundException: /home/jenkins/.jenkins/jobs/build-scripts/jobs/evaluation-openjdk21-pipeline/builds/125/program.dat (No such file or directory)
java.io.FileNotFoundException: /home/jenkins/.jenkins/jobs/build-scripts/jobs/openjdk21-pipeline/builds/276/program.dat (No such file or directory)
2024-05-09 08:38:36.100+0000 [id=62] WARNING o.j.p.w.cps.CpsStepContext$2#onFailure: Failed to proceed after CpsStepContext[177:build]:Owner[build-scripts/openjdk11-pipeline/2698:build-scripts/openjdk11-pipeline #2698]
java.io.FileNotFoundException: /home/jenkins/.jenkins/jobs/build-scripts/jobs/openjdk11-pipeline/builds/2698/program.dat (No such file or directory)
2024-05-09 08:41:55.894+0000 [id=104] WARNING o.j.p.w.cps.CpsStepContext$2#onFailure: Failed to proceed after CpsStepContext[302:build]:Owner[build-scripts/jobs/jdk/jdk-windows-x64-temurin/345:build-scripts/jobs/jdk/jdk-windows-x64-temurin #345]
Second restart to update the jenkins version was at 08:43 UTC
Third restart to pick up the `--sessionEviction` parameter
Noting that after the second restart (warm restart by jenkins itself) we had these messages in the log:
2024-05-09 08:46:19.007+0000 [id=67] SEVERE h.i.i.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler#uncaughtException: A thread (SyncQueueListener/67) died unexpectedly due to an uncaught exception. This may leave your server corrupted and usually indicates a software bug.
java.lang.IllegalStateException: Jenkins.instance is missing. Read the documentation of Jenkins.getInstanceOrNull to see what you are doing wrong.
We have NOT updated the following plugins due to them still being listed as incompatible:
The SSH credentials thing may have been introduced purely to force people to confirm that they have looked at the notes about PuTTY keys, which should not affect us, so we may be able to push that update through without problems. We can look at that in the next cycle.
The following jobs were stopped as they were preventing jenkins from doing its own restarts (likely because tagged pipelines were in progress) - I will re-initiate them:
I'm going to close this off on the basis we'll do the next set of updates next month. I will note that there were a lot of exceptions on startup which it would be good to investigate and remediate if possible at some point.
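As a starting point for that investigation, a small script could tally the WARNING/SEVERE entries and exception types in `jenkins.log` so the noisiest sources stand out. This is only a sketch: the regular expressions assume the line layout seen in the excerpts earlier in this issue, the function name is made up, and the sample lines are copied from those excerpts.

```python
import re
from collections import Counter

# "<date> <time> [id=N] LEVEL source#method: message"
ENTRY = re.compile(r"^\d{4}-\d{2}-\d{2} \S+ \[id=\d+\]\s+(WARNING|SEVERE)\s+(\S+?):\s")
# "fully.qualified.SomeException: message" at the start of a line
EXCEPTION = re.compile(r"^([\w.$]+(?:Exception|Error)):")

SAMPLE_LOG = """\
java.io.FileNotFoundException: /home/jenkins/.jenkins/jobs/build-scripts/jobs/evaluation-openjdk21-pipeline/builds/125/program.dat (No such file or directory)
java.io.FileNotFoundException: /home/jenkins/.jenkins/jobs/build-scripts/jobs/openjdk21-pipeline/builds/276/program.dat (No such file or directory)
2024-05-09 08:38:36.100+0000 [id=62] WARNING o.j.p.w.cps.CpsStepContext$2#onFailure: Failed to proceed after CpsStepContext[177:build]:Owner[build-scripts/openjdk11-pipeline/2698:build-scripts/openjdk11-pipeline #2698]
java.io.FileNotFoundException: /home/jenkins/.jenkins/jobs/build-scripts/jobs/openjdk11-pipeline/builds/2698/program.dat (No such file or directory)
2024-05-09 08:46:19.007+0000 [id=67] SEVERE h.i.i.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler#uncaughtException: A thread (SyncQueueListener/67) died unexpectedly due to an uncaught exception. This may leave your server corrupted and usually indicates a software bug.
java.lang.IllegalStateException: Jenkins.instance is missing. Read the documentation of Jenkins.getInstanceOrNull to see what you are doing wrong.
"""


def summarize(lines):
    """Count WARNING/SEVERE entries per (level, source) and exceptions per type."""
    by_source = Counter()
    by_exception = Counter()
    for line in lines:
        entry = ENTRY.match(line)
        if entry:
            by_source[(entry.group(1), entry.group(2))] += 1
            continue
        exc = EXCEPTION.match(line)
        if exc:
            by_exception[exc.group(1)] += 1
    return by_source, by_exception


if __name__ == "__main__":
    sources, exceptions = summarize(SAMPLE_LOG.splitlines())
    for (level, source), count in sources.most_common():
        print(f"{count:4d} {level:8s} {source}")
    for name, count in exceptions.most_common():
        print(f"{count:4d} {name}")
```

To run it against the real log, replace `SAMPLE_LOG.splitlines()` with the lines of `jenkins.log`. The counts only show where to look first; they do not say which messages are harmless.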
Process to follow: https://github.com/adoptium/infrastructure/blob/master/README.md#jenkins
Current version: 2.440.1
New version: 2.440.3
Jenkins event calendar
Previous patching issue: https://github.com/adoptium/infrastructure/issues/3376
Notes: