adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

Investigate / clear up exceptions in jenkins system log #3514

Open sxa opened 2 months ago

sxa commented 2 months ago

A quick search on the recent ones (today and yesterday) can be done with grep -h '^[^\ ]*Exception' /var/log/jenkins/jenkins.log /var/log/jenkins/jenkins.log.1 | sort | uniq -c
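For anyone who prefers to group by exception class rather than by whole line (the grep above counts identical lines, so messages that embed a job name each show up separately), here is a rough Python sketch of the same search. The function name count_exceptions and the script layout are just illustrative; the pattern is the one from the grep command.

```python
import re
import sys
from collections import Counter

# Same pattern as the grep: the line must start with a run of characters
# (no spaces or backslashes) that ends in "Exception".
EXCEPTION_RE = re.compile(r"^[^\\ ]*Exception")


def count_exceptions(lines):
    """Count matching log lines, keyed by the exception class at line start."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.match(line)
        if match:
            # Key on the matched prefix only, so per-job details in the
            # rest of the message do not split the count.
            counts[match.group(0)] += 1
    return counts


if __name__ == "__main__":
    totals = Counter()
    # Log paths are passed on the command line.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            totals.update(count_exceptions(log))
    for name, n in totals.most_common():
        print(f"{n:7d} {name}")
```

Saved under any name, it would be run with /var/log/jenkins/jenkins.log and /var/log/jenkins/jenkins.log.1 as arguments. Like the grep, it skips lines beginning with "Caused: " because of the space before the class name.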

The two largest culprits at the moment are

Based on this, clearing up the Azure and Orka provisioning would help with bullets 1, 2 and 4. Bullet 3 is one specific job that I can delete, and the final one is partially down to Orka (noting that as of 1142 today 100% of those were Orka).

sxa commented 2 months ago

I've deleted job 493 from bullet 3 - if anyone wants to look into it, here is a tarball with the job directory from it, plus the two at either side: windows-build-493.tar.gz

sxa commented 2 months ago

Azure cloud definition has been removed

sxa commented 2 months ago

Adding these which are also occurring:

Caused: java.io.IOException
java.lang.reflect.InaccessibleObjectException: Unable to make field private static final long java.nio.channels.ClosedChannelException.serialVersionUID accessible: module java.base does not "opens java.nio.channels" to unnamed module @6be968ce
Caused: java.lang.RuntimeException: Failed to serialize hudson.slaves.OfflineCause$ChannelTermination#cause for class hudson.slaves.OfflineCause$ChannelTermination
Caused: java.lang.RuntimeException: Failed to serialize hudson.model.Node#temporaryOfflineCause for class hudson.slaves.DumbSlave
javax.crypto.IllegalBlockSizeException: Input length must be multiple of 16 when decrypting with padded cipher
Caused: java.io.IOException
Caused: java.lang.reflect.InvocationTargetException

The final three are fairly common, but are potentially related to the new Debian container that was done for https://github.com/adoptium/infrastructure/issues/3043, which seems to have been producing some exceptions. The others are less common. The "Input length" one had 1354 instances yesterday.

sxa commented 2 months ago

Running this again today, we have 18 occurrences of lines like this, all on different build jobs:

java.io.IOException: Cannot save actions for StepAtomNode[id=622, exec=CpsFlowExecution[Owner[build-scripts/jobs/jdk17u/jdk17u-mac-x64-temurin/517:build-scripts/jobs/jdk17u/jdk17u-mac-x64-temurin #517]]] for completed execution CpsFlowExecution[Owner[build-scripts/jobs/jdk17u/jdk17u-mac-x64-temurin/517:build-scripts/jobs/jdk17u/jdk17u-mac-x64-temurin #517]]: [org.jenkinsci.plugins.workflow.cps.actions.ArgumentsActionImpl@2e4d12f0, org.jenkinsci.plugins.workflow.actions.TimingAction@4a163c4a, org.jenkinsci.plugins.workflow.support.actions.LogStorageAction@c0a7d6a, org.jenkinsci.plugins.workflow.actions.LabelAction@1640e7d3, io.jenkins.blueocean.listeners.NodeDownstreamBuildAction@7d535a15]

Other than that we have the following that are in double figures from yesterday's log:

Significantly fewer than before, and most of the remaining ones seem to relate to the Orka plugin.