adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

Track machines which are having jenkins connections issues #403

Closed: sxa closed this issue 4 years ago

sxa commented 6 years ago

I've had reports of three of our test machines having issues today with an error similar to the following showing in the logs:

Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to test-softlayer-ubuntu1604-x64-1
        at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
        at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
        at hudson.remoting.Channel.call(Channel.java:955)
        at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:283)
        at com.sun.proxy.$Proxy79.withRepository(Unknown Source)
        at org.jenkinsci.plugins.gitclient.RemoteGitImpl.withRepository(RemoteGitImpl.java:235)
        at hudson.plugins.git.GitSCM.printCommitMessageToLog(GitSCM.java:1271)
        at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1244)
        at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:113)
        at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:85)
        at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:75)
        at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1$1.call(AbstractSynchronousNonBlockingStepExecution.java:47)
        at hudson.security.ACL.impersonate(ACL.java:290)
        at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1.run(AbstractSynchronousNonBlockingStepExecution.java:44)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.lang.NoClassDefFoundError: hudson.model.Computer
    at org.jenkinsci.plugins.gitclient.AbstractGitAPIImpl.withRepository(AbstractGitAPIImpl.java:29)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.withRepository(CliGitAPIImpl.java:72)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:929)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:903)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:855)
    at hudson.remoting.UserRequest.perform(UserRequest.java:212)
    at hudson.remoting.UserRequest.perform(UserRequest.java:54)
    at hudson.remoting.Request$2.run(Request.java:369)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
Caused: java.io.IOException: Remote call on test-softlayer-ubuntu1604-x64-1 failed
    at hudson.remoting.Channel.call(Channel.java:961)
    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:283)
    at com.sun.proxy.$Proxy79.withRepository(Unknown Source)
    at org.jenkinsci.plugins.gitclient.RemoteGitImpl.withRepository(RemoteGitImpl.java:235)
    at hudson.plugins.git.GitSCM.printCommitMessageToLog(GitSCM.java:1271)
    at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1244)
    at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:113)
    at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:85)
    at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:75)
    at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1$1.call(AbstractSynchronousNonBlockingStepExecution.java:47)
    at hudson.security.ACL.impersonate(ACL.java:290)
    at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1.run(AbstractSynchronousNonBlockingStepExecution.java:44)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Finished: FAILURE

So far this has been reported on test-softlayer-ubuntu1604-x64-1 and test-marist-ubuntu1604-s390x-1 today. Both appear to have cleared after a disconnect and relaunch of the agent. The reason is unclear at present (it is clearly not limited to any specific cloud provider), but I'm keeping this issue open to track any further instances.
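To make tracking further instances less manual, a small script could scan saved build console logs for the signature shown in the trace above and tally failures per agent. This is a minimal sketch, not existing tooling: it assumes the console logs have already been downloaded as plain text, and the function names `affected_agent` and `tally` are hypothetical. It matches only the two lines quoted in this issue (`java.lang.NoClassDefFoundError: hudson.model.Computer` and `Caused: java.io.IOException: Remote call on <agent> failed`).

```python
import re
from collections import Counter

# Signature lines copied from the stack trace in this issue.
NCDFE_MARKER = "java.lang.NoClassDefFoundError: hudson.model.Computer"
FAILURE_RE = re.compile(r"Caused: java\.io\.IOException: Remote call on (\S+) failed")


def affected_agent(console_text):
    """Return the agent name if this log shows the failure, else None (hypothetical helper)."""
    if NCDFE_MARKER not in console_text:
        return None
    match = FAILURE_RE.search(console_text)
    return match.group(1) if match else None


def tally(console_logs):
    """Count matching failures per agent across an iterable of console-log strings."""
    counts = Counter()
    for text in console_logs:
        agent = affected_agent(text)
        if agent is not None:
            counts[agent] += 1
    return counts


if __name__ == "__main__":
    # Assumed input: console text already fetched for each build of interest.
    sample = (
        "java.lang.NoClassDefFoundError: hudson.model.Computer\n"
        "Caused: java.io.IOException: Remote call on test-softlayer-ubuntu1604-x64-1 failed\n"
        "Finished: FAILURE\n"
    )
    for agent, count in tally([sample, "Finished: SUCCESS\n"]).most_common():
        print(agent, count)
```

Requiring both signature lines keeps unrelated `IOException` failures on the same agent out of the count.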

This had also happened previously on test-osuosl-ppc64le-ubuntu-16-04-1 from June 7 to June 26.

sophia-guo commented 6 years ago

Relaunching test-marist-ubuntu1604-s390x-1 only works once. After a test job had run on this machine once, the issue came back. Build 37 was the run that worked right after the relaunch;

build 38, the second run, hit the same issue.

sxa commented 6 years ago

Are the tests doing anything that would interfere with the Jenkins process on the machine (which is a Java process running as the same user as the tests)?

sophia-guo commented 6 years ago

Not sure about that. However, the tests haven't changed recently and have been working well for a while. Has any machine configuration changed, such as Jenkins or plugin updates?

adamfarley commented 6 years ago

Update: I see this problem on test-softlayer-ubuntu1604-x64-1 (https://ci.adoptopenjdk.net/view/work%20in%20progress/job/Grinder/65/console)

But I do not see it on test-packet-ubuntu1604-x64-2 (https://ci.adoptopenjdk.net/view/work%20in%20progress/job/Grinder/66/console), even when I run it a second time on the same machine (https://ci.adoptopenjdk.net/view/work%20in%20progress/job/Grinder/67/console).

Is there a difference between the two machines that might provide clues?

sxa commented 6 years ago

I've bounced the Jenkins agent on test-softlayer-ubuntu1604-x64-1, so we'll see if it recurs.

sxa commented 4 years ago

I haven't seen any issues in this area recently, so I'm closing this to get it off the issues list.