KostyaSha / yet-another-docker-plugin

Jenkins Yet Another Docker Plugin
https://plugins.jenkins.io/yet-another-docker-plugin
MIT License

Remote call failed #142

Open farahfa opened 7 years ago

farahfa commented 7 years ago

Hello,

Jenkins jobs (Maven jobs) fail intermittently with the following error:

Modules changed, recalculating dependency graph
Established TCP socket on 37493
maven33-agent.jar already up to date
maven33-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
[Build.eng-idm.release.tif-sso] $ java -cp /var/lib/jenkins/maven33-agent.jar:/usr/share/maven/boot/plexus-classworlds-2.5.2.jar:/usr/share/maven/conf/logging jenkins.maven3.agent.Maven33Main /usr/share/maven /var/lib/jenkins/slave.jar /var/lib/jenkins/maven33-interceptor.jar /var/lib/jenkins/maven3-interceptor-commons.jar 37493
ERROR: Failed to parse POMs
java.io.IOException: Remote call on Docker-4b5a77361472 failed
    at hudson.remoting.Channel.call(Channel.java:838)
    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
    at hudson.maven.$Proxy91.accept(Unknown Source)
    at hudson.maven.AbstractMavenProcessFactory.newProcess(AbstractMavenProcessFactory.java:282)
    at hudson.maven.ProcessCache.get(ProcessCache.java:236)
    at hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.doRun(MavenModuleSetBuild.java:798)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
    at hudson.model.Run.execute(Run.java:1728)
    at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:544)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:405)
Caused by: java.lang.LinkageError: Failed to load hudson.remoting.Pipe$ConnectCommand
    at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:377)
    at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:285)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at hudson.maven.AbstractMavenProcessFactory$Connection.writeReplace(AbstractMavenProcessFactory.java:163)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1118)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at hudson.remoting.UserRequest._serialize(UserRequest.java:190)
    at hudson.remoting.UserRequest.serialize(UserRequest.java:199)
    at hudson.remoting.UserRequest.perform(UserRequest.java:161)
    at hudson.remoting.UserRequest.perform(UserRequest.java:50)
    at hudson.remoting.Request$2.run(Request.java:336)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    at ......remote call to Docker-4b5a77361472(Native Method)
    at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1545)
    at hudson.remoting.UserResponse.retrieve(UserRequest.java:253)
    at hudson.remoting.Channel.call(Channel.java:830)
    ... 10 more
Caused by: java.lang.IllegalAccessError: class hudson.remoting.Pipe$ConnectCommand cannot access its superclass hudson.remoting.Command
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
    at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:373)
    at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:285)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at hudson.maven.AbstractMavenProcessFactory$Connection.writeReplace(AbstractMavenProcessFactory.java:163)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1118)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at hudson.remoting.UserRequest._serialize(UserRequest.java:190)
    at hudson.remoting.UserRequest.serialize(UserRequest.java:199)
    at hudson.remoting.UserRequest.perform(UserRequest.java:161)
    at hudson.remoting.UserRequest.perform(UserRequest.java:50)
    at hudson.remoting.Request$2.run(Request.java:336)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[YAD-PLUGIN] Injecting DOCKER_CONTAINER_ID variable.
[YAD-PLUGIN] Injecting JENKINS_CLOUD_ID variable.
[YAD-PLUGIN] DOCKER_HOST variable.
Finished: FAILURE

This only happens when Jenkins provisions slaves from a Docker swarm; if I change the Docker URL to a single Docker host, it works just fine. I am not sure where the problem originates, so I'm posting it here to see whether it is caused by YADP or by something else.

Any help is much appreciated.

P.S.: This also happens in other jobs, which just hang forever (they seem to lose the connection to the Docker swarm slave, I think).

KostyaSha commented 7 years ago

What jenkins version is used?

maven33-agent.jar already up to date
maven33-interceptor.jar already up to date

Does it fail only for Maven projects? Is there any NAT between the Jenkins master and the containers? Which Docker daemon version is used? Swarm classic or swarm mode?

farahfa commented 7 years ago

KostyaSha commented 7 years ago

IMHO classloading issues are a core/remoting issue (cc @oleg-nenashev). I had issues with SystemProperties locally, but everything looks right and it is not reproducible. Also, Jenkins here is connected via JNLP (or SSH in your case?), so the Docker plugin doesn't look like the culprit.

farahfa commented 7 years ago

Jenkins is connected using SSH... Hmm, this is a weird issue indeed. I'm trying to pinpoint where the error is coming from, but I cannot tell exactly. :(

At least I can rule out the Docker plugin as the problem.

jbarbera commented 7 years ago

Hitting this problem with the latest Jenkins version and headless Java JRE 9. It was working until I wiped the jobs; I then created a new job and it started failing:

Xvfb stopping
FATAL: Remote call on docker-4b1f20c42f27 failed
java.lang.ClassNotFoundException: Classloading from system classloader disabled
    at hudson.remoting.RemoteClassLoader$ClassLoaderProxy.fetch4(RemoteClassLoader.java:834)
    at hudson.remoting.RemoteClassLoader$ClassLoaderProxy.fetch3(RemoteClassLoader.java:867)
    at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:896)
    at hudson.remoting.Request$2.run(Request.java:336)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at org.jenkinsci.remoting.CallableDecorator.call(CallableDecorator.java:19)
    at hudson.remoting.CallableDecoratorList$1.call(CallableDecoratorList.java:21)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    at ......remote call to channel(Native Method)
    at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1554)
    at hudson.remoting.Request.call(Request.java:172)
    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:260)
    at com.sun.proxy.$Proxy6.fetch3(Unknown Source)
    at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:195)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:486)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:294)
    at hudson.util.ProcessTree$UnixReflection.(ProcessTree.java:699)
    at hudson.util.ProcessTree$UnixProcess.kill(ProcessTree.java:647)
    at hudson.util.ProcessTree$UnixProcess.killRecursively(ProcessTree.java:668)
    at hudson.util.ProcessTree$Unix.killAll(ProcessTree.java:589)
    at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:1091)
    at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:1082)
    at hudson.remoting.UserRequest.perform(UserRequest.java:181)
    at hudson.remoting.UserRequest.perform(UserRequest.java:52)
    at hudson.remoting.Request$2.run(Request.java:336)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1158)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632)
    at java.lang.Thread.run(Thread.java:804)
Caused: java.lang.LinkageError
    at hudson.util.ProcessTree$UnixReflection.(ProcessTree.java:710)
    at hudson.util.ProcessTree$UnixProcess.kill(ProcessTree.java:647)
    at hudson.util.ProcessTree$UnixProcess.killRecursively(ProcessTree.java:668)
    at hudson.util.ProcessTree$Unix.killAll(ProcessTree.java:589)
    at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:1091)
    at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:1082)
    at hudson.remoting.UserRequest.perform(UserRequest.java:181)
    at hudson.remoting.UserRequest.perform(UserRequest.java:52)
    at hudson.remoting.Request$2.run(Request.java:336)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1158)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632)
    at java.lang.Thread.run(Thread.java:804)
    at ......remote call to docker-4b1f20c42f27(Native Method)
    at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1554)
    at hudson.remoting.UserResponse.retrieve(UserRequest.java:281)
    at hudson.remoting.Channel.call(Channel.java:839)
Caused: java.io.IOException: Remote call on docker-4b1f20c42f27 failed
    at hudson.remoting.Channel.call(Channel.java:847)
    at hudson.Launcher$RemoteLauncher.kill(Launcher.java:1079)
    at org.jenkinsci.plugins.xvfb.Xvfb.shutdownAndCleanup(Xvfb.java:327)
    at org.jenkinsci.plugins.xvfb.XvfbDisposer.tearDown(XvfbDisposer.java:52)
    at jenkins.tasks.SimpleBuildWrapper$EnvironmentWrapper.tearDown(SimpleBuildWrapper.java:175)
    at hudson.model.Build$BuildExecution.doRun(Build.java:174)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:496)
    at hudson.model.Run.execute(Run.java:1737)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:97)
    at hudson.model.Executor.run(Executor.java:419)
Finished: FAILURE

KostyaSha commented 7 years ago

@jbarbera try running on Java 8; I have never seen such errors. Maybe they are related to the JDK.

jbarbera commented 7 years ago

@KostyaSha, same issue. It was working well until I accidentally deleted the jobs folder on the Jenkins server; the problems started after I created a new single job.

KostyaSha commented 7 years ago

java.lang.ClassNotFoundException: Classloading from system classloader disabled

mislav commented 6 years ago

We experienced this problem around the time we upgraded to Java 1.8. The error FATAL: Remote call on <HOST> failed would occur whenever the test process spawned child processes that weren't successfully cleaned up by the end of the test script.

The workaround was to identify the background processes spawned by the test, kill them with kill <pid> at the end of the test script, and sometimes (for good measure) add a sleep N statement so that each process has a few seconds to shut down after receiving the kill signal.
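One way to make that cleanup systematic is to record every background PID as it is started and stop them all from an exit trap. The following is only a sketch of that idea, not what we actually ran: the names start_bg, cleanup, BG_PIDS and GRACE_SECONDS are hypothetical, and containers started through Docker would still need their own docker kill step.

```shell
#!/bin/sh
# Sketch only: start_bg, cleanup, BG_PIDS and GRACE_SECONDS are hypothetical names.

BG_PIDS=""                           # PIDs of background processes started by this script
GRACE_SECONDS="${GRACE_SECONDS:-2}"  # pause between the first kill and the forced kill

start_bg() {
  # Run the given command in the background and remember its PID.
  "$@" &
  BG_PIDS="$BG_PIDS $!"
}

cleanup() {
  # Nothing to do if no background process was recorded.
  [ -n "$BG_PIDS" ] || return 0
  for pid in $BG_PIDS; do
    kill "$pid" 2>/dev/null || true      # ask politely first (SIGTERM)
  done
  sleep "$GRACE_SECONDS"                 # allow some time for them to shut down
  for pid in $BG_PIDS; do
    kill -9 "$pid" 2>/dev/null || true   # force-kill anything still alive
    wait "$pid" 2>/dev/null || true      # reap the child so nothing lingers
  done
  BG_PIDS=""
}

# Run the cleanup whenever the test script exits, whether it passed or failed.
trap cleanup EXIT
```

A test script would then launch its helpers with something like start_bg ./some-helper (a placeholder command) instead of a bare &, so that nothing is left running when the build step returns.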

KostyaSha commented 6 years ago

@mislav interesting... could you get a snapshot of the docker ps tree command?

mislav commented 6 years ago

@KostyaSha Sorry, we've worked around the problem now and I don't have sample output from docker ps anymore. But the issue occurred even with child processes unrelated to Docker. Basically the workaround was:

kill <pid> # kill regular child process
docker kill <docker-id> # have docker kill a daemonized process
sleep 2 # allow some time for them to shut down

KostyaSha commented 6 years ago

Probably Docker should be run with init? But AFAIK Jenkins runs childRipper at the end of the build to kill all spawned children.

jwtodd commented 6 years ago

@mislav re your workaround, are those the steps one would add as a post-build hook to sh-exec?

if so, what are you killing?

in my use case, jenkins master is running in a docker container and spawns build jobs that are in turn in containers. i am guessing the above post-build script would exec from the jenkins master container ... and if correct, i need to collect run results so that the job exits successfully.

thx for the work around notes!

KostyaSha commented 6 years ago

In this issue I see 2 different errors; both are related to remoting, but they are different.

mislav commented 6 years ago

@jwtodd

  1. The steps are added to the test script, not the post-build hook.
  2. We run neither Jenkins master nor build jobs in docker containers. Docker is merely used to spawn some processes within the build script.
  3. I'm not sure whether our solution is related to the problem that the OP is describing. We do not use the yet-another-docker-plugin project; I should have noted that earlier. I just posted here because the exception message and stack traces are the same, and because it was hard for us to track down the cause of the failure. I hope it helps someone while debugging their issue.

jwtodd commented 6 years ago

ok ... in our env all infra is running in a container, namely: jenkins and sonarqube

spawned build jobs run in a newly created docker container via YADPlugin ... thx for this btw :)

in upgrading to jdk9 for build jobs only, we have run into this same issue. interestingly we have 2 such jdk9 based containers, one for a prototypical java9 app build and a second one for nodejs builds ... each baselining from the same container to satisfy jenkins-slave concerns.

now, while we have not done a lot with the nodejs build container other than prove it works with a shim/stub build exec sh ... it tears down cleanly whereas the java app with tests and all runs into the error stated here.

odd and a bummer :(

current workaround is to run the jdk9 build on the jenkins master (container).

haven't had a chance to diagnose this one further ... but would love to push the job concerns back to build containers, for obvious reasons.

KostyaSha commented 6 years ago

Is this project public? How can I reproduce the issue?

jwtodd commented 6 years ago

it isn't :(

i will see about publishing enough reference artifacts to reproduce this.

i did work on https://github.com/intuit/wasabi which shares some operational aspects of my current project ... i will see if that codebase run against my jenkins+YADP fails as well.