Open farahfa opened 7 years ago
What jenkins version is used?
maven33-agent.jar already up to date maven33-interceptor.jar already up to date
Does it only fail for Maven projects? Is there any NAT between the Jenkins master and the containers? Which Docker daemon version is used? Swarm classic or swarm mode?
Jenkins is 2.46.1 (LTS)
It fails on all types of projects from time to time (I think there's some kind of race condition going on?), but it is most noticeable with Maven (it throws the errors mentioned in the OP). Some jobs (for example, the ones with RVM) will hang at some point, and that happens randomly.
No NAT between master and containers
Docker version 1.11.2, build b9f10c9
Swarm classic
IMHO classloading issues are a core/remoting issue, cc @oleg-nenashev. I had issues with SystemProperties locally, but everything looks right and it is not reproducible. And Jenkins here is connected via JNLP (or SSH in your case?), so the Docker plugin doesn't look like the culprit.
Jenkins is connected using ssh... Hmm, this is a weird issue indeed. I'm trying to pin-point where the error is coming from, but I cannot tell exactly. :(
At least I can rule out the Docker plugin as the problem.
Hitting this problem with the latest Jenkins version and Java JRE 9 headless. It was working until I wiped the jobs; I then created a new job and it started failing.
Xvfb stopping
FATAL: Remote call on docker-4b1f20c42f27 failed
java.lang.ClassNotFoundException: Classloading from system classloader disabled
at hudson.remoting.RemoteClassLoader$ClassLoaderProxy.fetch4(RemoteClassLoader.java:834)
at hudson.remoting.RemoteClassLoader$ClassLoaderProxy.fetch3(RemoteClassLoader.java:867)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:896)
at hudson.remoting.Request$2.run(Request.java:336)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at org.jenkinsci.remoting.CallableDecorator.call(CallableDecorator.java:19)
at hudson.remoting.CallableDecoratorList$1.call(CallableDecoratorList.java:21)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at ......remote call to channel(Native Method)
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1554)
at hudson.remoting.Request.call(Request.java:172)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:260)
at com.sun.proxy.$Proxy6.fetch3(Unknown Source)
at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:195)
at java.lang.ClassLoader.loadClass(ClassLoader.java:486)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:294)
at hudson.util.ProcessTree$UnixReflection.
@jbarbera Try running on Java 8; I never saw such errors there. Maybe they are related to the JDK.
@KostyaSha, same issue. It was working well until I accidentally deleted the jobs folder on the Jenkins server; the problems started after I created a single new job.
java.lang.ClassNotFoundException: Classloading from system classloader disabled
We experienced this problem around the time we upgraded to Java 1.8. The error "FATAL: Remote call on <HOST> failed" would occur whenever the test process spawned child processes that weren't successfully cleaned up by the end of the test script.
The workaround was to identify the background processes spawned by the test, run kill <pid> on them at the end of the test script, and sometimes (for good measure) add a sleep N statement to wait a few seconds after the kill so that the processes have time to shut down after receiving the kill signal.
@mislav interesting... could you get a snapshot of docker ps and the tree command?
@KostyaSha Sorry, we've worked around the problem now and I don't have sample output from docker ps anymore. But the issue occurred even with child processes unrelated to Docker. Basically the workaround was:
kill <pid> # kill regular child process
docker kill <docker-id> # have docker kill a daemonized process
sleep 2 # allow some time for them to shut down
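A slightly more defensive variant of the same idea can be sketched as a shell snippet that records each background child it starts and stops them all from an EXIT trap. This is only an illustration under assumptions: the helper names (start_bg, cleanup_children), the GRACE_SECONDS variable, and the sleep 300 stand-in process are hypothetical and not from the original workaround. Containers would still need docker kill <docker-id> as shown above.

```shell
#!/bin/sh
# Sketch only: track background children started by a test script and
# stop them before the script exits. All names here are hypothetical.

CHILD_PIDS=""
GRACE_SECONDS="${GRACE_SECONDS:-2}"   # time allowed for a clean shutdown

# Start a command in the background and remember its PID.
start_bg() {
    "$@" &
    CHILD_PIDS="$CHILD_PIDS $!"
}

# Send TERM to every tracked child, wait briefly, then force-kill and
# reap anything that is still around.
cleanup_children() {
    [ -z "$CHILD_PIDS" ] && return 0
    for pid in $CHILD_PIDS; do
        kill "$pid" 2>/dev/null || true
    done
    sleep "$GRACE_SECONDS"
    for pid in $CHILD_PIDS; do
        if kill -0 "$pid" 2>/dev/null; then
            kill -9 "$pid" 2>/dev/null || true
        fi
        wait "$pid" 2>/dev/null || true
    done
    CHILD_PIDS=""
    return 0
}

# Run the cleanup whenever the script exits, including on test failure.
trap cleanup_children EXIT

start_bg sleep 300   # stand-in for a real background helper process
# ... run the actual tests here ...
```

The trap means the cleanup also runs when the tests fail partway through, which a kill placed on the last line of the script would not guarantee.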
Probably docker should be run with init? But AFAIK Jenkins runs childRipper at the end of the build to kill all spawned child processes.
@mislav re your workaround, are those the steps one would add as a post-build hook to sh-exec?
if so, what
in my use case, jenkins master is running in a docker container and spawns build jobs that are in turn in containers. i am guessing the above post-build script would exec from the jenkins master container ... and if correct, i need to collect run results so that the job exits successfully.
thx for the work around notes!
In this issue I see 2 different errors. Both are related to remoting, but they are not the same error.
@jwtodd yet-another-docker-plugin project; I should have noted that earlier. I just posted here because the exception message and stack traces are the same, and because it was hard for us to track down the cause of the failure. I hope it helps someone while debugging their issue.
ok ... in our env all infra is running in a container, namely: jenkins and sonarqube
spawned build jobs run in a newly created docker container via YADPlugin ... thx for this btw :)
in upgrading to jdk9 for build jobs only, we have run into this same issue. interestingly we have 2 such jdk9 based containers, one for a prototypical java9 app build and a second one for nodejs builds ... each baselining from the same container to satisfy jenkins-slave concerns.
now, while we have not done a lot with the nodejs build container other than prove it works with a shim/stub build exec sh ... it tears down cleanly, whereas the java app with tests and all runs into the error stated here.
odd and a bummer :(
current work around is to run the jdk9 build on the jenkins master (container).
haven't had a chance to diagnose this one further ... but would love to push the job concerns back to build containers, for obvious reasons.
Is this project public? How can I reproduce the issue?
it isn't :(
i will see about publishing enough reference artifacts to reproduce this.
i did work on https://github.com/intuit/wasabi which shares some operational aspects of my current project ... i will see if that codebase run against my jenkins+YADP fails as well.
Hello,
Jenkins jobs (maven jobs) keep on failing from time to time (doesn't happen all the time) with the following error:
This only happens when Jenkins is provisioning slaves from a docker swarm, (if I change the docker URL to a single docker host then it works just fine). I am not sure where this problem is originating from, so I'm posting this here to see if it's something with the YADP or something else.
Any help is much appreciated.
P.S.: This also happens in other jobs, where they just hang forever (they seem to lose the connection with the docker swarm slave, I think).