Open yufengJ opened 8 years ago
I ran into something very similar with Janitor: it died in an AWS API call with no log message. To trace it, I created a one-off JSP that invoked the same API call so I could get a full stack trace. In my case it was a version mismatch between the AWS client library in the open-source SimianArmy and a different AWS client jar that was being pulled in by our non-open-source version.
I upgraded the AWS client to 1.11.9 and it resolved the issue for me. I have an open PR to introduce this to the main code line.
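A quick way to confirm this kind of jar mismatch is to ask the JVM where a class was actually loaded from. The sketch below is only an illustration: `JarLocator` and `locationOf` are hypothetical names, and the class name passed on the command line (for example an AWS client class) is whatever you suspect is being shadowed.

```java
import java.security.CodeSource;

public class JarLocator {

    /** Returns the jar or directory a class was loaded from, or a note when the JVM does not expose one. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Typical for classes loaded by the bootstrap class loader.
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass the fully qualified name of the class you suspect; the default is just a placeholder.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + locationOf(Class.forName(name)));
    }
}
```

Run it with the same classpath as the webapp. If the printed jar is not the version declared in build.gradle, another dependency is probably supplying the class.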
On Thu, Sep 8, 2016 at 6:03 PM, Yufeng notifications@github.com wrote:
Hi all,
I've observed that during BasicChaosMonkey.doMonkeyBusiness(), the method suddenly returns without finishing the rest of its happy path. There is no exception or error message.
The jettyRun output is as follows:
2016-09-08 16:31:16.328 - INFO BasicChaosInstanceSelector - [BasicChaosInstanceSelector.java:65] Randomly selecting 1 from 3 instances, excluding null
2016-09-08 16:31:16.563 - INFO Monkey - [Monkey.java:138] Reporting what I did...
I've set up the debugger to trace this. The code ends up in org.jclouds.ContextBuilder. The stack dump is:
"pool-1-thread-1@9515" prio=5 tid=0x1d nid=NA runnable
  java.lang.Thread.State: RUNNABLE
    at org.jclouds.ContextBuilder.buildView(ContextBuilder.java:588)
    at com.netflix.simianarmy.client.aws.AWSClient.getJcloudsComputeService(AWSClient.java:818)
    - locked <0x2989> (a com.netflix.simianarmy.client.aws.AWSClient)
    at com.netflix.simianarmy.client.aws.AWSClient.connectSsh(AWSClient.java:834)
    at com.netflix.simianarmy.chaos.ChaosInstance.connectSsh(ChaosInstance.java:123)
    at com.netflix.simianarmy.chaos.ChaosInstance.canConnectSsh(ChaosInstance.java:101)
    at com.netflix.simianarmy.chaos.ScriptChaosType.canApply(ScriptChaosType.java:60)
    at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.pickChaosType(BasicChaosMonkey.java:141)
    at com.netflix.simianarmy.basic.chaos.BasicChaosMonkey.doMonkeyBusiness(BasicChaosMonkey.java:121)
    at com.netflix.simianarmy.Monkey.run(Monkey.java:134)
    at com.netflix.simianarmy.Monkey$1.run(Monkey.java:155)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
I've observed the issue on the master branch and on tag v2.5.1. Tag v2.5.0 is fine, though, and I had been using it without problems. So I suspect that a dependency change between the two is causing this. However, a diff of build.gradle between the versions shows that jclouds was not upgraded, so I am confused about where to go next.
$ diff master_branch/build.gradle tag_v2.5.0/build.gradle
1,6d0
< buildscript {
< repositories {
< jcenter()
< }
< }
<
8c2
< id 'nebula.netflixoss' version '3.2.3'
---
> id 'nebula.netflixoss' version '2.2.9'
18c12
< repositories {
---
> repositories {
26,28d19
< sourceCompatibility = 1.7
< targetCompatibility = 1.7
<
36c27,28
< compile 'com.sun.jersey:jersey-servlet:1.19'
---
> compile 'com.sun.jersey:jersey-core:1.11'
> compile 'com.sun.jersey:jersey-servlet:1.11'
40c32,34
< compile 'com.netflix.eureka:eureka-client:1.4.1'
---
> compile('com.netflix.eureka:eureka-client:1.1.22') {
> exclude group: 'com.sun.jersey', module: 'jersey-bundle'
> }
49a44
> compile 'ch.qos.logback:logback-classic:1.0.13'
51,52d45
< compile 'org.springframework:spring-jdbc:4.2.5.RELEASE'
< compile 'com.zaxxer:HikariCP:2.4.7'
I might dig deeper into this. Has anyone got this issue before?
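One possible reason for the silence, which this thread does not confirm, is visible in the stack trace: the monkey runs inside a ScheduledThreadPoolExecutor via FutureTask.runAndReset. A Throwable that escapes a periodic task there is stored in the task's future rather than printed, and later runs are suppressed. Code that catches only Exception would also miss an Error such as NoSuchMethodError. A minimal sketch of making such failures visible, where `LoggingTask` and `logging` are hypothetical names and the thrown error is simulated:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class LoggingTask {

    /** Wraps a task so that any Throwable, including Errors, is recorded and printed. */
    static Runnable logging(Runnable task, AtomicReference<Throwable> lastFailure) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                lastFailure.set(t);
                t.printStackTrace();
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        AtomicReference<Throwable> lastFailure = new AtomicReference<>();

        // Simulated failure; a real dependency conflict would throw from library code.
        Runnable failing = () -> {
            throw new NoSuchMethodError("simulated dependency conflict");
        };

        // Unwrapped: the executor keeps the error in the future and prints nothing.
        pool.scheduleWithFixedDelay(failing, 0, 100, TimeUnit.MILLISECONDS);
        // Wrapped: the stack trace is printed each time the task runs.
        pool.scheduleWithFixedDelay(logging(failing, lastFailure), 0, 100, TimeUnit.MILLISECONDS);

        Thread.sleep(250);
        pool.shutdownNow();
    }
}
```

Temporarily wrapping the scheduled task like this, or setting a debugger breakpoint on Throwable construction, should surface whatever is being thrown during the jclouds call.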
Thanks for the suggestions! It turned out to be the same issue as https://github.com/Netflix/SimianArmy/issues/259.
The problem was fixed by excluding com.google.inject from the eureka-client dependency:
compile ('com.netflix.eureka:eureka-client:1.4.1') {
    exclude group: 'com.google.inject'
}
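The exclusion suggests that two copies of Guice (group com.google.inject, which jclouds also depends on) were ending up on the classpath. As a rough check, assuming you can run a small class with the webapp's classpath, you can list every location that provides a given class. `DuplicateClassFinder` and `copiesOf` are hypothetical names; `com.google.inject.Injector` is used only as a representative Guice class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class DuplicateClassFinder {

    /** Lists every classpath location that contains the given class file. */
    static List<URL> copiesOf(String className, ClassLoader loader) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "com.google.inject.Injector";
        List<URL> copies = copiesOf(name, DuplicateClassFinder.class.getClassLoader());
        System.out.println(copies.size() + " location(s) provide " + name);
        for (URL url : copies) {
            System.out.println("  " + url);
        }
    }
}
```

More than one location before the exclusion, and exactly one after it, would be consistent with the fix above. Running `gradle dependencies` gives the same information from the build side.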
Nice! I ran into this just recently, and the dependency exclusion also solved the issue for me.
Thanks!