d2iq-archive / marathon

Deploy and manage containers (including Docker) on top of Apache Mesos at scale.
https://mesosphere.github.io/marathon/
Apache License 2.0
4.07k stars 843 forks source link

Flaky Test: GroupDeployIntegrationTest #3648

Closed meichstedt closed 8 years ago

meichstedt commented 8 years ago

Sporadically fails. log here: https://teamcity.mesosphere.io/viewLog.html?buildId=189623&buildTypeId=Oss_Marathon_IntegrationScaleTests&tab=buildLog

[16:40:32]W:     [Step 3/3]  [31mGroupDeployIntegrationTest: [0m
[16:40:32]W:     [Step 3/3]
[16:40:32]W:     [Step 3/3]  [31m- A group with a running deployment can not be deleted without force *** FAILED *** (10 seconds, 52 milliseconds) [0m
[16:40:32]W:     [Step 3/3]  [31m  java.lang.IllegalArgumentException: could not parse as mesosphere.marathon.integration.facades.ITDeploymentResult: [0m
[16:40:32]W:     [Step 3/3]  [31m{ [0m
[16:40:32]W:     [Step 3/3]  [31m  "message" : "Futures timed out after [10000 milliseconds]" [0m
[16:40:32]W:     [Step 3/3]  [31m} [0m
[16:40:32]W:     [Step 3/3]  [31mErrors: [0m
[16:40:32]W:     [Step 3/3]  [31m(/version,List(ValidationError(List(error.path.missing),WrappedArray()))) [0m
[16:40:32]W:     [Step 3/3]  [31m(/deploymentId,List(ValidationError(List(error.path.missing),WrappedArray()))) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.setup.SprayHttpResponse$$anonfun$read$1$$anonfun$apply$2.apply(SprayHttpResponse.scala:14) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.setup.SprayHttpResponse$$anonfun$read$1$$anonfun$apply$2.apply(SprayHttpResponse.scala:11) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.setup.RestResult$$anonfun$map$1.apply(HttpServiceTestModule.scala:27) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.setup.RestResult.value$lzycompute(HttpServiceTestModule.scala:23) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.setup.RestResult.value(HttpServiceTestModule.scala:23) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.setup.MarathonCallbackTestSupport$class.waitForChange(MarathonCallbackTestSupport.scala:39) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.GroupDeployIntegrationTest.waitForChange(GroupDeployIntegrationTest.scala:13) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.GroupDeployIntegrationTest$$anonfun$14.apply$mcV$sp(GroupDeployIntegrationTest.scala:247) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.GroupDeployIntegrationTest$$anonfun$14.apply(GroupDeployIntegrationTest.scala:227) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.GroupDeployIntegrationTest$$anonfun$14.apply(GroupDeployIntegrationTest.scala:227) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.Transformer.apply(Transformer.scala:22) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.Transformer.apply(Transformer.scala:20) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:158) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.Suite$class.withFixture(Suite.scala:1121) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuite.withFixture(FunSuite.scala:1559) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:155) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:167) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:167) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.GroupDeployIntegrationTest.org$scalatest$BeforeAndAfter$$super$runTest(GroupDeployIntegrationTest.scala:13) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.GroupDeployIntegrationTest.runTest(GroupDeployIntegrationTest.scala:13) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:200) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:200) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [0m
[16:40:32] :     [Step 3/3] [info] Run completed in 5 minutes, 13 seconds.
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [0m
[16:40:32] :     [Step 3/3] [info] Total number of tests run: 105
[16:40:32]W:     [Step 3/3]  [31m  at scala.collection.immutable.List.foreach(List.scala:381) [0m
[16:40:32] :     [Step 3/3] [info] Suites: completed 157, aborted 1
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [0m
[16:40:32] :     [Step 3/3] [info] Tests: succeeded 104, failed 1, canceled 0, ignored 0, pending 0
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [0m
[16:40:32] :     [Step 3/3] [info] *** 1 SUITE ABORTED ***
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [0m
[16:40:32] :     [Step 3/3] [info] *** 1 TEST FAILED ***
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:200) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuite.runTests(FunSuite.scala:1559) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.Suite$class.run(Suite.scala:1423) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1559) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:204) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:204) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.SuperEngine.runImpl(Engine.scala:545) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:204) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.GroupDeployIntegrationTest.org$scalatest$BeforeAndAfterAllConfigMap$$super$run(GroupDeployIntegrationTest.scala:13) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.BeforeAndAfterAllConfigMap$class.liftedTree1$1(BeforeAndAfterAllConfigMap.scala:248) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.BeforeAndAfterAllConfigMap$class.run(BeforeAndAfterAllConfigMap.scala:247) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.GroupDeployIntegrationTest.org$scalatest$BeforeAndAfter$$super$run(GroupDeployIntegrationTest.scala:13) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) [0m
[16:40:32]W:     [Step 3/3]  [31m  at mesosphere.marathon.integration.GroupDeployIntegrationTest.run(GroupDeployIntegrationTest.scala:13) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:444) [0m
[16:40:32]W:     [Step 3/3]  [31m  at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:651) [0m
[16:40:32]W:     [Step 3/3]  [31m  at sbt.ForkMain$Run$2.call(ForkMain.java:294) [0m
[16:40:32]W:     [Step 3/3]  [31m  at sbt.ForkMain$Run$2.call(ForkMain.java:284) [0m
[16:40:32] :     [Step 3/3] [error] Error during tests:
[16:40:32]W:     [Step 3/3]  [31m  at java.util.concurrent.FutureTask.run(FutureTask.java:266) [0m
[16:40:32] :     [Step 3/3] [error]     mesosphere.marathon.integration.GroupDeployIntegrationTest
[16:40:32]W:     [Step 3/3]  [31m  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [0m
[16:40:32]W:     [Step 3/3]  [31m  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [0m
[16:40:32]W:     [Step 3/3]  [31m  at java.lang.Thread.run(Thread.java:745) [0m
[16:40:32]W:     [Step 3/3]  [31mGroupDeployIntegrationTest: [0m
[16:40:32]W:     [Step 3/3]  [31m *** ABORTED *** (1 minute, 42 seconds) [0m
[16:40:32]W:     [Step 3/3]  [31m  java.lang.IllegalArgumentException: could not parse as mesosphere.marathon.integration.facades.ITDeploymentResult: [0m
[16:40:32]W:     [Step 3/3]  [31m{ [0m
[16:40:32]W:     [Step 3/3]  [31m  "message" : "Futures timed out after [10000 milliseconds]" [0m
kolloch commented 8 years ago

I reproduced this or a similar problem. Analysis:

Switched to Stop

marathon_8141: [2016-04-08 14:02:02,173] DEBUG [akka://marathon/user/launchQueue] received handled message Purge(/marathonintegrationtest/forcetest/app) in state active
marathon_8141: [2016-04-08 14:02:02,173] DEBUG [akka://marathon/user/launchQueue/1] received handled message Purge(/marathonintegrationtest/forcetest/app)
marathon_8141: [2016-04-08 14:02:02,173] DEBUG [akka://marathon/user/launchQueue/1/9-marathonintegrationtest_forcetest_app] received handled message Stop in state active
marathon_8141: [2016-04-08 14:02:02,173] INFO  [akka://marathon/user/launchQueue/1/9-marathonintegrationtest_forcetest_app] Stopping but still waiting for 2 in-flight messages, first three task ids: task [marathonintegrationtest_forcetest_app.b4374e67-fd81-11e5-837d-02421a2a8071], task [marathonintegrationtest_forcetest_app.b438d508-fd81-11e5-837d-02421a2a8071]

Task change notification that later times out

marathon_8141: [2016-04-08 14:02:02,173] DEBUG [akka://marathon/user/launchQueue] received handled message TaskChanged(LaunchEphemeral(LaunchedEphemeral(task [marathonintegrationtest_forcetest_app.b438d508-fd81-11e5-837d-02421a2a8071],AgentInfo(localhost,Some(7ded9c35-1b3c-4008-87de-b9266495a937-S0),Buffer()),2016-04-08T12:02:01.986Z,Status(2016-04-08T12:02:02.090Z,None,None),Vector(31937))),Update(LaunchedEphemeral(task [marathonintegrationtest_forcetest_app.b438d508-fd81-11e5-837d-02421a2a8071],AgentInfo(localhost,Some(7ded9c35-1b3c-4008-87de-b9266495a937-S0),Buffer()),2016-04-08T12:02:01.986Z,Status(2016-04-08T12:02:02.090Z,None,None),Vector(31937)),None)) in state active
marathon_8141: [2016-04-08 14:02:02,173] DEBUG [akka://marathon/user/launchQueue/1] received handled message TaskChanged(LaunchEphemeral(LaunchedEphemeral(task [marathonintegrationtest_forcetest_app.b438d508-fd81-11e5-837d-02421a2a8071],AgentInfo(localhost,Some(7ded9c35-1b3c-4008-87de-b9266495a937-S0),Buffer()),2016-04-08T12:02:01.986Z,Status(2016-04-08T12:02:02.090Z,None,None),Vector(31937))),Update(LaunchedEphemeral(task [marathonintegrationtest_forcetest_app.b438d508-fd81-11e5-837d-02421a2a8071],AgentInfo(localhost,Some(7ded9c35-1b3c-4008-87de-b9266495a937-S0),Buffer()),2016-04-08T12:02:01.986Z,Status(2016-04-08T12:02:02.090Z,None,None),Vector(31937)),None))

... gets deferred in LaunchQueueActor thus blocking the processing of the TaskChange which preceeds the TaskLaunchNotification that the ApptAskLaunchActor waits for (2 in-flight messages)

Ask timeout

marathon_8141: [2016-04-08 14:02:03,183] ERROR [] while executing step notifyLaunchQueue for [marathonintegrationtest_forcetest_app.b438d508-fd81-11e5-837d-02421a2a8071], continue with other steps
marathon_8141: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://marathon/user/launchQueue#758055736]] after [1000 ms]
marathon_8141:  at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:335) ~[akka-actor_2.11-2.3.9.jar:na]
marathon_8141:  at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) ~[akka-actor_2.11-2.3.9.jar:na]
marathon_8141:  at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599) ~[scala-library-2.11.7.jar:na]
marathon_8141:  at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) ~[scala-library-2.11.7.jar:na]
marathon_8141:  at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597) ~[scala-library-2.11.7.jar:na]
marathon_8141:  at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467) ~[akka-actor_2.11-2.3.9.jar:na]
marathon_8141:  at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419) ~[akka-actor_2.11-2.3.9.jar:na]
marathon_8141:  at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423) ~[akka-actor_2.11-2.3.9.jar:na]
marathon_8141:  at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375) ~[akka-actor_2.11-2.3.9.jar:na]
marathon_8141:  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]