brooklyncentral / clocker

Apache Brooklyn cloud native infrastructure blueprints
Apache License 2.0

Create-container failed while resizing up docker-cluster: "IllegalStateException: SDN agent entity" for host that is starting #300

Open aledsage opened 8 years ago

aledsage commented 8 years ago

While the docker-infrastructure cluster was resizing up (i.e. a new Docker Host was being added), I tried to deploy the "Node.js Demo Application" from the default catalog. It gave the error below:

Failed after 4ms: Error invoking start at NodeJsWebAppServiceImpl{id=Iogz6mrf}: SDN agent entity on DockerHostImpl{id=hHFWt6Q0} is null

org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException: Error invoking start at NodeJsWebAppServiceImpl{id=Iogz6mrf}: SDN agent entity on DockerHostImpl{id=hHFWt6Q0} is null
    at org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException.propagate(EffectorUtils.java:299)
    at org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException.access$100(EffectorUtils.java:266)
    at org.apache.brooklyn.core.mgmt.internal.EffectorUtils.handleEffectorException(EffectorUtils.java:306)
    at org.apache.brooklyn.core.effector.EffectorTasks$EffectorBodyTaskFactory$2.handleException(EffectorTasks.java:90)
    at org.apache.brooklyn.util.core.task.DynamicSequentialTask.handleException(DynamicSequentialTask.java:469)
    at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:417)
    at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: SDN agent entity on DockerHostImpl{id=hHFWt6Q0} is null
    at org.apache.brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:128)
    at org.apache.brooklyn.util.core.task.BasicTask.getUnchecked(BasicTask.java:372)
    at org.apache.brooklyn.util.core.task.Tasks$2.get(Tasks.java:285)
    at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.preStartAtMachineAsync(MachineLifecycleEffectorTasks.java:412)
    at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.startInLocation(MachineLifecycleEffectorTasks.java:339)
    at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.startInLocations(MachineLifecycleEffectorTasks.java:324)
    at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.start(MachineLifecycleEffectorTasks.java:313)
    at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StartEffectorBody.call(MachineLifecycleEffectorTasks.java:214)
    at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StartEffectorBody.call(MachineLifecycleEffectorTasks.java:201)
    at org.apache.brooklyn.core.effector.EffectorTasks$EffectorBodyTaskFactory$1.call(EffectorTasks.java:82)
    at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
    ... 5 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: SDN agent entity on DockerHostImpl{id=hHFWt6Q0} is null
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
    at org.apache.brooklyn.util.core.task.BasicTask.get(BasicTask.java:361)
    at org.apache.brooklyn.util.core.task.BasicTask.getUnchecked(BasicTask.java:370)
    ... 14 more
Caused by: java.lang.IllegalStateException: SDN agent entity on DockerHostImpl{id=hHFWt6Q0} is null
    at clocker.docker.location.DockerHostLocation.obtain(DockerHostLocation.java:221)
    at clocker.docker.location.DockerLocation.obtain(DockerLocation.java:270)
    at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ObtainLocationTask.call(MachineLifecycleEffectorTasks.java:406)
    at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ObtainLocationTask.call(MachineLifecycleEffectorTasks.java:396)
    at org.apache.brooklyn.util.core.task.Tasks.withBlockingDetails(Tasks.java:98)
    at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ProvisionMachineTask.call(MachineLifecycleEffectorTasks.java:380)
    at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ProvisionMachineTask.call(MachineLifecycleEffectorTasks.java:364)
    ... 6 more

DockerHostImpl{id=hHFWt6Q0} is indeed the host that was still starting up.

New hosts that are starting (or old hosts that are being stopped) should be excluded from the choice of hosts on which to create the container.
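As a rough illustration of that exclusion, the sketch below filters a candidate list down to hosts that are fully running. It is not Clocker's placement code: the `State` enum, the `Host` class and `eligibleHosts` are hypothetical names invented for this example, and the real entities expose their state through Brooklyn sensors instead of plain fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HostFilterSketch {
    // Hypothetical lifecycle states; an assumption for this sketch, not Brooklyn's own enum.
    enum State { STARTING, RUNNING, STOPPING, STOPPED }

    // Hypothetical stand-in for a Docker host entity.
    static final class Host {
        final String id;
        final State state;
        final boolean serviceUp;

        Host(String id, State state, boolean serviceUp) {
            this.id = id;
            this.state = state;
            this.serviceUp = serviceUp;
        }
    }

    // Keep only hosts that are running and report service-up, so hosts that are
    // still starting or already stopping are never offered for container placement.
    static List<Host> eligibleHosts(List<Host> candidates) {
        List<Host> eligible = new ArrayList<>();
        for (Host host : candidates) {
            if (host.state == State.RUNNING && host.serviceUp) {
                eligible.add(host);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        List<Host> hosts = Arrays.asList(
                new Host("host-a", State.RUNNING, true),
                new Host("host-b", State.STARTING, false),
                new Host("host-c", State.STOPPING, true));
        for (Host host : eligibleHosts(hosts)) {
            System.out.println(host.id); // only host-a passes the filter
        }
    }
}
```

Filtering before selection means a deployment made during a resize lands on an existing host immediately, instead of blocking on (or failing against) the host that is still coming up.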

grkvlt commented 8 years ago

Odd, because we do an Entities.waitForServiceUp(dockerHost) before calling obtain() on it (for exactly this reason) so the SDN agent ought to be there by then.
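For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of "wait for service-up, then obtain". It does not use the Brooklyn API: `waitFor`, `obtain` and the supplier arguments are hypothetical stand-ins, and the real `Entities.waitForServiceUp` may behave differently on timeout. The explicit null check is only there to show where a message like the one in the stack trace would come from if service-up were reported before the agent was set.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

class WaitThenObtainSketch {
    // Poll a condition until it holds or the timeout elapses (hypothetical helper).
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }

    // Wait for the host to report service-up, then require the SDN agent to be present.
    static String obtain(BooleanSupplier serviceUp, Supplier<String> sdnAgent, long timeoutMillis) {
        if (!waitFor(serviceUp, timeoutMillis, 10)) {
            throw new IllegalStateException("host did not become service-up in time");
        }
        String agent = sdnAgent.get();
        if (agent == null) {
            throw new IllegalStateException("SDN agent entity is null");
        }
        return "container-on-" + agent;
    }

    public static void main(String[] args) {
        System.out.println(obtain(() -> true, () -> "agent-1", 100));
    }
}
```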

aledsage commented 8 years ago

The error could be related to https://github.com/brooklyncentral/clocker/issues/301. The host that it was trying to deploy onto subsequently failed to start with the error described in that bug report.


Testing this again, it does seem to be waiting for the entity to start up. The activity shows the stacktrace below for where it is waiting:

Waiting for SERVICE_UP on DockerHostImpl{id=qP9Do9z6}

Task[provisioning (DockerLocation:rJtz)]@KHf8lRPF
Submitted by SoftlyPresent[value=Task[start]@NZdRzNMv]

In progress, thread waiting (timed) on unknown (sleep)
At: org.apache.brooklyn.util.time.Time.sleep(Time.java:353)
    org.apache.brooklyn.util.time.Time.sleep(Time.java:361)
    org.apache.brooklyn.util.repeat.Repeater.runKeepingError(Repeater.java:375)
    org.apache.brooklyn.util.repeat.Repeater.run(Repeater.java:298)
    org.apache.brooklyn.core.entity.Entities.waitForServiceUp(Entities.java:1147)
    org.apache.brooklyn.core.entity.Entities.waitForServiceUp(Entities.java:1166)
    clocker.docker.location.DockerLocation.obtain(DockerLocation.java:264)
    org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ObtainLocationTask.call(MachineLifecycleEffectorTasks.java:406)
    org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ObtainLocationTask.call(MachineLifecycleEffectorTasks.java:396)
    org.apache.brooklyn.util.core.task.Tasks.withBlockingDetails(Tasks.java:98)
    org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ProvisionMachineTask.call(MachineLifecycleEffectorTasks.java:380)
    org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$ProvisionMachineTask.call(MachineLifecycleEffectorTasks.java:364)
    org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
    org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)