brooklyncentral / brooklyn

This project has moved and is now part of the ASF
https://github.com/apache/incubator-brooklyn
72 stars 27 forks source link

sshj retry broken: each retry did not reconnect #357

Open aledsage opened 11 years ago

aledsage commented 11 years ago

My ssh command (using SshjTool) failed, despite previous and subsequent commands to the same ip:port succeeding.

The pertinent lines from the stacktrace are:

    at brooklyn.util.internal.ssh.SshjClientConnection.clear(SshjClientConnection.java:154)
    at brooklyn.util.internal.ssh.SshjTool.disconnect(SshjTool.java:324)
    at brooklyn.util.internal.ssh.SshjTool.acquire(SshjTool.java:549)
    ... 46 more
Caused by: net.schmizz.sshj.transport.TransportException: [BY_APPLICATION] Disconnected
    ... 53 more

On each retry, the attempt to first disconnect (before we create a new connection) is failng because it is already disconnected. Each retry attempt was therefore propagating the exception without even trying to connect again!

2012-10-17 10:04:22,867 WARN brooklyn.util.task.BasicExecutionManager: Error while running task Taskreload [EFFECTOR, MyCustomerNginxController[id=kcE4sXqv,displayName=MyCustomerNginxController:kcE4]]; eMgTl3XQ3C: Execution failed, invocation error for check-running MyCustomerNginxController[id=kcE4sXqv,displayName=MyCustomerNginxController:kcE4]: (nginx@5.79.20.46:22) (nginx@5.79.20.46:22) error acquiring Put(path=[/tmp/brooklyn-1350464656310-kE8pWGYX.sh]) (out of retries - max 4):net.schmizz.sshj.connection.ConnectionException: Disconnected java.lang.IllegalStateException: Execution failed, invocation error for check-running MyCustomerNginxController[id=kcE4sXqv,displayName=MyCustomerNginxController:kcE4]: (nginx@5.79.20.46:22) (nginx@5.79.20.46:22) error acquiring Put(path=[/tmp/brooklyn-1350464656310-kE8pWGYX.sh]) (out of retries - max 4):net.schmizz.sshj.connection.ConnectionException: Disconnected at brooklyn.entity.basic.lifecycle.ScriptHelper.execute(ScriptHelper.java:209) at brooklyn.entity.proxy.nginx.NginxSshDriver.isRunning(NginxSshDriver.java:169) at com.mycustomer.nginx.MyCustomerNginxSshDriver.launch(MyCustomerNginxSshDriver.java:73) at brooklyn.entity.proxy.nginx.NginxSshDriver.restart(NginxSshDriver.java:201) at com.mycustomer.nginx.MyCustomerNginxSshDriver.reload(MyCustomerNginxSshDriver.java:105) at com.mycustomer.nginx.MyCustomerNginxSshDriver$reload.call(Unknown Source) at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:112) at brooklyn.entity.proxy.nginx.NginxController.reload(NginxController.groovy:103) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1047) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:877) at groovy.lang.DelegatingMetaClass.invokeMethod(DelegatingMetaClass.java:149) at groovy.lang.MetaObjectProtocol$invokeMethod.call(Unknown Source) at brooklyn.util.GroovyJavaMethods.invokeMethodOnMetaClass(GroovyJavaMethods.groovy:123) at brooklyn.management.internal.AbstractManagementContext.invokeEffectorMethodLocal(AbstractManagementContext.java:207) at brooklyn.management.internal.AbstractManagementContext.invokeEffectorMethodSync(AbstractManagementContext.java:260) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoCachedMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:230) at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124) at brooklyn.entity.basic.AbstractEntity.invokeMethod(AbstractEntity.groovy:723) at brooklyn.entity.basic.MethodEffector.call(MethodEffector.java:90) at brooklyn.management.internal.AbstractManagementContext$5.call(AbstractManagementContext.java:199) at brooklyn.util.task.BasicExecutionManager$3.call(BasicExecutionManager.java:279) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) Caused by: brooklyn.util.internal.ssh.SshException: (nginx@5.79.20.46:22) (nginx@5.79.20.46:22) error acquiring Put(path=[/tmp/brooklyn-1350464656310-kE8pWGYX.sh]) (out of retries - max 4):net.schmizz.sshj.connection.ConnectionException: Disconnected at brooklyn.util.internal.ssh.SshjTool.propagate(SshjTool.java:711) at brooklyn.util.internal.ssh.SshjTool.acquire(SshjTool.java:555) at brooklyn.util.internal.ssh.SshjTool.createFile(SshjTool.java:381) at brooklyn.util.internal.ssh.SshjTool.createFile(SshjTool.java:352) at brooklyn.util.internal.ssh.SshjTool.execScript(SshjTool.java:434) at brooklyn.location.basic.SshMachineLocation$2.exec(SshMachineLocation.java:330) at brooklyn.location.basic.SshMachineLocation.execWithLogging(SshMachineLocation.java:390) at brooklyn.location.basic.SshMachineLocation.execScript(SshMachineLocation.java:328) at brooklyn.entity.basic.AbstractSoftwareProcessSshDriver.execute(AbstractSoftwareProcessSshDriver.java:99) at brooklyn.entity.basic.lifecycle.ScriptHelper.execute(ScriptHelper.java:205) ... 38 more Caused by: java.lang.RuntimeException: net.schmizz.sshj.connection.ConnectionException: Disconnected at com.google.common.base.Throwables.propagate(Throwables.java:160) at brooklyn.util.internal.ssh.SshjTool$PutFileAction.clear(SshjTool.java:653) at brooklyn.util.internal.ssh.SshjTool.acquire(SshjTool.java:541) ... 46 more Caused by: net.schmizz.sshj.connection.ConnectionException: Disconnected at net.schmizz.sshj.connection.ConnectionException$1.chain(ConnectionException.java:32) at net.schmizz.sshj.connection.ConnectionException$1.chain(ConnectionException.java:26) at net.schmizz.concurrent.Promise.deliverError(Promise.java:95) at net.schmizz.concurrent.Event.deliverError(Event.java:72) at net.schmizz.concurrent.ErrorDeliveryUtil.alertEvents(ErrorDeliveryUtil.java:34) at net.schmizz.sshj.connection.channel.AbstractChannel.notifyError(AbstractChannel.java:241) at net.schmizz.sshj.connection.channel.direct.SessionChannel.notifyError(SessionChannel.java:249) at net.schmizz.sshj.common.ErrorNotifiable$Util.alertAll(ErrorNotifiable.java:35) at net.schmizz.sshj.connection.ConnectionImpl.notifyError(ConnectionImpl.java:250) at net.schmizz.sshj.transport.TransportImpl.disconnect(TransportImpl.java:386) at net.schmizz.sshj.transport.TransportImpl.disconnect(TransportImpl.java:377) at net.schmizz.sshj.transport.TransportImpl.disconnect(TransportImpl.java:372) at net.schmizz.sshj.SSHClient.disconnect(SSHClient.java:366) at brooklyn.util.internal.ssh.SshjClientConnection.clear(SshjClientConnection.java:154) at brooklyn.util.internal.ssh.SshjTool.disconnect(SshjTool.java:324) at brooklyn.util.internal.ssh.SshjTool.acquire(SshjTool.java:549) ... 46 more Caused by: net.schmizz.sshj.transport.TransportException: [BY_APPLICATION] Disconnected ... 53 more

aledsage commented 11 years ago

Oops, I highlighted the wrong pertinent lines:

Caused by: java.lang.RuntimeException: net.schmizz.sshj.connection.ConnectionException: Disconnected
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at brooklyn.util.internal.ssh.SshjTool$PutFileAction.clear(SshjTool.java:653)
at brooklyn.util.internal.ssh.SshjTool.acquire(SshjTool.java:541)
... 46 more

Looking at this now, to confirm and fix.

ahgittin commented 11 years ago

closing since it's fixed

aledsage commented 11 years ago

Re-opening; I think my fix broke it even more :-( I think the disconnect/reconnect logic needs more attention.

2012-10-19 18:09:53,054 WARN Error executing brooklyn.event.adapter.AbstractPollHelper$_activatePoll_closure3_closure6@116ceac () org.codehaus.groovy.runtime.InvokerInvocationException: java.lang.AssertionError at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:97) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233) at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:877) at groovy.lang.Closure.call(Closure.java:412) at groovy.lang.Closure.call(Closure.java:406) at brooklyn.util.task.BasicExecutionManager$2$1.call(BasicExecutionManager.java:234) at brooklyn.util.task.BasicExecutionManager$3.call(BasicExecutionManager.java:279) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) Caused by: java.lang.AssertionError at net.schmizz.sshj.SSHClient.startSession(SSHClient.java:639) at net.schmizz.sshj.sftp.SFTPEngine.(SFTPEngine.java:61) at net.schmizz.sshj.sftp.SFTPEngine.(SFTPEngine.java:56) at net.schmizz.sshj.SSHClient.newSFTPClient(SSHClient.java:623) at brooklyn.util.internal.ssh.SshjTool$1.create(SshjTool.java:582) at brooklyn.util.internal.ssh.SshjTool$1.create(SshjTool.java:1) at brooklyn.util.internal.ssh.SshjTool.acquire(SshjTool.java:543) at brooklyn.util.internal.ssh.SshjTool$PutFileAction.create(SshjTool.java:651) at brooklyn.util.internal.ssh.SshjTool$PutFileAction.create(SshjTool.java:1) at brooklyn.util.internal.ssh.SshjTool.acquire(SshjTool.java:543) at brooklyn.util.internal.ssh.SshjTool.createFile(SshjTool.java:381) at brooklyn.util.internal.ssh.SshjTool.createFile(SshjTool.java:352) at brooklyn.util.internal.ssh.SshjTool.execScript(SshjTool.java:434) at brooklyn.util.internal.ssh.SshjTool.execShell(SshjTool.java:389) at brooklyn.location.basic.SshMachineLocation.run(SshMachineLocation.java:195) at brooklyn.location.basic.SshMachineLocation.run(SshMachineLocation.java:175) at brooklyn.location.basic.SshMachineLocation$run.call(Unknown Source) at brooklyn.event.adapter.SshPollHelper.executePollOnSuccess(SshPollHelper.groovy:35)

aledsage commented 11 years ago

See https://github.com/shikhar/sshj/issues/89.

I've also submitted a pull request that guards against calling when the AssertionError would be thrown.