EnMasseProject / enmasse

EnMasse - Self-service messaging on Kubernetes and OpenShift
https://enmasseproject.github.io
Apache License 2.0

Agent crashes when router pod is deleted #781

Closed: lulf closed this 6 years ago

lulf commented 6 years ago

This is the same behavior observed in #715: the admin pod crashes when a router pod crashes. In this case, though, the router pod was simply deleted manually.

The OpenShift logs for the agent container show this error:

2018-01-23T09:21:52.367Z agent info retrieved link routes for Router.qdrouterd-2217381788-s3srf: raw-> [{"name":"override.lwt_in","identity":"4","type":"org.apache.qpid.dispatch.router.config.linkRoute","prefix":"$lwt","distribution":"linkBalanced","connection":null,"containerId":"lwt-service","dir":"in","operStatus":"inactive","pattern":null},{"name":"override.lwt_out","identity":"5","type":"org.apache.qpid.dispatch.router.config.linkRoute","prefix":"$lwt","distribution":"linkBalanced","connection":null,"containerId":"lwt-service","dir":"out","operStatus":"inactive","pattern":null},{"name":"override.locate","identity":"6","type":"org.apache.qpid.dispatch.router.config.linkRoute","prefix":"locate","distribution":"linkBalanced","connection":"subscription-service","containerId":null,"dir":"out","operStatus":"inactive","pattern":null}]
2018-01-23T09:21:52.367Z agent info retrieved link routes for Router.qdrouterd-2217381788-s3srf: {}
2018-01-23T09:21:52.367Z agent info updating addresses for Router.qdrouterd-2217381788-s3srf
2018-01-23T09:21:52.367Z agent info checking connectivity for Router.qdrouterd-2217381788-s3srf
2018-01-23T09:21:52.367Z agent info checking connectors on router Router.qdrouterd-2217381788-s3srf, missing=, stale=
2018-01-23T09:22:20.264Z agent info router ready
2018-01-23T09:22:20.264Z agent info router ready
events.js:160
      throw er; // Unhandled 'error' event
      ^
Error: Node not found
    at Sender.link.on_detach (/opt/app-root/src/node_modules/rhea/lib/link.js:146:86)
    at Session.on_detach (/opt/app-root/src/node_modules/rhea/lib/session.js:647:27)
    at Connection.(anonymous function) [as on_detach] (/opt/app-root/src/node_modules/rhea/lib/connection.js:646:30)
    at c.dispatch (/opt/app-root/src/node_modules/rhea/lib/types.js:902:33)
    at Transport.read (/opt/app-root/src/node_modules/rhea/lib/transport.js:95:36)
    at SaslClient.read (/opt/app-root/src/node_modules/rhea/lib/sasl.js:252:26)
    at Connection.input (/opt/app-root/src/node_modules/rhea/lib/connection.js:424:35)
    at emitOne (events.js:96:13)
    at TLSSocket.emit (events.js:188:7)
    at readableAddChunk (_stream_readable.js:176:18)
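The top of the trace (`events.js:160 throw er; // Unhandled 'error' event`) is Node's `EventEmitter` behavior: when `'error'` is emitted and no listener is registered, `emit()` throws the error, which takes down the agent process. The minimal sketch below reproduces only that mechanism with a plain `EventEmitter`. It does not use rhea; `emitDetachError` and `tryDetach` are hypothetical names standing in for the library code in the trace.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for library code that reports a remote detach
// error ("Node not found") by emitting 'error' on an emitter.
function emitDetachError(emitter) {
  emitter.emit('error', new Error('Node not found'));
}

// Returns the message of the error that escaped emit(), or null if none did.
function tryDetach(withListener) {
  const emitter = new EventEmitter();
  if (withListener) {
    // With a listener attached, emit() calls it instead of throwing.
    emitter.on('error', () => { /* log and recover here */ });
  }
  try {
    emitDetachError(emitter);
    return null;
  } catch (err) {
    // Without a listener, emit('error') throws the Error object itself;
    // outside a try/catch this is what terminates the process.
    return err.message;
  }
}

console.log(tryDetach(false)); // Node not found
console.log(tryDetach(true));  // null
```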
lulf commented 6 years ago

@grs Is this an error that can be handled more gracefully? Is it enough to just handle on_detach?
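One possible approach, instead of patching rhea's internal `on_detach`, is to register listeners for rhea's `sender_error`, `receiver_error` and `error` events so that a link detached because the router went away is logged and recovered from rather than reaching the unhandled `'error'` path. This is a minimal sketch under assumptions: it assumes the rhea version in use only falls back to emitting `'error'` when no listener handles the link error, and that the error details are on `context.error` or the remote detach performative (this may differ between versions). `installDetachHandlers` and the `onLinkLost` callback are hypothetical names. The helper accepts any `EventEmitter` so it can be exercised without a live router; in the agent it would be given the rhea container (for example `require('rhea')`).

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: attach recovery handlers to a rhea container, or to
// any EventEmitter that raises the same event names.
// onLinkLost(kind, link, error) is a hypothetical callback, e.g. one that
// schedules re-attaching the link once the replacement router is ready.
function installDetachHandlers(container, onLinkLost) {
  const report = (kind) => (context) => {
    // The failing link is context.sender or context.receiver.
    const link = context.sender || context.receiver;
    // Assumption: error details are on context.error or on the stored
    // remote detach performative; the exact location may vary by version.
    const detach = link && link.remote && link.remote.detach;
    const error = context.error || (detach && detach.error);
    onLinkLost(kind, link, error);
  };
  container.on('sender_error', report('sender'));
  container.on('receiver_error', report('receiver'));
  // Last-resort listener so a stray 'error' event does not kill the agent.
  container.on('error', (err) => onLinkLost('error', null, err));
}

// Usage with a plain EventEmitter standing in for the rhea container.
const lost = [];
const fake = new EventEmitter();
installDetachHandlers(fake, (kind) => lost.push(kind));
fake.emit('sender_error', { sender: { name: 'locate' }, error: new Error('Node not found') });
fake.emit('error', new Error('Node not found'));
console.log(lost); // [ 'sender', 'error' ]
```

Registering at the container level means every link and connection the agent opens is covered, which matters here because the router pod can disappear at any time.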