Open askannon opened 10 years ago
What version of fabric is this?
fabric8-karaf-1.0.0.redhat-378
When the master address is not populated, the DiscoveryTransport does not add the new broker URL. Here is the log of a failover that works:
2014-05-09 10:20:52,807 | WARN | .164:58862@54577 | FailoverTransport | sport.failover.FailoverTransport 260 | 100 - org.apache.activemq.activemq-osgi - 5.9.0.redhat-610378 | Transport (tcp://esb1-4-vl.dfwx/10.20.2.164:58862@54577) failed, reason: , attempting to automatically reconnect
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)[:1.7.0_51]
at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:258)[100:org.apache.activemq.activemq-osgi:5.9.0.redhat-610378]
at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:221)[100:org.apache.activemq.activemq-osgi:5.9.0.redhat-610378]
at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:213)[100:org.apache.activemq.activemq-osgi:5.9.0.redhat-610378]
at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)[100:org.apache.activemq.activemq-osgi:5.9.0.redhat-610378]
at java.lang.Thread.run(Thread.java:744)[:1.7.0_51]
2014-05-09 10:20:56,020 | INFO | ZooKeeperGroup-0 | DiscoveryTransport | ort.discovery.DiscoveryTransport 78 | 100 - org.apache.activemq.activemq-osgi - 5.9.0.redhat-610378 | Adding new broker connection URL: tcp://esb1-5-vl.dfwx:54542
2014-05-09 10:21:03,116 | INFO | ActiveMQ Task-8 | FailoverTransport | sport.failover.FailoverTransport 1057 | 100 - org.apache.activemq.activemq-osgi - 5.9.0.redhat-610378 | Successfully reconnected to tcp://esb1-5-vl.dfwx:54542
And here is the next failover, which no longer works:
2014-05-09 10:21:36,167 | WARN | .165:54542@56440 | FailoverTransport | sport.failover.FailoverTransport 260 | 100 - org.apache.activemq.activemq-osgi - 5.9.0.redhat-610378 | Transport (tcp://esb1-5-vl.dfwx/10.20.2.165:54542@56440) failed, reason: , attempting to automatically reconnect
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)[:1.7.0_51]
at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:258)[100:org.apache.activemq.activemq-osgi:5.9.0.redhat-610378]
at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:221)[100:org.apache.activemq.activemq-osgi:5.9.0.redhat-610378]
at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:213)[100:org.apache.activemq.activemq-osgi:5.9.0.redhat-610378]
at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)[100:org.apache.activemq.activemq-osgi:5.9.0.redhat-610378]
at java.lang.Thread.run(Thread.java:744)[:1.7.0_51]
2014-05-09 10:21:41,281 | WARN | ActiveMQ Task-10 | FailoverTransport | sport.failover.FailoverTransport 1109 | 100 - org.apache.activemq.activemq-osgi - 5.9.0.redhat-610378 | Failed to connect to [] after: 10 attempt(s) continuing to retry.
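For anyone trying to spot this in their own logs, here is a minimal sketch that scans log lines for the FailoverTransport warning with an empty URL list, which is the symptom in the second trace above. The regex is based only on the message text quoted in this thread, and the function name `empty_url_failures` is hypothetical; adjust both if your log layout differs.

```python
import re

# Matches the FailoverTransport warning quoted in this thread. The bracketed
# group is the list of broker URLs the transport currently knows about.
FAILED = re.compile(
    r"Failed to connect to \[(?P<urls>[^\]]*)\] after: (?P<attempts>\d+) attempt\(s\)"
)


def empty_url_failures(lines):
    """Return (attempts, line) for each warning whose URL list is empty."""
    hits = []
    for line in lines:
        m = FAILED.search(line)
        # An empty bracket pair means no broker URL was known at all.
        if m and not m.group("urls").strip():
            hits.append((int(m.group("attempts")), line))
    return hits
```

A warning that lists at least one URL (for example `[nio://10.0.2.15:61616]`) is deliberately not reported, since in that case the transport at least has a candidate broker to try.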
btw just a heads up, fabric8-karaf-1.0.0.redhat-379 is the GA version of Fuse 6.1.
Hi,
Update on this: we are seeing this issue with 379. It tends to happen when we have a set of bundles all connected to AMQ and the features repository URL is updated to roll out a new build, e.g.
At this point contexts/routes are shut down, new versions of the features are downloaded, and then the bundles are restarted.
The container routes then try to reconnect to AMQ and the following errors occur (depending on whether we are using the kaha or the replicated leveldb configuration):
org.apache.activemq.activemq-osgi Failed to connect to [] after: 20 attempt(s)
or
org.apache.activemq.activemq-osgi - 5.9.0.redhat-610379 | Failed to connect to [nio://10.0.2.15:61616] after: 20 attempt(s)
The following is also seen in the AMQ log of the master broker when the container contexts/routes shutdown:
Transport Connection to: tcp://10.0.2.15:56783 failed: java.io.IOException: Broker BrokerService[broker1] is being stopped
Also of note is that according to JMX there is still a master AMQ node but the zookeeper registry has no replication details.
When we have kaha rather than leveldb configured, we see the null address as above.
regards Stan
I have a 5 node replicated mq cluster. Everything seems to be working fine. But when I stop/start the master nodes to trigger failover it sometimes happens that the address in the ZK registry under /fabric/registry/clusters/fusemq-replication-elections/dfwx1/000000000123 shows this:
id broker1
container dfwx1-broker1-3
address null
position -1
weight 1
elected 0000000123
And on the camel amq endpoints I get this: org.apache.activemq.transport.failover.FailoverTransport: Failed to connect to [] after: 10 attempt(s) continuing to retry.
To fix this I have to stop all brokers in the cluster and restart them to get the address field populated for the master again.
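As a rough way to check for this state without reading the entry by eye, here is a small sketch that parses the registry entry as it is quoted above (whitespace-separated key/value pairs) and flags a missing master address. It assumes the entry text has already been fetched from ZooKeeper by some other means; the function names are hypothetical and the real node data may be stored in a different format than the rendering shown in this thread.

```python
def parse_registry_entry(text):
    """Parse a whitespace-separated 'key value key value ...' entry into a dict."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected alternating key/value tokens")
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(tokens[0::2], tokens[1::2]))


def master_address_missing(entry):
    """True when the entry has no usable address (absent, empty, or 'null')."""
    return entry.get("address") in (None, "", "null")


if __name__ == "__main__":
    entry = parse_registry_entry(
        "id broker1 container dfwx1-broker1-3 address null "
        "position -1 weight 1 elected 0000000123"
    )
    print(entry["container"], master_address_missing(entry))
```

Something like this could be run periodically against the elected entry to catch the null address before clients start reporting `Failed to connect to []`.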