Closed nkvoll closed 10 years ago
I should add that the 1.4 instance discovers the 1.3.4-node using a separate UnicastHostsProvider
, which creates a discovery node similar this:
new DiscoveryNode("#zfd"+discoveryNodeCounter.getAndIncrement()+"#", address, Version.CURRENT);
The part where it uses Version.CURRENT
might not be technically correct in this case, since it's actually connecting to an older version, but it's not obvious (to me, at least) what other way there is if the version of the destination node isn't known at that point (i.e before connecting to it).
I'm also unsure of whether I'm reading this commit message correctly: https://github.com/elasticsearch/elasticsearch/commit/d2fea5378ad8beff6e2eb566c9d573f93771e2ef
.. should I take it to mean that I have to specify the lower bound during the DiscoveryNode
instantiation, or should it really be handled in a backwards-compatible way by Elasticsearch itself? I initially understood it as the latter, but this doesn't seem to be the case.
are you actual having problems with your cluster of is it only that you see this errors showing up? I think this error comes from the backwards layer we have in 1.4 that can't tell what version it's pinging on the first contact so it goes and tries the new (1.4 and above) and the old endpoint (1.3.4 and below). If the new endpoint is causing an error it falls back to the old for backwards compatibility. It might be just this initial probe you are seeing?
The end result here was that the 1.4-instance did not join the cluster, but instead elected itself as master after what looked like three retries:
[2014-10-09 09:43:18,718][TRACE][discovery.zen.ping.unicast] [foo-instance] [1] disconnecting from [#zen_unicast_1_#zen_found_discovery_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.0.3.4:9300]]
[2014-10-09 09:43:18,722][TRACE][discovery.zen.ping.unicast] [foo-instance] [1] disconnecting from [#zen_unicast_3_#zen_found_discovery_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.0.3.4:9300]]
[2014-10-09 09:43:18,723][TRACE][discovery.zen.ping.unicast] [foo-instance] [1] disconnecting from [#zen_unicast_2_#zen_found_discovery_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.0.3.4:9300]]
[2014-10-09 09:43:18,725][TRACE][discovery.zen ] [foo-instance] full ping responses: {none}
[2014-10-09 09:43:18,725][DEBUG][discovery.zen ] [foo-instance] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-10-09 09:43:18,732][INFO ][cluster.service ] [foo-instance] new_master [foo-instance][nacDp91kRDCmsh04NoJBWA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.0.4.4:9300]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}, reason: zen-disco-join (elected_as_master)
[2014-10-09 09:43:18,734][INFO ][plugin.found.zookeeper.updaters] [foo-instance] Master node changed. Current masterNode: [[foo-instance][nacDp91kRDCmsh04NoJBWA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.0.4.4:9300]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}]
[2014-10-09 09:43:18,740][TRACE][discovery.zen ] [foo-instance] cluster joins counter set to [1] (elected as master)
[2014-10-09 09:43:18,755][INFO ][http ] [foo-instance] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[ec2-host-2.compute.amazonaws.com/10.0.4.4:9200]}
@nkvoll this is not intended. The backwards compatibility check that @s1monw mentioned does exist but it doesn't manifest itself in warn level logs.
Can you tell me some more about the cluster layout? I see aws related logs. I also see the zoo keeper plugin?
@bleskes what I'm attempting is a simple cluster upgrade, adding a 1.4.0.Beta1 instance to a cluster consisting of a single instance running 1.3.4. They discover each-other via a custom UnicastHostProvider
, which provides the host/port of the other instances in the cluster.
The 1.3.4 instance only has the following to say (discovery is logged down to the trace level):
[2014-10-10 12:03:14,671][WARN ][transport.netty ] [instance-0000000000] Message not fully read (request) for [0] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:03:14,703][WARN ][transport.netty ] [instance-0000000000] Message not fully read (request) for [2] and action [internal:discovery/zen/unicast], resetting
[2014-10-10 12:03:16,124][WARN ][transport.netty ] [instance-0000000000] Message not fully read (request) for [4] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:03:16,132][WARN ][transport.netty ] [instance-0000000000] Message not fully read (request) for [5] and action [internal:discovery/zen/unicast], resetting
[2014-10-10 12:03:17,619][WARN ][transport.netty ] [instance-0000000000] Message not fully read (request) for [7] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:03:17,626][WARN ][transport.netty ] [instance-0000000000] Message not fully read (request) for [8] and action [internal:discovery/zen/unicast], resetting
While the 1.4.0.Beta1-instance logs the following three times before forming its own cluster (which is permitted since minimum_master_nodes: 1 for both instances -- but shouldn't really be happening)
[2014-10-10 12:03:14,601][TRACE][discovery.zen ] [instance-0000000001] starting to ping
[2014-10-10 12:03:14,604][TRACE][discovery.zen.ping.unicast] [instance-0000000001] replacing [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]] with temp node [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]
[2014-10-10 12:03:14,606][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting (light) to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]
[2014-10-10 12:03:14,606][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting to [instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:03:14,624][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]
[2014-10-10 12:03:14,625][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]], using >=1.4.0 serialization
[2014-10-10 12:03:14,648][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:03:14,649][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:03:14,658][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}: [ping_response{node [[instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[1], master [null], hasJoinedOnce [false], cluster_name[538289781bfa9cb13184890f9f23584b]}, ping_response{node [[instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[2], master [null], hasJoinedOnce [false], cluster_name[538289781bfa9cb13184890f9f23584b]}]
[2014-10-10 12:03:14,700][TRACE][discovery.zen.ping.unicast] [instance-0000000001] failed to ping [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]], falling back to <=1.4.0 ping action
[2014-10-10 12:03:14,700][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]
[2014-10-10 12:03:14,707][WARN ][discovery.zen.ping.unicast] [instance-0000000001] failed to send ping to [[#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]]
org.elasticsearch.transport.RemoteTransportException: [instance-0000000000][inet[/10.74.20.146:19775]][internal:discovery/zen/unicast]
Caused by: org.elasticsearch.transport.ActionNotFoundTransportException: No handler for action [internal:discovery/zen/unicast]
at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:210)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:111)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
The zookeeper plugin handles the UnicastHostProvider
-part, but does not currently include any versioning information, which is why it uses Version.CURRENT
, which I immediately thought could be the culprit -- but if I understand it correctly, it shouldn't actually matter.
If I simply change the DiscoveryNode
instantiation in our code from using Version.CURRENT
to using Version.V_1_3_0
it works, with the following in the 1.3.4 instance logs:
[2014-10-10 12:23:35,247][WARN ][transport.netty ] [instance-0000000000] Message not fully read (request) for [0] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:23:36,679][WARN ][transport.netty ] [instance-0000000000] Message not fully read (request) for [4] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:23:38,186][WARN ][transport.netty ] [instance-0000000000] Message not fully read (request) for [7] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:23:38,243][INFO ][cluster.service ] [instance-0000000000] added {[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0},}, reason: zen-disco-receive(join from node[[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}])
and the following from the 1.4.0.Beta1-instance:
[2014-10-10 12:23:35,152][TRACE][discovery.zen ] [instance-0000000001] starting to ping
[2014-10-10 12:23:35,155][TRACE][discovery.zen.ping.unicast] [instance-0000000001] replacing [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]] with temp node [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:35,157][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting (light) to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:35,168][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:35,199][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:35,200][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], using >=1.4.0 serialization
[2014-10-10 12:23:35,220][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:35,221][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:35,232][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[1], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[2], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:35,272][TRACE][discovery.zen.ping.unicast] [instance-0000000001] failed to ping [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], falling back to <=1.4.0 ping action
[2014-10-10 12:23:35,274][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:35,281][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:36,669][TRACE][discovery.zen.ping.unicast] [instance-0000000001] replacing [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]] with temp node [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:36,670][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting (light) to [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:36,670][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:36,673][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:36,674][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], using >=1.4.0 serialization
[2014-10-10 12:23:36,676][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[1], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[3], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[4], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:36,681][TRACE][discovery.zen.ping.unicast] [instance-0000000001] failed to ping [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], falling back to <=1.4.0 ping action
[2014-10-10 12:23:36,682][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:36,685][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:38,174][TRACE][discovery.zen.ping.unicast] [instance-0000000001] replacing [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]] with temp node [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,176][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting (light) to [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,176][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:38,179][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,180][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], using >=1.4.0 serialization
[2014-10-10 12:23:38,183][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[1], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[3], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[5], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[6], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:38,187][TRACE][discovery.zen.ping.unicast] [instance-0000000001] failed to ping [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], falling back to <=1.4.0 ping action
[2014-10-10 12:23:38,188][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,192][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:38,193][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] disconnecting from [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,196][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] disconnecting from [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,197][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] disconnecting from [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,199][TRACE][discovery.zen ] [instance-0000000001] full ping responses:
--> ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}
[2014-10-10 12:23:38,200][DEBUG][discovery.zen ] [instance-0000000001] filtered ping responses: (filter_client[true], filter_data[false])
--> ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}
[2014-10-10 12:23:38,201][TRACE][discovery.zen ] [instance-0000000001] ignoring joined once flags in ping responses, minimum ping version [1.3.4]
[2014-10-10 12:23:38,208][TRACE][discovery.zen ] [instance-0000000001] joining master [instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:38,248][DEBUG][discovery.zen.publish ] [instance-0000000001] received cluster state version 10
[2014-10-10 12:23:38,250][DEBUG][discovery.zen.fd ] [instance-0000000001] [master] restarting fault detection against master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], reason [new cluster state received and we are monitoring the wrong master [null]]
[2014-10-10 12:23:38,251][DEBUG][discovery.zen ] [instance-0000000001] got first state from fresh master [o5ulTkS4TV-BDs_NbkReqQ]
[2014-10-10 12:23:38,252][TRACE][discovery.zen ] [instance-0000000001] updated cluster join cluster to [1]
[2014-10-10 12:23:38,254][INFO ][cluster.service ] [instance-0000000001] detected_master [instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}, added {[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0},}, reason: zen-disco-receive(from master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}])
The notable difference being that there were no exceptions logged when instantiating the DiscoveryNode
using a lower-bound Version
, and that the 1.4 instance was able to join the 1.3 cluster.
Does this possibly mean that there might be an issue with providing DiscoveryNode
s using a UnicastHostsProvider
when the target instance version is actually unknown?
oh I see - sorry for the confusion about the BackwardsCompatibility endpoint. @nkvoll for the case when you don't know the discovery nodes version you can just use Version. CURRENT.minimumCompatibilityVersion()
which returns the minimal compatible version. This should work just fine in that case.
Ok @s1monw, I'll verify that it works for us either this weekend, or at latest, on Monday. Since we have a workaround already, I'm fine with closing this issue now if you wish to do so, and I leave it entirely up to you whether this behaviour needs to be noted somewhere else as a help for other developers that have latched onto the UnicastHostsProvider
-interface and are instantiating DiscoveryNode
s the same way we did.
Thanks for the feedback and for letting me know about the minimumCompatibilityVersion
method :)
I will add some javadocs to the ctors to clarify! Thanks for raising this issue
I ran into this issue where a freshly 1.4.0.Beta.1 instance would not be able to ping (or receive pings?) from a 1.3.4-instance, and found the following in the 1.4 instance log:
Is there something obvious I have missed/done wrong? The action name seems pretty odd.