Closed costin closed 10 years ago
Yay! We're in the process of upgrading to 7u72 right now. Happy times.
If it needs an assignee I can look at it on Monday. I need an excuse to go poking around in this code. OTOH if someone has time to work on it please don't let me stop you. Your Monday probably starts before mine anyway.
@nik9000 Note that I've currently tested it on Windows - there's a slim chance, the bug doesn't occur on *nix machines...
Ah. Well, I'll at least be able to test that on Monday. I don't think my wife would appreciate me commandeering her machine to reproduce on Windows. So if it doesn't reproduce on unix I'd probably be a bad person to work on it.
I'm on Windows 8.1 but I can't reproduce:
c:\>"%JAVA_HOME%\bin\java.exe" -version
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) Client VM (build 24.72-b04, mixed mode, sharing)
c:\Data\elasticsearch-1.4.0.Beta1>bin\elasticsearch
[2014-10-27 11:25:44,236][WARN ][bootstrap ] jvm uses the client vm, make sure to run `java` with the server vm for best performance by adding `-server` to the command line
[2014-10-27 11:25:44,299][INFO ][node ] [Pete Wisdom] version[1.4.0.Beta1], pid[4656], build[1f25669/2014-10-01T14:58:15Z]
[2014-10-27 11:25:44,299][INFO ][node ] [Pete Wisdom] initializing ...
[2014-10-27 11:25:44,314][INFO ][plugins ] [Pete Wisdom] loaded [], sites []
[2014-10-27 11:25:46,317][INFO ][node ] [Pete Wisdom] initialized
[2014-10-27 11:25:46,317][INFO ][node ] [Pete Wisdom] starting ...
[2014-10-27 11:25:46,458][INFO ][transport ] [Pete Wisdom] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.194.169:9300]}
[2014-10-27 11:25:46,520][INFO ][discovery ] [Pete Wisdom] elasticsearch/oVt8wl4-S9usPM0xWva_TA
[2014-10-27 11:25:49,567][INFO ][cluster.service ] [Pete Wisdom] new_master [Pete Wisdom][oVt8wl4-S9usPM0xWva_TA][WIN-QIFLG7L0DOO][inet[/192.168.194.169:9300]], reason: zen-disco-join (elected_as_master)
[2014-10-27 11:25:49,598][INFO ][gateway ] [Pete Wisdom] recovered [0] indices into cluster_state
[2014-10-27 11:25:49,614][INFO ][http ] [Pete Wisdom] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.194.169:9200]}
[2014-10-27 11:25:49,614][INFO ][node ] [Pete Wisdom] started
Am I doing something wrong here @costin ?
Looks like the issue can be reproduced only on Win 8.1 SP1 with both u71 and u72. Elasticsearch ran fine on Win 7 and Win 8.1 (no SP1). In my opinion, this is not a blocker any more however there is work on the way to provide a fix for it.
@costin I'm also not able to reproduce this running Win 8.1.
What is considered SP1? I can't seem to find any updates being referred to as SP1. The latest update I have installed is KB2995388.
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
[2014-10-27 10:42:15,525][INFO ][node ] [Persuasion] version[1.4.0.Beta1], pid[4416], build[1f25669/2014-10-01T14:58:15Z]
[2014-10-27 10:42:15,525][INFO ][node ] [Persuasion] initializing ...
[2014-10-27 10:42:15,541][INFO ][plugins ] [Persuasion] loaded [], sites []
[2014-10-27 10:42:18,119][INFO ][node ] [Persuasion] initialized
[2014-10-27 10:42:18,119][INFO ][node ] [Persuasion] starting ...
[2014-10-27 10:42:18,259][INFO ][transport ] [Persuasion] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.211.55.3:9300]}
[2014-10-27 10:42:18,291][INFO ][discovery ] [Persuasion] elasticsearch/Mwp7dFlMQW6kVg-xsdU87w
[2014-10-27 10:42:21,338][INFO ][cluster.service ] [Persuasion] new_master [Persuasion][Mwp7dFlMQW6kVg-xsdU87w][GREGMARZOUK4138][inet[/10.211.55.3:9300]], reason: zen-disco-join (elected_as_master)
[2014-10-27 10:42:21,370][INFO ][http ] [Persuasion] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.211.55.3:9200]}
[2014-10-27 10:42:21,370][INFO ][node ] [Persuasion] started
Based on various feedback it seems that I am the only one running into this issue which suggests there's some configuration issue with my system. The win 7, win 8.1 (with and without Update 1/ SP1) that we have tested, do not seem to be affected.
@clintongormley Hence why I removed the two labels above (critical and blocker).
@nik9000 see the comment above - I don't think you're going to be affected and anyway, there's a fix already in the works (see the PR).
Jenkins seeing similar, if the the same failure on window 2012 r2 server
1> [2014-10-27 17:59:39,689][DEBUG][discovery.zen.ping.multicast] [node_t0] failed to send multicast ping request
1> java.lang.NullPointerException
1> at org.elasticsearch.discovery.zen.ping.multicast.MulticastZenPing.sendPingRequest(MulticastZenPing.java:256)
1> at org.elasticsearch.discovery.zen.ping.multicast.MulticastZenPing.ping(MulticastZenPing.java:188)
1> at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:146)
1> at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:124)
1> at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:932)
1> at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:341)
1> at org.elasticsearch.discovery.zen.ZenDiscovery.access$5600(ZenDiscovery.java:83)
1> at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1299)
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
1> at java.lang.Thread.run(Thread.java:745)
1> [2014-10-27 17:59:39,705][DEBUG][discovery.zen.ping.multicast] [node_t1] multicast failed to start [SocketException[Unrecognized Windows Sockets error: 0: Cannot bind]], disabling
1> java.net.SocketException: Unrecognized Windows Sockets error: 0: Cannot bind
1> at java.net.TwoStacksPlainDatagramSocketImpl.bind0(Native Method)
1> at java.net.TwoStacksPlainDatagramSocketImpl.bind0(TwoStacksPlainDatagramSocketImpl.java:107)
1> at java.net.AbstractPlainDatagramSocketImpl.bind(AbstractPlainDatagramSocketImpl.java:96)
1> at java.net.TwoStacksPlainDatagramSocketImpl.bind(TwoStacksPlainDatagramSocketImpl.java:97)
1> at java.net.DatagramSocket.bind(DatagramSocket.java:396)
1> at java.net.MulticastSocket.
Notice the NullPointerException
have been fixed in the PR. However the core problem (multicast that suddenly fails) is still present. On my machine the error occurs when calling java.net.MulticastSocket#setInterface
while here it seems to occur when calling MulticastSocket(SocketAddress)
Pushed the fix in 1.3, 1.4, 1.x and master. In case of a socket error (for whatever reason) a nicer message is displayed to the user and the zen code is skipped (otherwise it causes its own exceptions which only add to the confusion).
This plus the ES_USE_IPV4
flag seem to counter the error reproducible so far only on my windows machines...
I've just updated to JDK 7 update 72 and 1.4.0.Beta1 doesn't work any more (win 8.1).
1.4.0.Beta1 works fine with 7u67 and 8u25. It looks like only 7u72 is problematic (potentially 7u71). Tested 1.3.4 on 7u72 and it seems to be
working fine. Verified the java version and the same problem occurs - clearly something has changed in the JDK.