OpenClovis / SAFplus-Availability-Scalability-Platform

Middleware that provides libraries, GUI, and code generator to design multi-node (clustered) applications that are highly available, redundant, and scalable. Provides sub-second node and application fault detection and failover, and useful application libraries including distributed hash tables (checkpoint), event, logging, and communications. Implements SA-Forum APIs where applicable. Used anywhere reliability is a must -- like telecom, wireless, defense and enterprise computing. Download stable release with installer from: ftp.openclovis.com
www.openclovis.com
GNU General Public License v2.0

UDP multicast discovery mode: node fails to initialize with IOC error #80

Closed: AndrewStoneOpenClovis closed 11 years ago

AndrewStoneOpenClovis commented 11 years ago

When starting SAFplus, you will sometimes get the following error:

[clUdpNotification.c:393](sysctrlI1.13631 : AMF.UDP.NOTIF.00025 : ERROR) setsockopt IP_ADD_MEMBERSHIP failed with error [No such device]

This means either that multicast is not enabled on the interface or that no multicast route is set up.

Full log:

Fri Sep 9 23:16:13.785 2011 [../common/clPluginHelper.c:131](sysctrlI1.13631 : AMF.IOC.PLUGIN_HELPER.00021 : INFO) ASP_UDP_LINK_NAME env is exported. Value is eth0:10
Fri Sep 9 23:16:13.785 2011 [clUdp.c:357](sysctrlI1.13631 : AMF.UDP.INI.00022 : DEBUG) Link Name: eth0:10, IP Node Address: 192.168.57.2, Network Address: 255.255.255.0, Broadcast: 192.168.57.255
Fri Sep 9 23:16:13.786 2011 [../common/clPluginHelper.c:510](sysctrlI1.13631 : AMF.IOC.PLUGIN_HELPER.00024 : INFO) Ignored assignment IP address: 192.168.57.2, for device: eth0:10
Fri Sep 9 23:16:13.786 2011 [clUdpNotification.c:393](sysctrlI1.13631 : AMF.UDP.NOTIF.00025 : ERROR) setsockopt IP_ADD_MEMBERSHIP failed with error [No such device]
Fri Sep 9 23:16:13.786 2011 [clTransport.c:213](sysctrlI1.13631 : AMF.XPORT.FIN.00026 : NOTICE) Inside fake transport finalize
Fri Sep 9 23:16:13.789 2011 [clEo.c:332](sysctrlI1.13631 : AMF. EO.INI.00027 : CRITIC) Failed to initialize essential library [IOC], error [0x50024]
Fri Sep 9 23:16:13.789 2011 [clEo.c:597](sysctrlI1.13631 : AMF. EO.INI.00028 : CRITIC) Failed to initialize all essential libraries, error [0x50024]
Fri Sep 9 23:16:13.789 2011 [clEo.c:826](sysctrlI1.13631 : AMF. EO.INI.00029 : CRITIC) Exiting : EO setup failed, error [0x50024]

Therefore the system needs configuration similar to the following to work properly:

route add -net 224.0.0.0 netmask 224.0.0.0 eth0

Here's a wild guess: the issue may also be that the code binds the multicast membership to INADDR_ANY rather than to a specific interface, so it needs that route entry to figure out which interface to actually bind. Please research this, since it would be "cleaner" to specify the interface explicitly rather than add this route.

see: http://www-mice.cs.ucl.ac.uk/multimedia/software/documentation/ipv6.html#mroute

AndrewStoneOpenClovis commented 11 years ago

Note: switching INADDR_ANY to gVirtualIp.ip makes the API call succeed without an error, but the route is still needed, as shown below:

Sat Sep 10 00:10:07.541 2011 [../common/clPluginHelper.c:131](sysctrlI1.16111 : AMF.IOC.PLUGIN_HELPER.00021 : INFO) ASP_UDP_LINK_NAME env is exported. Value is eth0:10
Sat Sep 10 00:10:07.541 2011 [clUdp.c:357](sysctrlI1.16111 : AMF.UDP.INI.00022 : DEBUG) Link Name: eth0:10, IP Node Address: 192.168.57.2, Network Address: 255.255.255.0, Broadcast: 192.168.57.255
Sat Sep 10 00:10:07.541 2011 [../common/clPluginHelper.c:510](sysctrlI1.16111 : AMF.IOC.PLUGIN_HELPER.00024 : INFO) Ignored assignment IP address: 192.168.57.2, for device: eth0:10
Sat Sep 10 00:10:07.542 2011 [clUdpNotification.c:540](sysctrlI1.16111 : AMF.UDP.NOTIF.00025 : ERROR) sendmsg failed with error [Network is unreachable] for destination [225.0.1.1]
Sat Sep 10 00:10:07.542 2011 [clUdpNotification.c:540](sysctrlI1.16111 : AMF.UDP.NOTIF.00026 : ERROR) sendmsg failed with error [Network is unreachable] for destination [225.0.1.1]
Sat Sep 10 00:10:07.542 2011 [clIocHeartBeat.c:910](sysctrlI1.16111 : AMF.IOC.HBT.00027 : NOTICE) Heartbeat set to fast failure detection for local components
Sat Sep 10 00:10:07.543 2011 [clEo.c:324](sysctrlI1.16111 : AMF. EO.INI.00028 : DEBUG) Initializing essential library [RMD]...
Sat Sep 10 00:10:07.543 2011 [clEo.c:324](sysctrlI1.16111 : AMF. EO.INI.00029 : DEBUG) Initializing essential library [EO]...
Sat Sep 10 00:10:07.543 2011 [clEo.c:603](sysctrlI1.16111 : AMF. EO.INI.00030 : INFO) Initializing basic libraries...
Sat Sep 10 00:10:07.543 2011 [eo.c:1160](sysctrlI1.16111 : AMF._EO.INI.00031 : DEBUG) Creating EO for [AMF]. Msg threads [1] comm port [1] main thread? [0]
Sat Sep 10 00:10:07.543 2011 [clUdpNotification.c:540](sysctrlI1.16111 : AMF.UDP.NOTIF.00032 : ERROR) sendmsg failed with error [Network is unreachable] for destination [225.0.1.1]
Sat Sep 10 00:10:07.543 2011 [clTransportNotify.c:251](sysctrlI1.16111 : AMF.XPORT.NOTIFY.00033 : DEBUG) Xport notify opened for path [/root/safplus/var/run/notify/cpmServer_sysctrlI1_1]
Sat Sep 10 00:10:07.544 2011 [eo.c:1296](sysctrlI1.16111 : AMF._EO.INI.00034 : ERROR) clIocCommPortCreate() failed, error [0xb]
Sat Sep 10 00:10:07.544 2011 [clEo.c:826](sysctrlI1.16111 : AMF. EO.INI.00035 : CRITIC) Exiting : EO setup failed, error [0xb]
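The remaining failures in this log are on the send path (`sendmsg` to 225.0.1.1), not on the join. A possible direction, offered only as a sketch, is to set the standard `IP_MULTICAST_IF` socket option on the sending socket so that outgoing multicast is tied to a specific local address. `set_mcast_out_if` is a hypothetical helper, and whether this would remove the need for the route in SAFplus has not been tested in this thread.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of SAFplus: select the outgoing
 * interface for multicast datagrams by local IPv4 address.
 * Returns 0 on success, -1 on a bad address or setsockopt() failure. */
int set_mcast_out_if(int sock, const char *if_addr)
{
    struct in_addr local;

    if (inet_pton(AF_INET, if_addr, &local) != 1)
        return -1;
    return setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF,
                      &local, sizeof(local));
}
```

For the node in this log, the call would be `set_mcast_out_if(sock, "192.168.57.2")` on the socket used for notifications, made before the first `sendmsg`.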

AndrewStoneOpenClovis commented 11 years ago

To trigger the bug, I'll bet you can run the following before starting SAFplus:

route del -net 224.0.0.0 netmask 224.0.0.0 eth0

hoangle commented 11 years ago

Yes, if you are using multicast, you have to enable this for the interface.