OpenClovis / SAFplus-Availability-Scalability-Platform

Middleware that provides libraries, GUI, and code generator to design multi-node (clustered) applications that are highly available, redundant, and scalable. Provides sub-second node and application fault detection and failover, and useful application libraries including distributed hash tables (checkpoint), event, logging, and communications. Implements SA-Forum APIs where applicable. Used anywhere reliability is a must -- like telecom, wireless, defense and enterprise computing. Download stable release with installer from: ftp.openclovis.com
www.openclovis.com
GNU General Public License v2.0

clNodeCacheLeaderUpdate should send the status to all nodes (broadcast) #96

Closed hoangle closed 11 years ago

hoangle commented 11 years ago

From: Andrew Stone <stone@openclovis.com>
Date: Fri, Jul 12, 2013 at 8:57 AM
Subject: affirmed update
To: hoang le <hoang.le@openclovis.com>, eng-all <eng-all@openclovis.com>

Hi Hoang,

Grab my latest check-in on the 6.0 branch and work with that. Also, check your systems for keepalive failures by tailing /var/log/kern.log. I was getting lots of errors with VMs, so I increased my TIPC timeouts, as shown in the attached dpy.py script.

I also enhanced that to issue the commands simultaneously using threads.

I have your auto-create Node fix commented out for now, because it is hiding the issue of the AMF checkpoint losing data during a failover.

Finally, to close the book on the multi-master issue, I would like the newly elected master to update the node cache on all nodes with its new status.

To do this, I'd like the function: ClRcT clNodeCacheLeaderUpdate(ClIocNodeAddressT currentLeader)

to call something in clIocNotification to send a "gratuitous" notification. Can you implement this? I think you have a lot more experience in this area than I do.
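
A rough sketch of the requested behaviour follows, to make the idea concrete. It is not the SAFplus implementation. The type widths, the `SKETCH_*` constants, the notification struct, the send hook, and every `sketch*` function are hypothetical names invented for illustration. Only the shape of the signature, `ClRcT f(ClIocNodeAddressT currentLeader)`, comes from the request above. The recording stub stands in for whatever clIocNotification would actually provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the SAFplus types (assumed widths; the real headers may differ). */
typedef uint32_t ClRcT;
typedef uint32_t ClIocNodeAddressT;

#define SKETCH_OK                0u           /* hypothetical success code */
#define SKETCH_ERR_NO_TRANSPORT  1u           /* hypothetical error code */
#define SKETCH_BROADCAST_ADDRESS 0xffffffffu  /* hypothetical "all nodes" address */

/* Hypothetical payload of the gratuitous leader announcement. */
typedef struct {
    ClIocNodeAddressT sender;  /* node sending the announcement */
    ClIocNodeAddressT leader;  /* node that now holds the master role */
} SketchLeaderNotificationT;

/* Hypothetical transport hook that delivers one notification to dest. */
typedef ClRcT (*SketchNotifySendFn)(ClIocNodeAddressT dest,
                                    const SketchLeaderNotificationT *msg);

static ClIocNodeAddressT gLocalNode;     /* address of this node */
static ClIocNodeAddressT gCachedLeader;  /* this node's cached view of the leader */
static SketchNotifySendFn gNotifySend;   /* transport used for announcements */

void sketchNodeCacheInit(ClIocNodeAddressT localNode, SketchNotifySendFn send)
{
    gLocalNode = localNode;
    gCachedLeader = 0;
    gNotifySend = send;
}

ClIocNodeAddressT sketchCachedLeader(void)
{
    return gCachedLeader;
}

/* Sender side: record the leader locally; if this node is the new leader,
 * announce that to every node without waiting to be asked. */
ClRcT sketchNodeCacheLeaderUpdate(ClIocNodeAddressT currentLeader)
{
    SketchLeaderNotificationT msg;

    gCachedLeader = currentLeader;
    if (currentLeader != gLocalNode)
        return SKETCH_OK;                /* only the leader itself announces */
    if (gNotifySend == NULL)
        return SKETCH_ERR_NO_TRANSPORT;

    msg.sender = gLocalNode;
    msg.leader = currentLeader;
    return gNotifySend(SKETCH_BROADCAST_ADDRESS, &msg);
}

/* Receiver side: every node overwrites its cached leader on arrival. */
void sketchLeaderNotificationReceive(const SketchLeaderNotificationT *msg)
{
    assert(msg != NULL);
    gCachedLeader = msg->leader;
}

/* Stub transport that only records what would have been broadcast. */
static unsigned gSentCount;
static ClIocNodeAddressT gLastDest;
static SketchLeaderNotificationT gLastSent;

ClRcT sketchRecordingSend(ClIocNodeAddressT dest,
                          const SketchLeaderNotificationT *msg)
{
    assert(msg != NULL);
    gSentCount++;
    gLastDest = dest;
    gLastSent = *msg;
    return SKETCH_OK;
}
```

In this sketch only the node that has just become leader broadcasts, so a follower that learns of a new leader does not echo the announcement back onto the network. Whether the real fix should also retry or acknowledge the broadcast is left open here.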

Thanks, Andy