OpenClovis / SAFplus-Availability-Scalability-Platform

Middleware that provides libraries, GUI, and code generator to design multi-node (clustered) applications that are highly available, redundant, and scalable. Provides sub-second node and application fault detection and failover, and useful application libraries including distributed hash tables (checkpoint), event, logging, and communications. Implements SA-Forum APIs where applicable. Used anywhere reliability is a must -- like telecom, wireless, defense and enterprise computing. Download stable release with installer from: ftp.openclovis.com
www.openclovis.com
GNU General Public License v2.0

Create SAFplus_AMF as child of Watchdog process. #124

Closed nateshrevankar closed 10 years ago

nateshrevankar commented 10 years ago

Make AMF a child process of the watchdog:

Currently, the user starts the watchdog process, which runs in a loop. This watchdog loop starts a separate Python process to start ASP, which in turn runs the routine that starts CPM main. That process spawns AMF as its child, and AMF starts all the other related processes.

To make the watchdog the parent of AMF, ASP must be started within the watchdog process itself, and that same process must start CPM main without forking. With this flow, SAFplus_AMF becomes a direct child of the watchdog, and the watchdog can handle the termination of SAFplus_amf.
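The flow described above (the watchdog starts the child itself and reacts when it terminates) can be sketched roughly as follows. This is a minimal illustration, not the code in safplus_watchdog.py: the function name `watch`, the bounded restart count, and the stand-in child command are all assumptions made for the example, and restarting on exit is only one possible way to "handle" a termination.

```python
import subprocess
import sys
import time


def watch(cmd, max_restarts=3, delay=0.1):
    """Run cmd as a direct child and start it again each time it exits.

    Hypothetical helper: a real watchdog would probably loop indefinitely
    and apply its own restart policy. Returns the exit codes observed.
    """
    exit_codes = []
    for _ in range(max_restarts):
        # An argument list (no shell) makes the child a direct child
        # of this process.
        child = subprocess.Popen(cmd)
        # wait() blocks until the child terminates and reaps it; this is
        # the point where the watchdog notices the termination.
        exit_codes.append(child.wait())
        time.sleep(delay)
    return exit_codes


if __name__ == "__main__":
    # Stand-in for safplus_amf: a child that exits at once with status 3.
    stand_in = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(watch(stand_in, max_restarts=2))
```

Because the child is a direct child, the watchdog can use `wait()` on it; a process that has been re-parented to PID 1 cannot be waited on this way.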

Estimate: this change should take about 3 days of implementation and testing.

Current Implementation:

start_watchdog() PID[27814]
watchdog_loop() PID[27822]
start_asp() PID[27833]
AMF PID[27888]

root 27822 1 0 21:20 ? 00:00:00 python /root/saf/etc/safplus_watchdog.py
root 27888 1 0 21:20 ? 00:00:00 /root/saf/bin/safplus_amf -c 0 -l 1 -n SCNodeI0
root 27899 27888 17 21:20 ? 00:00:22 /root/saf/bin/safplus_logd
root 27912 27888 0 21:20 ? 00:00:00 /root/saf/bin/safplus_gms clGmsConfig.xml
root 27931 27888 0 21:20 ? 00:00:00 /root/saf/bin/safplus_event
root 27940 27888 0 21:20 ? 00:00:00 /root/saf/bin/safplus_name
root 27941 27888 0 21:20 ? 00:00:00 /root/saf/bin/safplus_ckpt
root 27968 27888 0 21:20 ? 00:00:00 /root/saf/bin/safplus_msg
root 27997 27888 37 21:20 ? 00:00:43 /root/saf/bin/csa102 -p
root 27998 27888 37 21:20 ? 00:00:43 /root/saf/bin/csa102 -p

nateshrevankar commented 10 years ago

INITIAL:

root@ubuntu:~# ps -eaf | grep -i saf
root 13546 1 0 14:37 ? 00:00:00 python /root/saf_4_feb/etc/safplus_watchdog.py
root 13556 1 0 14:37 ? 00:00:00 python /root/saf_4_feb/etc/asp.py
root 13590 13556 0 14:37 ? 00:00:00 sh -c ulimit -c unlimited; /root/saf_4_feb/bin/safplus_amf -c 0 -l 1 -n SCNodeI0
root 13591 13590 0 14:37 ? 00:00:00 /root/saf_4_feb/bin/safplus_amf -c 0 -l 1 -n SCNodeI0
root 13603 13591 10 14:37 ? 00:00:07 /root/saf_4_feb/bin/safplus_logd
root 13617 13591 0 14:37 ? 00:00:00 /root/saf_4_feb/bin/safplus_gms clGmsConfig.xml
root 13634 13591 0 14:37 ? 00:00:00 /root/saf_4_feb/bin/safplus_event
root 13643 13591 0 14:37 ? 00:00:00 /root/saf_4_feb/bin/safplus_name
root 13644 13591 0 14:37 ? 00:00:00 /root/saf_4_feb/bin/safplus_ckpt
root 13671 13591 0 14:37 ? 00:00:00 /root/saf_4_feb/bin/safplus_msg
root 13701 13591 41 14:37 ? 00:00:25 /root/saf_4_feb/bin/csa102 -p
root 13702 13591 41 14:37 ? 00:00:25 /root/saf_4_feb/bin/csa102 -p
root 13740 2755 0 14:38 pts/1 00:00:00 grep --color=auto -i saf


The watchdog now spawns the other processes itself, which removes one stage from the process chain (the separate asp.py process).

root@ubuntu:~# ps -eaf | grep -i saf
root 14037 1 0 14:47 ? 00:00:00 python /root/saf_4_feb/etc/safplus_watchdog.py
root 14047 14037 0 14:47 ? 00:00:00 /bin/sh -c ulimit -c unlimited; /root/saf_4_feb/bin/safplus_amf -c 0 -l 1 -n SCNodeI0
root 14048 14047 0 14:47 ? 00:00:00 /root/saf_4_feb/bin/safplus_amf -c 0 -l 1 -n SCNodeI0
root 14061 14048 9 14:47 ? 00:00:04 /root/saf_4_feb/bin/safplus_logd
root 14075 14048 0 14:47 ? 00:00:00 /root/saf_4_feb/bin/safplus_gms clGmsConfig.xml
root 14092 14048 0 14:47 ? 00:00:00 /root/saf_4_feb/bin/safplus_event
root 14101 14048 0 14:47 ? 00:00:00 /root/saf_4_feb/bin/safplus_name
root 14102 14048 0 14:47 ? 00:00:00 /root/saf_4_feb/bin/safplus_ckpt
root 14129 14048 0 14:47 ? 00:00:00 /root/saf_4_feb/bin/safplus_msg
root 14159 14048 38 14:47 ? 00:00:11 /root/saf_4_feb/bin/csa102 -p
root 14160 14048 38 14:47 ? 00:00:11 /root/saf_4_feb/bin/csa102 -p
root 14192 2755 0 14:48 pts/1 00:00:00 grep --color=auto -i saf

TO DO: The intermediate /bin/sh process (PID 14047) needs to be eliminated so that AMF becomes a direct child of the watchdog.
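One possible way to drop the intermediate shell, sketched here under assumptions and not taken from the linked commit: pass the command to `subprocess.Popen` as an argument list (no `shell=True`), and apply the core-file limit in the child via `preexec_fn` instead of running `ulimit -c unlimited` in a shell. The helper names below are hypothetical. Note also that this sketch only raises the soft limit to the existing hard limit, which is not identical to `ulimit -c unlimited` when the hard limit is lower. It is POSIX-only.

```python
import os
import resource
import subprocess
import sys


def raise_core_limit():
    """Runs in the child between fork and exec.

    Raises the soft core-file limit to the current hard limit
    (an approximation of `ulimit -c unlimited`).
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def spawn_direct(argv):
    # No shell=True, so there is no intermediate /bin/sh: the program in
    # argv[0] is exec'ed directly in the forked child.
    return subprocess.Popen(argv, preexec_fn=raise_core_limit,
                            stdout=subprocess.PIPE)


def parent_pid_seen_by_child():
    """Spawn a probe child and return the parent PID it reports."""
    probe = [sys.executable, "-c", "import os; print(os.getppid())"]
    child = spawn_direct(probe)
    out, _ = child.communicate()
    return int(out.decode().strip())


if __name__ == "__main__":
    # With a direct spawn, the child's parent is this process.
    print(parent_pid_seen_by_child() == os.getpid())
```

If the shell has to stay for other reasons, an alternative is to prefix the final command with `exec` in the shell string (`ulimit -c unlimited; exec <command>`), so that the shell replaces itself with the command instead of remaining as its parent.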

nateshrevankar commented 10 years ago

AMF is now created as a child of the watchdog process.

In the ps output below, safplus_watchdog.py (PID: 6652) starts safplus_amf (PID: 6714).

safplus_amf in turn starts the other SAFplus services: safplus_logd (PID: 6727), safplus_gms clGmsConfig.xml (PID: 6740), safplus_event (PID: 6758), safplus_name (PID: 6767), safplus_ckpt (PID: 6768), and safplus_msg (PID: 6795).

The ps output, for more detail:

root@ubuntu:~# ps -eaf | grep -i saf
root 6652 1 0 21:13 ? 00:00:00 python /root/saf_5_feb/etc/safplus_watchdog.py
root 6714 6652 0 21:13 ? 00:00:00 /root/saf_5_feb/bin/safplus_amf -c 0 -l 1 -n SCNodeI0
root 6727 6714 7 21:13 ? 00:00:06 /root/saf_5_feb/bin/safplus_logd
root 6740 6714 0 21:13 ? 00:00:00 /root/saf_5_feb/bin/safplus_gms clGmsConfig.xml
root 6758 6714 0 21:13 ? 00:00:00 /root/saf_5_feb/bin/safplus_event
root 6767 6714 0 21:13 ? 00:00:00 /root/saf_5_feb/bin/safplus_name
root 6768 6714 0 21:13 ? 00:00:00 /root/saf_5_feb/bin/safplus_ckpt
root 6795 6714 0 21:13 ? 00:00:00 /root/saf_5_feb/bin/safplus_msg
root 6822 6714 42 21:14 ? 00:00:29 /root/saf_5_feb/bin/csa102 -p
root 6823 6714 42 21:14 ? 00:00:29 /root/saf_5_feb/bin/csa102 -p
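The parent/child relationship can also be checked mechanically instead of by eye. The sketch below parses `ps -eaf` style rows (columns UID, PID, PPID, C, STIME, TTY, TIME, CMD) and lists the children of a given PID. The helper names are hypothetical, and the sample rows are copied from the output above.

```python
def parse_ps(text):
    """Map PID -> (PPID, command) from `ps -eaf` style rows."""
    procs = {}
    for line in text.strip().splitlines():
        fields = line.split(None, 7)  # the 8th field keeps the whole command
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # skip the header row and anything malformed
        procs[int(fields[1])] = (int(fields[2]), fields[7])
    return procs


def children_of(procs, pid):
    """Return the sorted PIDs whose parent is pid."""
    return sorted(p for p, (ppid, _cmd) in procs.items() if ppid == pid)


SAMPLE = """\
root 6652 1 0 21:13 ? 00:00:00 python /root/saf_5_feb/etc/safplus_watchdog.py
root 6714 6652 0 21:13 ? 00:00:00 /root/saf_5_feb/bin/safplus_amf -c 0 -l 1 -n SCNodeI0
root 6727 6714 7 21:13 ? 00:00:06 /root/saf_5_feb/bin/safplus_logd
root 6740 6714 0 21:13 ? 00:00:00 /root/saf_5_feb/bin/safplus_gms clGmsConfig.xml
"""

if __name__ == "__main__":
    procs = parse_ps(SAMPLE)
    print(children_of(procs, 6652))  # children of the watchdog
    print(children_of(procs, 6714))  # children of safplus_amf
```

For the sample rows, the watchdog (6652) has safplus_amf (6714) as its only child, which is the result this issue asks for.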

The WIP code is uploaded at: https://github.com/nateshrevankar/SAFplus-Availability-Scalability-Platform/commit/fea8466b4e50ca94aa6d7d87fb654babb09eb9b8