fredbcode / Vrrpd

Project moved to https://gitlab.com/fredbcode/Vrrpd Advanced Vrrpd. This version has many improvements, such as monitoring other vrrpd processes and executing a command when switching back and forth between master and backup. You can also use the atropos program to view or change the global state. VRRP for Linux
https://gitlab.com/fredbcode
GNU General Public License v2.0

Change gettimeofday() function to clock_gettime() using CLOCK_MONOTONIC.... #5

Closed csavoie closed 10 years ago

csavoie commented 10 years ago

... This clock is not affected by manual time changes or NTP adjustments.
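As a minimal sketch of the proposed change, a timer clock can be read from CLOCK_MONOTONIC like this. The helper name monotonic_usec() is hypothetical and is not taken from the vrrpd source; the microsecond unit is an assumption made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not from the vrrpd source): microseconds elapsed
 * since an arbitrary fixed starting point, read from CLOCK_MONOTONIC.
 * Unlike gettimeofday(), this value does not jump when the wall clock
 * is set with date(1) or stepped by NTP. */
static uint64_t monotonic_usec(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0; /* this sketch reports 0 when the clock is unavailable */

    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}
```

Two successive calls give readings that never decrease, so an interval computed as the difference between them stays valid even if the system date is changed in between.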

fredbcode commented 10 years ago

OK, tests are needed before a new release.

csavoie commented 10 years ago

How to reproduce time adjustment bug

Setup:

Two Linux boxes with an instance of vrrpd on each box.


| box 1(M)|     | box 2(B)|
     |               |
     -----------------

box 1: eth0: 10.1.1.2/24
box 2: eth0: 10.1.1.3/24

Start vrrpd on both boxes:

vrrpd -i eth0 -v 1 10.1.1.1

Test procedure:

1) Check the vrrpd state on both boxes.
2) Set the time on the box that is MASTER back 30 seconds: date -s "-30 seconds"
3) Check the vrrpd state on both boxes again. Both vrrpd instances will be MASTER for approximately 30 seconds.
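The following sketch is one plausible reading of why the window lasts about 30 seconds. It assumes the advertisement timer stores an absolute deadline taken from the wall clock; all names (deadline_timer, timer_set, timer_expired, seconds_until_fire) are hypothetical and are not vrrpd code. The 1-second interval is the VRRP default advertisement interval.

```c
#include <stdbool.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000LL

/* Hypothetical timer that stores an absolute deadline in microseconds. */
struct deadline_timer {
    int64_t expires_at;
};

static void timer_set(struct deadline_timer *t, int64_t now, int64_t delta)
{
    t->expires_at = now + delta;
}

static bool timer_expired(const struct deadline_timer *t, int64_t now)
{
    return now >= t->expires_at;
}

/* Arm a timer for interval_sec, then step the clock by step_sec
 * (negative means the clock is set back). Returns how many real seconds
 * pass before the timer fires, or -1 if it does not fire within an hour. */
static int64_t seconds_until_fire(int64_t interval_sec, int64_t step_sec)
{
    const int64_t start = 1000 * USEC_PER_SEC; /* arbitrary clock reading */
    struct deadline_timer t;

    timer_set(&t, start, interval_sec * USEC_PER_SEC);

    for (int64_t real = 0; real <= 3600; real++) {
        int64_t clock_now = start + (real + step_sec) * USEC_PER_SEC;
        if (timer_expired(&t, clock_now))
            return real;
    }
    return -1;
}
```

Under these assumptions, seconds_until_fire(1, 0) returns 1 while seconds_until_fire(1, -30) returns 31: after the clock is set back, the master would stay silent for about 30 extra seconds, long enough for the backup to promote itself.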

Example:

Before

box 1

provo-olive:~ # ifconfig private
private   Link encap:Ethernet  HWaddr 00:1E:67:57:CB:C9
          inet addr:192.168.219.1  Bcast:192.168.219.255  Mask:255.255.255.0
          inet6 addr: fe80::21e:67ff:fe57:cbc9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:75476 errors:0 dropped:0 overruns:0 frame:0
          TX packets:191711 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9443989 (9.0 Mb)  TX bytes:17166168 (16.3 Mb)
          Memory:d0f20000-d0f3ffff

provo-olive:~ # atropos --state

VRRP PID 28922 STATE: STATE BACKUP

UID   PID    PPID  C  STIME  TTY  TIME      CMD
root  28922  1     0  13:56  ?    00:00:00  /usr/sbin/vrrpd -n -i private -v 1 192.168.219.254 24

Be careful, Atropos doesn't show virtual mac address of vlan interface
Take a look at syslog for more informations

box 2

sandy-olive:~ # ifconfig private
private   Link encap:Ethernet  HWaddr 00:1E:67:57:CA:61
          inet addr:192.168.219.2  Bcast:192.168.219.255  Mask:255.255.255.0
          inet6 addr: fe80::21e:67ff:fe57:ca61/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:794230 errors:0 dropped:0 overruns:0 frame:0
          TX packets:327335 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:70947114 (67.6 Mb)  TX bytes:58111381 (55.4 Mb)
          Memory:d0f20000-d0f3ffff

sandy-olive:~ # atropos --state

VRRPD STATE 105853: STATE MASTER

UID   PID     PPID  C  STIME  TTY  TIME      CMD
root  105853  1     0  13:55  ?    00:00:00  /usr/sbin/vrrpd -n -i private -v 1 192.168.219.254 24

Be careful, Atropos doesn't show virtual mac address of vlan interface
Take a look at syslog for more informations

After

box 1

provo-olive:~ # atropos --state

VRRPD STATE 28922: STATE MASTER

UID   PID    PPID  C  STIME  TTY  TIME      CMD
root  28922  1     0  13:56  ?    00:00:00  /usr/sbin/vrrpd -n -i private -v 1 192.168.219.254 24

Be careful, Atropos doesn't show virtual mac address of vlan interface
Take a look at syslog for more informations

box 2

sandy-olive:~ # atropos --state

VRRPD STATE 105853: STATE MASTER

UID   PID     PPID  C  STIME  TTY  TIME      CMD
root  105853  1     0  13:49  ?    00:00:00  /usr/sbin/vrrpd -n -i private -v 1 192.168.219.254 24

Be careful, Atropos doesn't show virtual mac address of vlan interface
Take a look at syslog for more informations

fredbcode commented 10 years ago

OK, seems great. Nothing strange in syslog after atropos (date, hour, etc.)?

csavoie commented 10 years ago

There may be a corner case that could still cause this problem without adjusting the time. A rollover of VRRP_TIMER_CLK() could have the same effect, and that could take a long time to fix itself.
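To illustrate the kind of rollover being discussed, here is a sketch that assumes the timer clock is a 32-bit microsecond counter, which would wrap roughly every 71.6 minutes. The thread does not show how VRRP_TIMER_CLK() or VRRP_TIMER_EXPIRED() are defined, so both functions below are hypothetical stand-ins and not the vrrpd macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive check: compares absolute counter values, so it gives the wrong
 * answer whenever the deadline and the current reading sit on opposite
 * sides of a wrap past UINT32_MAX. */
static bool expired_naive(uint32_t deadline, uint32_t now)
{
    return deadline <= now;
}

/* Wrap-safe check: unsigned subtraction is performed modulo 2^32, and
 * the difference is then read as a signed value. This holds as long as
 * the deadline is less than 2^31 ticks away from `now`. The conversion
 * to int32_t is implementation-defined in C, but wraps as two's
 * complement on mainstream compilers. */
static bool expired_wrapsafe(uint32_t deadline, uint32_t now)
{
    return (int32_t)(now - deadline) >= 0;
}
```

With a deadline of 0xFFFFFFF8 and a reading of 0x00000008 taken just after the wrap, the naive check reports the timer as still pending until the counter comes all the way around again, while the wrap-safe check reports it as expired. Whether vrrpd is affected depends on how its own macros are written.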

csavoie commented 10 years ago

Nothing in the logs for the master. The backup reports a state change because the master stopped transmitting advertisements.

fredbcode commented 10 years ago

Sorry, I mean after atropos --state; this command writes a lot of information to syslog.

csavoie commented 10 years ago

After checking the VRRP_TIMER_EXPIRED() function, I think the rollover will not cause this problem.

csavoie commented 10 years ago

Can you hold off on 1.11? I have found that the rollover of VRRP_TIMER_CLK() will cause the protocol to break. The adver_timer will not reset after rollover, so the MASTER will not send VRRP advertisements. Eventually, all VRRP instances will become master and none of them will send advertisements. This is a big bug!

fredbcode commented 10 years ago

Yes, no problem.

fredbcode commented 10 years ago

OK, done.

csavoie commented 10 years ago

The test that I conducted had a flaw; once I fixed the test, the perceived time rollover bug did not occur. CLOCK_MONOTONIC is still adjusted at the microsecond scale, but I do not believe that this will cause the current time to go back far enough to prevent the timer from triggering properly.

You can continue stamping 1.11.

Thank you for waiting,
Charles Savoie

fredbcode commented 10 years ago

Please, can you be more explicit about your test and your problem? I'm travelling now, so I can't work on GitHub for the moment.