Ranger802004 / asusmerlin

ASUS Merlin
GNU General Public License v3.0

Failback doesn't seem to be happening #3

Closed p1r473 closed 1 year ago

p1r473 commented 1 year ago

Hi Ranger802004. Found this script today and absolutely love it. However, I can't seem to get failback to work. The failover worked great, but failback seems to not be working. I am using 1.6.0. (I tried the 2.0.0 beta and failover didn't seem to work.)

This is what I see about 2 minutes after the failover: Primary is still in hot standby. image

Ranger802004 commented 1 year ago

I need debug logs to be able to diagnose the issue here. Please turn on debug logging in the router GUI, then run capture mode and simulate the issue. I recommend testing on v2.0.0-beta5, since this is the latest code.

p1r473 commented 1 year ago

Is that different from regular logging? I wasn't able to find an option that said debug logging. (screenshot at bottom)

I'm just looking at the capture now, and it looks like the script is disabled due to Asus Failover being enabled. However, on my router, I can't seem to find a way to enable dual WAN without enabling failover, and I don't want to use load balance mode, as my secondary WAN is a 4G LTE SIM card and data is $$$.

wan-failover - Capture Mode
Mar  5 17:28:20 PortRoyal wan-failover.sh: WAN Failover Disabled - ASUS Factory WAN Failover is enabled

image

image

Ranger802004 commented 1 year ago

You need to turn off the network monitoring options to fully disable the factory failover; this is noted in the readme. Capture mode creates a temporary file under /tmp/ containing only WAN Failover logs, and since you are using Scribe, I think it already logs debug messages by default.

Edit: If those options are still enabled, my script wouldn't have done anything. The firmware probably performed your failover, but because you have allow failback turned off, it wasn't going back. My script won't operate unless all of those options are turned off, to prevent conflicts with factory failover.

p1r473 commented 1 year ago

Whoops, sorry, I missed that line in the readme. I did see the one to disable failback. Network monitoring is now off and failback seemed to work. It took about 1.5 minutes to fail back. It does seem to be going nuts on the emails, though: it has generated 7 and is still going. Ideally, one email would be sent for failover and one for failback. My testing was:

  1. Reboot router
  2. Start capture
  3. Wait for SKIPEMAILSYSTEMUPTIME seconds
  4. Unplug the primary WAN
  5. Observe failover. Waited about 5 min
  6. Replug in WAN

logs: wan-failover-2023-03-05-17%3A37%3A06-EST.log

image image

p1r473 commented 1 year ago

I see this in the logs:

***Warning*** Compatibility issues with 8.8.8.8 may occur without specifying Outbound Interface

But I do see you're using 8.8.8.8 in the examples in the readme. I am assuming it's safe to ignore this warning?

Also, the emails are still coming, so something is for sure wrong. I'm going through the logs now too. It could be something to do with this message:

WAN Failover Disabled - WAN Failover is currently disabled. ***Review Logs***

It looks like wan0 is continuously failing, maybe?

Here's the latest email:

***WAN Failover Notification***
----------------------------------------------------------------------------------------
Hostname: PortRoyal
Event Time: Mar 5 17:52:04
WAN0 Status: CONNECTED
WAN1 Status: CONNECTED
Active ISP: CIK Telecom
Primary WAN: wan0
WAN IPv4 Address: 10.1.70.24
WAN Gateway IP Address: 10.1.71.254
WAN Interface: eth5
DNS Server 1: 192.168.1.186
DNS Server 2: 192.168.1.168
QoS Status: Enabled
QoS Mode: Automatic Settings
----------------------------------------------------------------------------------------
Ranger802004 commented 1 year ago

That warning appears because the script was unable to ping the target during initialization when specifying the outbound interface in the IP rule. This is an issue on some routers, so the warning reflects a compensating feature. Are the emails saying that the states of each interface are changing? This could be the router having issues.

p1r473 commented 1 year ago

The emails are still coming, even after I disabled email sending in the config app and rebooted twice. Did anything in the logs tell you why the emails won't stop coming? Could they have been queued during an outage and are just being released now? image image

Ranger802004 commented 1 year ago

Those may be delayed deliveries from a prior test. What are the timestamps in the alerts?

p1r473 commented 1 year ago

I did a total of 2 failover tests, which have resulted in 13 emails so far. I think they might have stopped, as the last one was a few minutes ago.

Is there any way to get it to send only one email for failover and one email for failback?

image

This week I'll take a shot at integrating a different notification system, maybe PushOver, PushBullet or a blink(1) light.

Ranger802004 commented 1 year ago

It's built into the logic of the latest beta to not send duplicates if the WAN states are exactly the same as in the last notification, so look at the emails and see if the statuses are changing back and forth. This could be an issue with the router; maybe it's flopping back and forth on failover because it can't ping a target? If so, try changing the ping target or increasing Recursive Ping Checks. Also, review the timestamps inside the emails, not the delivery times, to see when the events are actually occurring.
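The duplicate suppression described here can be pictured with a minimal sketch. It assumes the last notified states are kept in a small file; the function name `should_notify` and the state-file path are hypothetical and are not taken from wan-failover.sh.

```sh
#!/bin/sh
# Hypothetical sketch: skip a notification when the WAN states match the
# ones recorded for the previous notification. Names and paths here are
# illustrative assumptions, not the script's real implementation.

STATE_FILE="${STATE_FILE:-/tmp/wan-notify-state.example}"

# should_notify WAN0_STATUS WAN1_STATUS
# Returns 0 (send) when the states differ from the last recorded ones and
# records the new states; returns 1 (skip) when they are identical.
should_notify() {
    current="WAN0=$1 WAN1=$2"
    last=""
    if [ -f "$STATE_FILE" ]; then
        last="$(cat "$STATE_FILE")"
    fi
    if [ "$current" = "$last" ]; then
        return 1
    fi
    printf '%s\n' "$current" > "$STATE_FILE"
    return 0
}
```

With this kind of check, repeated emails imply that at least one reported status really did change between notifications, which is why comparing the statuses across emails is a useful first step.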

p1r473 commented 1 year ago

Sure, will set it to 1.1.1.1 and change recursivepingchecks to 5. It could also definitely be the router, as the wifi has been dropping continuously lately. Will be buying a new one when the GT-BE98 releases, if Merlin supports it. Will post back later today or tomorrow. Thanks for the great work!

Will help you test the beta. Is it ready for testing yet? I tried it earlier but it didn't work, though that could have been from network monitoring being enabled.

Ranger802004 commented 1 year ago

If it is on the repo beta channel, it is ready for testing and feedback. Please absolutely test; I'd like to work out any issues before production release. 5 will be a little much for recursive checks, I think; I would start with 2-3.

p1r473 commented 1 year ago

Ok, I'll switch over to the beta for testing.

I have had emails configured to be disabled in 1.6.0 for over 30 min and they are still coming, and the timestamps are within the last 3 minutes. I've also rebooted 3 times.

Event Time: Mar 5 18:30:21

I'm just stepping out now but will test the beta tomorrow or later tonight.

Ranger802004 commented 1 year ago

Yes please. There are some major changes in v2.0.0; I do not plan on patching v1.6 and will end support for it after v2.0.0 is in production release.

p1r473 commented 1 year ago

Okay, so I just tested the 2.0.0 beta and noticed a few things:

boot # 1 - it failed over as soon as the router booted. I figured this was a fluke, and restarted to try again.

boot # 2 - failover happened, but then failback didn't happen when I plugged the cord back in. At the end of the test, it settled with my primary WAN as a hot standby and didn't fail back again.

boot # 3 - failed over, then failback happened, but then it failed over again even though I never pulled the plug again. At the end of the test, it settled with my primary WAN as a hot standby and didn't fail back again.

I monitored the logs in the capture window, a second SSH terminal where I was watching status, and my Asus router dashboard.

Testing methodology

  1. reboot
  2. run capture command as soon as SSH reconnected
  3. wait some time and confirm I'm on the proper primary WAN
  4. pull primary WAN ethernet plug
  5. wait some time, then plug it in
  6. wait some time, and then kill the logs

boot # 1

After config and install, it failed over as soon as the router booted. I figured this was a fluke, and restarted to try again.

boot # 2

Failover happened, but then failback didn't happen when I plugged the cord back in. Primary stayed as hot standby even after 5 min.

wan-failover-2018-05-05-01%3A05%3A35-EDT.log

Other things I noticed:

Major activity during the first 30 seconds, including some potential problems
Printed a lot: failed to set WAN1IFNAME
Also displayed a lot: WAN Failover Disabled - WAN Failover is currently disabled. ***Review Logs***
Near the end of the capture, the log activity really died down

I'm not sure if some of the emails I received were from the 1.6.0 queue, so I will skip showing the emails from this attempt.

boot # 3

Pulled plug around 22:54:20
Plugged back in 22:55:30

Failed over, then failback happened, but then it failed over again even though I never pulled the plug again. At the end of the test, it settled with my primary WAN as a hot standby and didn't fail back again. So possibly a script issue, or possibly a router issue?

wan-failover-2018-05-05-01%3A05%3A36-EDT.log

Email 1:

***WAN Failover Notification***
----------------------------------------------------------------------------------------
Hostname: PortRoyal
Event Time: Mar 5 22:53:48
WAN0 Status: CONNECTED
WAN1 Status: DISCONNECTED
Active ISP: CIK Telecom INC
Primary WAN: wan0
WAN IPv4 Address: 10.1.70.24
WAN Gateway IP Address: 10.1.71.254
WAN Interface: eth5
WAN IPv6 Address:
DNS Server 1: 192.168.1.186
DNS Server 2: 192.168.1.168
QoS Status: Enabled
QoS Mode: Automatic Settings
----------------------------------------------------------------------------------------

Email 2:

***WAN Failover Notification***
----------------------------------------------------------------------------------------
Hostname: PortRoyal
Event Time: Mar 5 22:55:25
WAN0 Status: UNPLUGGED
WAN1 Status: CONNECTED
Active ISP: Rogers Communications Canada Inc.
Primary WAN: wan1
WAN IPv4 Address: 192.168.0.2
WAN Gateway IP Address: 192.168.0.1
WAN Interface: eth0
WAN IPv6 Address:
DNS Server 1: 192.168.1.186
DNS Server 2: 192.168.1.168
QoS Status: Enabled
QoS Mode: Automatic Settings
----------------------------------------------------------------------------------------

Email 3:

***WAN Failover Notification***
----------------------------------------------------------------------------------------
Hostname: PortRoyal
Event Time: Mar 5 22:56:37
WAN0 Status: CONNECTED
WAN1 Status: CONNECTED
Active ISP: CIK Telecom INC
Primary WAN: wan0
WAN IPv4 Address: 10.1.70.24
WAN Gateway IP Address: 10.1.71.254
WAN Interface: eth5
WAN IPv6 Address:
DNS Server 1: 192.168.1.186
DNS Server 2: 192.168.1.168
QoS Status: Enabled
QoS Mode: Automatic Settings
----------------------------------------------------------------------------------------

Email 4:

***WAN Failover Notification***
----------------------------------------------------------------------------------------
Hostname: PortRoyal
Event Time: Mar 5 22:58:05
WAN0 Status: DISCONNECTED
WAN1 Status: CONNECTED
Active ISP: Rogers Communications Canada Inc.
Primary WAN: wan1
WAN IPv4 Address: 192.168.0.2
WAN Gateway IP Address: 192.168.0.1
WAN Interface: eth0
WAN IPv6 Address:
DNS Server 1: 192.168.1.186
DNS Server 2: 192.168.1.168
QoS Status: Enabled
QoS Mode: Automatic Settings
----------------------------------------------------------------------------------------
Ranger802004 commented 1 year ago

On #2, I know why this is happening; it is a current issue I'm working through among the bugs of this beta, which stem from major code changes. I just reviewed the code, and the logic for accepting that value as null is backwards. I assume you are using a USB device for WAN1? As for #3, I need debug logs to diagnose it further. I'm going to correct #2 and commit it as a minor fix.

p1r473 commented 1 year ago

Oh, did you mean router logs or capture logs? I provided the capture logs, but please let me know if I provided the incorrect logs. This was a few reboots ago, so I'd have to provide my entire syslog now if you need some router debug logs. Please let me know where to find them. https://github.com/Ranger802004/asusmerlin/files/10894255/wan-failover-2018-05-05-01.3A05.3A36-EDT.log

WAN0 is an Asus AXE11000

WAN1 is a Wiflyer WG3526 4G LTE WiFi Router with a 4G LTE SIM card in it. But I ordered an ASUS 4G-AX56 to upgrade it in a few days.

p1r473 commented 1 year ago

Found the logs in /opt/var/log. Gathering you the relevant logs now.

Ranger802004 commented 1 year ago

Did you restart WAN0 interface? That will cause this to happen.

Mar 5 22:33:01 PortRoyal wan-failover: Debug - WAN0 Target IP Rule Missing or Default Route for 100 is invalid

Also, I see WAN0 going back and forth on being able to ping its target IP address. Try increasing Recursive Ping Checks to 2 or 3.

p1r473 commented 1 year ago

Found the logs in /opt/var/log.

Attempt # 3: https://pastebin.com/raw/JwnfiaYu

Pulled plug around 22:54:20
Plugged back in 22:55:30

p1r473 commented 1 year ago

No sir, testing methodology was strict. I didn't run any commands to restart WAN0. I have been getting a few wifi drops every day, so I wouldn't be surprised if the router is starting to die. Will be replacing this AXE11000 with a new GT-BE98 as soon as Asus releases it and Merlin supports it.

Testing methodology

  1. reboot
  2. run capture command as soon as SSH reconnected
  3. wait some time and confirm I'm on the proper primary WAN
  4. pull primary WAN ethernet plug
  5. wait some time, then plug it in
  6. wait some time, and then kill the logs

p1r473 commented 1 year ago

Sure, let me know when you're ready and I can do another 3 tests with Recursive Ping Checks set to 2 - 3.

p1r473 commented 1 year ago

Also, I see where WAN0 is going back and forth being able to ping it's target IP Address. Try increasing Recursive Ping Checks to 2 or 3.

Shouldn't ping failures be pretty rare?

Ranger802004 commented 1 year ago

beta6 is published; go ahead and update from within the script and try it. Ping failures caused by the target IP being down are rare, but I have seen routers fail to ping for various reasons, and sometimes the Recursive Ping Check is there to correct these problems.
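As a rough illustration of why a higher Recursive Ping Checks value can mask sporadic ping failures, here is a minimal sketch. It assumes the setting means "retry the probe up to N times before declaring the target down"; that reading, and the names `target_is_up`, `ping_once` and `PROBE_CMD`, are assumptions for illustration rather than the script's actual code.

```sh
#!/bin/sh
# Hypothetical sketch of a recursive ping check: a target counts as down
# only if every one of N consecutive probes fails. PROBE_CMD is an
# illustrative hook so the loop can be exercised without a network.

PROBE_CMD="${PROBE_CMD:-ping_once}"

# ping_once TARGET: send one ICMP echo with a 1-second timeout.
ping_once() {
    ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# target_is_up TARGET CHECKS
# Returns 0 as soon as one probe succeeds, 1 after CHECKS failed probes.
target_is_up() {
    target="$1"
    checks="$2"
    attempt=1
    while [ "$attempt" -le "$checks" ]; do
        if "$PROBE_CMD" "$target"; then
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

For example, `target_is_up 8.8.8.8 3` only reports a failure after three failed probes in a row, so a single dropped ping no longer triggers a failover. The trade-off is that a real outage takes longer to detect.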

p1r473 commented 1 year ago

On the new version, set recursive ping to 2.

Noticed a lot of WAN1 when WAN1 should never have been down (it's my failover), even before I pulled the plug.

Pulled plug at 00:10:55
Plugged back in 00:12:30
Failed over, failed back, then failed over again, where it seems to have stayed

messages.log
wan-failover-2018-05-05-01%3A05%3A48-EDT.log

Any way to get the status to auto refresh? It would be a really nice feature. I'd leave it open like a dashboard. I tried to use watch /jffs/scripts/wan-failover.sh, but I think the user input prompt kills the watch command?

Will try another attempt now with recursive ping set to 3.

Ranger802004 commented 1 year ago

Test with different ping targets. The status console does auto refresh, based on the Status Check setting in the configuration settings.

p1r473 commented 1 year ago

Set recursive ping to 3
Pulled plug at 00:21:00
Plugged back in ~ 00:23:00
Failed over, failed back, failed over again

messages.log
wan-failover-2023-03-06-00%3A19%3A25-EST.log

p1r473 commented 1 year ago

I'm also noticing my amtm emails failing, even though amtm email tests in amtm itself work. Any idea? I did receive 2 emails with no body and no subject; it's as if amtm is sending the emails but they are malformed. And then a bunch of proper notifications, which I imagine were the AIProtection emails.

p1r473 commented 1 year ago

Test with different ping targets and the status console does auto refresh based on Status Check setting in configuration settings.

Do you mean wan0 and wan1 different targets? Or different target in each attempt? Or do you mean just stop using 1.1.1.1?

Ranger802004 commented 1 year ago

Im also noticing my amtm emails failing even though amtm email tests in amtm itself work. Any idea? I did receive 2 no body no subject emails, its as if amtm is sending the emails but the email is malformed. And then a bunch of proper notifications which I imagine were the AIProtection emails

Sounds like your router is having trouble routing traffic after a failover. What model is it?

Ranger802004 commented 1 year ago

Make sure they are different targets, and yes, test different IP addresses.
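A quick sanity check for this advice could look like the sketch below. It uses the WAN0TARGET and WAN1TARGET setting names that appear in this thread; the helper `targets_are_distinct` and the example default values are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch: confirm that each WAN has its own, non-empty ping
# target. The helper name and the example defaults are illustrative only.

# targets_are_distinct WAN0TARGET WAN1TARGET
# Returns 0 only when both values are set and differ from each other.
targets_are_distinct() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Example values only; substitute the targets from your own configuration.
WAN0TARGET="${WAN0TARGET:-8.8.8.8}"
WAN1TARGET="${WAN1TARGET:-9.9.9.9}"

if ! targets_are_distinct "$WAN0TARGET" "$WAN1TARGET"; then
    echo "WAN0TARGET and WAN1TARGET should be different, non-empty addresses" >&2
fi
```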

p1r473 commented 1 year ago

Sounds like your router is having trouble routing traffic after a failover, what model is it?

AXE11000 with latest Merlin

Ranger802004 commented 1 year ago

Did you change the ping targets to different ones for each interface? Also, what type of WAN connections do you have?

p1r473 commented 1 year ago

Make sure they are different targets and yes test different IP addresses.

Changed them both from 1.1.1.1 to

WAN0TARGET=8.8.8.8
WAN1TARGET=9.9.9.9

Ping recursive still 3
Failed over before I pulled the plug !!!

messages.log
wan-failover-2018-05-05-01%3A05%3A35-EDT.log

Will do another attempt now.

p1r473 commented 1 year ago

wan0 is fiber to the home (optical box with an ethernet cord right into the Asus router's 2.5gb port).

wan1 is a router with a 4G LTE SIM card, with an ethernet cord right to the Asus router's 1gb port.

Both are gigabit ports.

p1r473 commented 1 year ago
WAN0TARGET=8.8.8.8
WAN1TARGET=9.9.9.9
RECURSIVEPINGCHECK=3

Pulled plug at ~01:02:15
Plugged back in ~ 01:04:00
Failed over, and didn't fail back

messages.log
wan-failover-2023-03-06-00%3A58%3A40-EST.log

Ranger802004 commented 1 year ago

By any chance are you using any of these as DNS Servers?

p1r473 commented 1 year ago

My DNS servers are 192.168.1.186 and 192.168.1.168, both PiHoles, with upstream to 1.1.1.1 [1.0.0.1 backup] (cloudflared) with DoH or DoT (can't remember which right now).

Ranger802004 commented 1 year ago

Is your WAN a double NAT or behind a firewall?

p1r473 commented 1 year ago

WAN0 is double NAT'd by my ISP and they give me a 10.x.x.x address; unfortunately, I can't do anything about it. I pay for a static public IP address with them. WAN1 with the 4G LTE SIM card is not double NAT'd and I get a public IP.

For firewall, I have the Asus firewall enabled. I also use a Firewalla Gold Plus in DHCP server mode (192.168.1.2), and I have a Fingbox (192.168.1.3) which scans the network and has some other features that operate via ARP spoofing. (I thought I saw some deauths in the logs and wonder if that's the Fingbox.) I can turn the Firewalla and Fingbox off for testing if that would help.

Ranger802004 commented 1 year ago

Yes, try that and see if it's interfering with the ICMP traffic. Also, when you start the script, does it get to the point where it's monitoring both WANs for failure before you test?

Ranger802004 commented 1 year ago

Looking at those last logs you sent me, it doesn't look like it actually failed over. What looks like happened is that WAN1 was unplugged, restarted, or similar; the script saw the IP rules disappear and forced the failover function to run, just to make sure the WAN settings didn't change (this is due to the way the firmware handles these events). It then went to WAN status checks, where it saw it couldn't ping WAN1 and sent an email that WAN1 was disconnected; shortly afterward it was able to ping again and sent another email that WAN1 was connected.
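To make the "IP rules disappear" condition concrete, here is a minimal sketch of how a missing rule for a ping target could be detected from `ip rule list` output. The exact rules the script creates are not shown in this thread, so the matching pattern and the helper name `has_target_rule` are assumptions for illustration.

```sh
#!/bin/sh
# Hypothetical sketch: detect whether a policy rule for a ping target is
# still present. Reads `ip rule list` style text on stdin. The assumption
# that the rule contains "to <target>" is illustrative, not confirmed.

# has_target_rule TARGET
# Returns 0 if some rule on stdin routes traffic destined to TARGET.
has_target_rule() {
    # -F matches the address literally; the trailing space avoids
    # matching a longer address that merely starts with TARGET.
    grep -qF -- "to $1 "
}
```

A monitoring loop could then run something like `ip rule list | has_target_rule 9.9.9.9` and re-run its setup when the check fails, which matches the behaviour described above where a restarted interface forces the failover function to run again.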

p1r473 commented 1 year ago

Okay, I will unplug both my Firewalla and my Fingbox, and I'll wait longer before I start unplugging the cable so that the script gets to the monitoring part before I unplug it.

Will do some more testing today

p1r473 commented 1 year ago

Changes:

Fingbox unplugged
Firewalla unplugged
Will now wait 2 minutes after boot before unplugging, to allow the script to kick in

WAN0TARGET=8.8.8.8
WAN1TARGET=9.9.9.9
RECURSIVEPINGCHECK=3

Testing Methodology

  1. Clear logs
  2. reboot
  3. run capture command as soon as SSH reconnected
  4. wait some time and confirm I'm on the proper primary WAN
  5. wait until capture says "monitoring for fail"
  6. pull primary WAN ethernet plug
  7. wait until capture says "wan-failover disabled", then plug it back in
  8. wait some time, and then kill the logs

Test 1

Problems noticed:

Failed over, but didn't fail back
Did receive emails about WAN1 being disconnected when it should have been up
Wifi cut out after it stabilized on WAN1

wan-failover-2023-03-06-15%3A35%3A21-EST.log
messages.log

Pulled plug at 15:37:45
Plugged back in 15:39:42

Test 2

Problems noticed:

Failed over, but didn't fail back
Wifi cut out twice before I even unplugged anything
WAN1 showed as disconnected when it wasn't
Saw some WAN1 100% packet loss errors
Emails sent before I unplugged anything

messages.log
wan-failover-2023-03-06-15%3A46%3A46-EST.log

Pulled plug at 15:49:37
Plugged back in 15:51:07

Ranger802004 commented 1 year ago

Instead of testing by unplugging, try to simulate a real outage. It seems as if your router behaves sporadically when the interfaces are removed.

p1r473 commented 1 year ago

@Ranger802004 okay, I will try to unplug the power of the OPT box (optical, fiber to the home) upstream of the router.

p1r473 commented 1 year ago

Test 1

Problems noticed:

Wifi went out at 17:38:13 before I did anything
Received an email that WAN1 was disconnected before I did anything
Received an email that WAN1 was (re)connected before I did anything
Detected WAN1 packet loss 100% when I never touched it
Failed over, and never failed back. Finished test with primary wan0 at COLD standby

messages.log
wan-failover-2023-03-06-17%3A36%3A30-EST.log

Optical box power unplugged at ~ 17:39:45
Optical box power plugged back in at ~ 17:41:00
Wifi went out at ~17:44:44 after it stabilized on WAN1

Test 2

Problems noticed:

17:47:07, 17:47:46 wifi went out before I did anything
17:53:59 wifi went out after it stabilized on wan1
Received an email that WAN1 was disconnected before I did anything
Received an email that WAN1 was (re)connected before I did anything
Failed over, and never failed back. Finished test with primary wan0 at HOT standby

messages.log
wan-failover-2018-05-05-01%3A05%3A39-EDT.log

Optical box power unplugged at 17:48:18
Optical box power plugged back in at 17:50:00

p1r473 commented 1 year ago

One thing to note is that the Asus built in failover works perfectly for me.

Ranger802004 commented 1 year ago

Is your ONT directly connected to your router? If so, powering it off is the same as unplugging. Pull the fiber from it instead.

p1r473 commented 1 year ago

If the Asus built-in dual WAN can handle this, the script should be able to handle it too, no? I feel like we would be missing an edge case if we can't get this to work.