ericpaulbishop / gargoyle

Gargoyle Router Management Utility
http://www.gargoyle-router.com

WD MyNet 750 WiFi fails between 10 and 20 days of uptime. #932

Open apassiou opened 3 years ago

apassiou commented 3 years ago

The issue goes back at least as far as Gargoyle version 1.10 (and possibly earlier) and has been verified on 3 different units. Strangely, the issue is not present on the My Net 600 version of the router.

Behavior is as follows. The router is up and functioning. After about 10 days (usually closer to 17 days), devices connected to the router's WiFi can no longer reach the internet. Wired devices still reach the internet with no issues, so this is specific to WiFi.

Things I have tried:

So my current workaround is a nightly cron job that runs a script to bring WiFi down, wait 10 seconds, and then bring it back up. It's working. But ideally this issue gets resolved; it's been around for at least 4 years. Since it's not happening on N600 routers, I'm guessing it's something tied to the chipset used in the N750. Something locks up (buffer issue?) and routing breaks.

I'm not sure what other useful info I can provide, as the logs don't seem to mention anything about it. Perhaps there are external components related to WiFi that can be updated, and then we cross our fingers and hope the bug is fixed?

For those who need the workaround:

#!/bin/sh
# Restart WiFi to clear the stalled radio.
wifi down
sleep 10
wifi up

I saved my script as /tmp/wifi_workaround.sh and added it to crontab to run every night when everyone is sleeping.
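For anyone unsure what the crontab entry looks like, here is a minimal sketch. The 04:00 run time and the `cron_line` helper are my own assumptions, not something taken from the post above; only the script path comes from it. Note that /tmp is a RAM-backed filesystem on OpenWrt-based firmware, so a script stored there will probably need to be recreated after a reboot (or kept somewhere persistent instead).

```shell
#!/bin/sh
# Hypothetical helper: print a crontab line that runs a script once a day.
# Arguments: minute, hour, path to the script.
cron_line() {
    # Field order: minute hour day-of-month month day-of-week command
    printf '%s %s * * * /bin/sh %s\n' "$1" "$2" "$3"
}

# 04:00 every night is an assumed time; pick whatever is quiet on your network.
cron_line 0 4 /tmp/wifi_workaround.sh
```

Paste the printed line into the editor opened by `crontab -e` on the router to install it.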

ronniehd commented 3 years ago

Hi, do you have the guest network enabled? I used to have the same issue, but only with the 2.4GHz network, since Gargoyle version 1.9.x I think. Try disabling it and test. That is what worked for me. I have a TP-Link Archer C7 v2 that I recently upgraded to 1.13.0 with the guest network enabled. It is at 9 days 13 hours uptime, so I can't tell yet if the same bug persists, but I'll report back after 20+ days of uptime.

apassiou commented 3 years ago

No, I don't have the guest network enabled.

I loaded the latest OpenWrt build and have had it up:

OpenWrt 21.02.0-rc3, r16172-2aba3e9784

root@OpenWrt:~# uptime
 12:57:29 up 47 days, 23:59, load average: 0.00, 0.00, 0.00

No issue. So I think it's specific to Gargoyle.

apassiou commented 3 years ago

I think I may have found the issue. It will need a lot of testing (since it can take a few weeks for a failure to happen).

I believe the issue may be related to TX power. By default, if Region is set to 00-World, max TX power is 20 dBm. If Region is set to US, max TX power is 22 dBm.
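To put those numbers in perspective, dBm is a logarithmic unit: power in milliwatts is 10^(dBm/10). A quick sketch of the conversion follows; the `dbm_to_mw` helper name is hypothetical, and this only converts units, it says nothing about what the radio actually draws.

```shell
#!/bin/sh
# Hypothetical helper: convert a dBm value to milliwatts, rounded to the nearest mW.
# Uses mW = 10^(dBm/10), computed as exp(ln(10) * dBm / 10).
dbm_to_mw() {
    awk -v dbm="$1" 'BEGIN { printf "%.0f\n", exp(log(10) * dbm / 10) }'
}

for p in 18 19 20 22; do
    printf '%s dBm = %s mW\n' "$p" "$(dbm_to_mw "$p")"
done
# prints:
# 18 dBm = 63 mW
# 19 dBm = 79 mW
# 20 dBm = 100 mW
# 22 dBm = 158 mW
```

So going from 20 dBm to 22 dBm is roughly a 58% increase in transmit power, not a 10% one.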

I noticed this on my My Net 750 that is set to 00-World, which by default maxes out at 20 dBm, with only 2.4 GHz powered on. As you can see in my post above, it ran 47+ days with no issue.

On another My Net 750 I changed the region to US, which defaults to 22 dBm. WiFi failed within 24 hours. I also have 5 GHz on for this one. So something is going on with power draw: this particular router (and perhaps its WiFi chip) bugs out when it needs to draw too much power.

I am guessing that 2.4 GHz by itself (with 5 GHz off) should do fine at 20 dBm. But I am afraid that with both 2.4 GHz and 5 GHz on, 20 dBm will still be too much. That's the theory.

Right now I have set both 2.4 GHz and 5 GHz to 19 dBm (1 notch down from max on the 00-World setting). I will report back results (probably at least a month from now, unless it fails before then).
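For those who prefer the command line over the GUI, the same change can probably be made through UCI. This is only a sketch: the section names `radio0` and `radio1` are assumptions (check `uci show wireless` on your unit), and `txpower_cmds` is a hypothetical helper that just prints the commands so you can review them before running anything.

```shell
#!/bin/sh
# Hypothetical helper: print the UCI commands that set TX power (in dBm)
# on the given radios. Nothing is applied; the commands are only printed.
txpower_cmds() {
    dbm=$1
    shift
    for radio in "$@"; do
        # "txpower" is the per-radio TX power option in /etc/config/wireless
        printf "uci set wireless.%s.txpower='%s'\n" "$radio" "$dbm"
    done
    # Save the change and reload WiFi so it takes effect
    printf 'uci commit wireless\nwifi\n'
}

# radio0/radio1 are assumed section names for the 2.4 GHz and 5 GHz radios.
txpower_cmds 19 radio0 radio1
```

On the router, review the output and then pipe it to `sh` to apply it.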

PS - I should clarify: this is not a Gargoyle-only issue. The issue occurs with OpenWrt as well.

If this is indeed a TX power issue, perhaps Gargoyle (and OpenWrt) should default to 18 dBm or something like that. If this is a chip issue, it will be widespread, and most users won't be technical enough to debug it. Looking at connected devices (if the GUI is to be trusted), the signal/noise to devices is identical at 19 dBm and at 22 dBm. Which kind of makes sense, since most devices transmit at under 15 dBm.

lantis1008 commented 3 years ago

I'm interested to hear your findings and whether this is applicable to other devices. I don't think we would ever impose a maximum db limit (that's up to the user), but it would certainly be an interesting piece of advice that can be included in the knowledge bank.

apassiou commented 3 years ago

> I'm interested to hear your findings and whether this is applicable to other devices. I don't think we would ever impose a maximum db limit (that's up to the user), but it would certainly be an interesting piece of advice that can be included in the knowledge bank.

Yeah, I'll report back. I wasn't saying to impose an upper limit; I was saying to default to something like 18, with users able to go higher if they wish.

apassiou commented 3 years ago

Just a small update: 7 days and so far so good. Previously it would take up to 20 days to fail, so I'm not celebrating yet. But I'll update again in a week or two.

apassiou commented 3 years ago

Well, it's been 30 days and it's still running well. So I think this is indeed the issue.

In summary, with WiFi TX power of 20 dBm or higher, the hardware in the My Net 750 router becomes unstable and eventually stops transmitting data. Setting TX power to 19 dBm or lower stabilizes the hardware, and the issue is no longer observed.

I implore the developers to set the default config for this router to 19 dBm TX power. Users who want to can still change it to higher values, but for most cases out there 19 is more than enough, because most WiFi devices do not transmit at higher than 15 dBm or so. Therefore levels between 16 and 22 (the current max) should make zero difference in WiFi speed/range.

I am sure many users of these routers simply threw them away, dismissing this as failed hardware, while in reality these routers can work for many more years if configured properly. Most users do not know what dBm is, and seeing how long it took me to diagnose this problem, they will never attempt to change it. So setting it to 19 by default is the best course of action in my opinion.

I will continue to provide updates as time goes on, but I believe the issue will not happen with TX power set to 19 dBm or lower.

ronniehd commented 3 years ago

Hi, I think this might be the same issue I'm now experiencing on my Archer C7 v2. I did change the default 00-World to US. After 7+ days it becomes unresponsive; I can't log in or SSH into it, and a reboot fixes it until the next 7+ days. I was thinking of downgrading to v1.12, but I'll give it one more chance by updating the 1.13 firmware and changing back to 00-World. Thanks!

apassiou commented 3 years ago

44 days later and it happened again. Definitely a much longer time between occurrences. So I am knocking transmission power down to 18... will report back.

lantis1008 commented 3 years ago

1 month ok?

apassiou commented 3 years ago

Yes, OK so far at 18.

lantis1008 commented 1 year ago

It may be worth trying the latest test images (1.15). Otherwise, please close this issue. Thanks for your investigation.