gnuton / asuswrt-merlin.ng

Extends the support of Merlin firmware to more ASUS routers

RT-AX95Q / Zenwifi AX XT8 - Router speed drops off or router cannot be connected #333

Open Kellen6 opened 1 year ago

Kellen6 commented 1 year ago

Router Model Affected RT-AX95Q / Zenwifi AX XT8

Firmware Version Affected 388.1_0-gnuton0_beta2

Is this bug present in upstream Merlin releases too? This issue started with the ASUS firmware upgrades sometime in October/November 2022. I switched to Merlin in January seeking a resolution, but the issue continues and has become more consistent.

Describe the bug Case 1: Internet speed coming from the router will suddenly and significantly throttle down, remaining extremely slow and requiring a reboot to stabilize. Case 2: The router Wi-Fi will show as available, but devices are unable to connect to the router. A reboot is required (and sometimes multiple reboots) before the router reconnects to the internet and becomes truly available for wifi connection.

In both cases, the internet connection from the ISP appears to be uninterrupted, both according to the ISP and because a reboot solves the issue.

To Reproduce It's not clear how to reproduce it, as this occurs at different times during the day. Case 2 will sometimes occur when rebooting the router even if the router was working previously. If there is a router log or some other analysis that would be helpful to share, please let me know!

Expected behavior Uninterrupted & consistent bandwidth quality.

Thanks in advance for your help!

lhayati commented 1 year ago

Unsure if this is the same issue as you mention, but we just got gigabit fibre installed, and on the latest stable 388.1_0 I could only get 300 Mbps over LAN no matter what settings I changed or how many resets I did. Reverting both my router and nodes to stable 386.07_2 immediately fixed this issue and I get 1 Gbps no problem.

Kellen6 commented 1 year ago

@lhayati the experience on my side is much more limited than that, with the throttling basically rendering the connection unusable.

In case it helps, I've attached a system log with two of the events:
- At 07:55, the router not allowing devices to connect and requiring a restart
- At 14:24, where the speed throttled down to the point that internet is super slow / not responsive and required restarting to fix

syslog.txt
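For anyone who wants to look at the attached log around those two times, here is a minimal Python sketch that pulls out the lines within a few minutes of a given time of day. It assumes the classic syslog line shape (`Mon DD HH:MM:SS ...` with no year); the function name `lines_near` and the 5-minute default are made up for illustration, and it ignores the date and does not handle windows that cross midnight.

```python
import re
from datetime import datetime, timedelta

# Assumption: each log line starts like "Jan 27 07:55:01 ..." (BSD-style syslog, no year).
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\s")


def lines_near(lines, hhmm, minutes=5):
    """Return the lines whose time of day is within `minutes` of hhmm (e.g. "07:55").

    Only the time of day is compared, so matching lines from any date are returned.
    Lines that do not start with a timestamp are skipped.
    """
    target = datetime.strptime(hhmm, "%H:%M")
    window = timedelta(minutes=minutes)
    hits = []
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue
        t = datetime.strptime(f"{m.group(3)}:{m.group(4)}:{m.group(5)}", "%H:%M:%S")
        if abs(t - target) <= window:
            hits.append(line.rstrip("\n"))
    return hits


# Hypothetical usage against the attached file:
#   with open("syslog.txt", errors="replace") as fh:
#       for line in lines_near(fh, "07:55"):
#           print(line)
```

Running it once for "07:55" and once for "14:24" should give two short excerpts that are easier to compare than the full log.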

rlfrank165 commented 1 year ago

I'm replying to this among many of these reports of strange and inexplicable behavior of 388 based releases, with a crazy idea that I hope doesn't get me kicked off this list:

I have a 3 node XT8 AiMesh (ethernet backhaul) exhibiting all sorts of inexplicable behaviors as I've switched between various versions of the ASUS 388 firmware, and even gone back to 386 releases and the various Gnuton beta and stable releases, both with and without first resetting nodes to factory state, all without much improvement.

The other night I decided to do something semi-radical: do a factory reset, load the stable Gnuton 388 release, and NOT re-load my configuration (CNF) file but literally re-configure the router from the UI (i.e., reset all wifi network, WAN, DDNS, port forwarding rules, etc. and dozens of other parameters all from the UI), which was not simple as I have made many local configuration changes. I should mention that I have not totally finished resetting my router, because I have about 75 manually assigned "static" DHCP entries. I find this helps make my home network more manageable, as it has 75+ IoT smarthome devices (heavily TP-Link KASA devices, but also cameras and media devices), so for now these are simply dynamically assigned by DHCP. (Starting to worry about reaching the 128 node limit for pre-assigned DHCP addresses :-) Fixing that will take several more hours, but I don't want to do that until I know that everything remains stable (who knows, maybe having that many static DHCP entries is part of the problem, although that seems very unlikely).

But given that caveat, everything has been very stable now for almost 24 hours. Is there any chance, for those who know the innards of the ASUS routers, that corruption can get into the ASUS cnf files after re-loading and then re-saving them many times, and could that explain why so many people are experiencing all of this unreliability with the 388 releases?

As I said, I'm not yet convinced that this explains my issues, but there are some indications it may.

Randy Frank


LarryRosen108 commented 1 year ago

Randy - I look forward to stability updates. I have a 2 node XT8 system and rolled back to the last gnuton 386 firmware. I have never taken the full reset step but will if you stay stable for a bit. Worst thing that could happen is having to roll back again...


rlfrank165 commented 1 year ago

One other question for people more expert than me:

Does doing a factory reset (either thru the UI or via the WPS hardware button reset) also reset the jffs, or do you need to also do the jffs reset (for some strange reason only available in the Merlin versions I just noticed) to fully clear out the jffs and reset it to factory state? Reason I'm asking is that evidently there are some configuration related settings kept in jffs as I understand, and want to make sure one is really getting to true virgin factory state.

Reason I'm asking is almost too complicated to explain, but there are some very, very subtle things I've noticed that can't be explained in terms of anything other than possible state stored in the jffs. If you must know (and this one sounds crazy), in the network map there is one particular static IP assigned device (a managed ethernet switch) that sometimes appears in the list and sometimes doesn't, for almost no explicable reason. I just noticed that most of the time it doesn't appear, but I did something (not quite sure what) so that now it does appear. So I'm really trying to understand what you have to do to make sure you are REALLY getting an ASUS box back to true factory original state, and I'm wondering exactly what it is that causes this to happen. Part of me is convinced that simply setting factory default in the UI doesn't really get the damned box to a true factory state, and I'm not convinced that even a WPS button hardware reset always does.

I do feel like these stability issues I and others are having are starting to drive me crazy. I was seriously looking at a router from TP-Link, but as good as the hardware is (it's their high end router with multiple 10 Gbps interfaces) and with a pretty good UI, there really isn't a comparable router on the market that has the degree of expert level control and flexibility that ASUS has (not to mention lifetime free TrendMicro high-end router protection), which actually is one thing that really does give ASUS routers a significant total cost of lifetime ownership advantage over almost everyone else. So I really want to get back the super-stable ASUS routers I have grown to love but which seem to have disappeared at the moment.

Randy


gnuton commented 1 year ago

@rlfrank165 Loading backups back after a factory reset is SUPER BAD, since the router will get a mix of new and old configuration variables, and only god knows how that will work... and of course you noticed that. The configuration is in the NVRAM, and /jffs is a separate partition. You can keep /jffs untouched during firmware changes. IIRC the NVRAM variables that get touched by a factory reset are the ones that have a default value in the firmware.
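One way to see that "mix of new and old variables" is to compare two NVRAM dumps, one taken right after a factory reset and one taken after restoring a backup. The Python sketch below is only an illustration under assumptions: it expects each dump as plain text with one `key=value` pair per line (for example, saved output of `nvram show` over SSH), the function names are hypothetical, and values that span multiple lines would be mis-parsed.

```python
def parse_nvram_dump(text):
    """Parse "key=value" lines into a dict; lines without "=" are skipped."""
    out = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key:
            out[key] = value
    return out


def diff_dumps(fresh, restored):
    """Compare a restored dump against a fresh (post-reset) dump.

    Returns (only_in_restored, changed):
      only_in_restored: keys the fresh firmware does not define at all,
                        i.e. possible leftovers from an older configuration
      changed:          keys present in both dumps with different values
    """
    only_in_restored = sorted(k for k in restored if k not in fresh)
    changed = sorted(k for k in restored if k in fresh and restored[k] != fresh[k])
    return only_in_restored, changed


# Hypothetical usage with two saved dump files:
#   fresh = parse_nvram_dump(open("nvram_fresh.txt", errors="replace").read())
#   restored = parse_nvram_dump(open("nvram_restored.txt", errors="replace").read())
#   leftovers, changed = diff_dumps(fresh, restored)
```

Most entries in `changed` will be settings you set on purpose; the `only_in_restored` list is the more interesting one, since those are variables the current firmware does not create by default.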

rlfrank165 commented 1 year ago

Of course I understand that, but given it takes me 4 to 6 hours to manually re-establish all of the settings I have in my router, including over 75 pre-assigned IP addresses, cold flashing my router to switch versions is not something I can do very easily. At this point it is so clear to me that the problems I and lots of other people are experiencing all come from the Asus 388 versions of the firmware that I wonder if there is really anything useful we can do in diagnosing these underlying problems until Asus gets its act together.

Randy


Kellen6 commented 1 year ago

An interesting observation from the syslog today after the router had booted up but was not allowing connections. During this time, it looks like there was a repeating error of "kernel: MDIO Error: MDIO got failure status on phy 30". When I manually rebooted, the syslog then showed several booting lines with dates of May 5, before what appears to be a flash of the ASUS server and a resetting of the dates to today. For anyone who knows more than me, I'd be curious to know:

syslog1Feb2023.txt
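To put numbers on that observation, a small Python sketch like the one below could count the MDIO messages per PHY and list every point where the date prefix in the log changes. It assumes the same `Mon DD HH:MM:SS ...` line shape as a classic syslog and matches the error text exactly as quoted above; the function names are invented for illustration. A jump to an old date followed by a jump back is typically what a router without a battery-backed clock logs between boot and its first time sync, though that is an assumption about this device rather than something confirmed in the thread.

```python
import re
from collections import Counter

# Error text as quoted in the comment above; the PHY number is captured.
MDIO = re.compile(r"MDIO Error: MDIO got failure status on phy (\d+)")
# Assumption: lines start with "Mon DD " (BSD-style syslog, no year).
DATE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s")


def count_mdio_errors(lines):
    """Count MDIO failure messages per PHY address."""
    counts = Counter()
    for line in lines:
        m = MDIO.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts


def date_changes(lines):
    """Return (line_index, previous_date, new_date) for each change of the date prefix."""
    changes = []
    prev = None
    for i, line in enumerate(lines):
        m = DATE.match(line)
        if not m:
            continue
        cur = f"{m.group(1)} {int(m.group(2))}"
        if prev is not None and cur != prev:
            changes.append((i, prev, cur))
        prev = cur
    return changes


# Hypothetical usage against the attached file:
#   lines = open("syslog1Feb2023.txt", errors="replace").read().splitlines()
#   print(count_mdio_errors(lines))
#   print(date_changes(lines))
```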