aparcar / openwrt

Staging tree of Paul Spooren

FS#762 - TP-Link WR1043ND v4: Network on switch fails at random #536

Closed aparcar closed 6 years ago

aparcar commented 7 years ago

r43k3n:


Network stops working, both LAN and WAN. The command "ifup wan" doesn't help, but "service network restart" does. It's as if there is no network on the switch, yet WiFi works fine. Sometimes network drives keep working between some computers, which is why I assume the basic switching still works. DHCP etc. are not assigned, and Windows shows "Unidentified network". I can't ping other devices, including the router, from computers connected via Ethernet, and I can't ping the computers from the router.

It also happened on v17.01.0. It happens at random: sometimes days or weeks pass without issues, sometimes it happens twice a day, like today.

aparcar commented 7 years ago

Sven:

I have the same behaviour on a TL-WR841N v9. The switch stops working and LAN/WAN no longer transmit any packets. WiFi still works fine.

The problem occurs at random times: sometimes it works fine for weeks, then only for a few days. I already changed the power supply, but that didn't fix the problem.

In my case, I didn't need to install any additional packages.

aparcar commented 7 years ago

r43k3n:

I've created a forum thread for this issue, where other people have also reported the same problem. I would appreciate it if you'd reply there too. Also, please add your vote for this issue by clicking +1 above.

https://forum.lede-project.org/t/lan-stops-working-every-now-and-then/2648/47

aparcar commented 7 years ago

Dm1:

Same bug on WR1043NDv2. The switch fails every 10 minutes.

aparcar commented 7 years ago

alBendin:

Same bug for WR1043NDv4 on LEDE Reboot 17.01.2 r3435-65eec8bd5f

aparcar commented 7 years ago

lynxis:

How does the switch stop working? Do you mean you can no longer reach the WR1043? Can you reach other devices through the switch?

If you have a serial console, could you please do the following:

Attach a dmesg.

On the router, ping a computer; on the computer, ping the router.

Run swconfig dev switch0 show and ifconfig once, and again 60 sec later, so we might see a difference in the statistics (the outage must then last at least 60 sec).
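For the "once and again 60 sec later" comparison, the sketch below snapshots each interface's packet counters from sysfs and prints the per-interface difference. An interface whose counters stop moving during an outage then stands out. It assumes the standard Linux /sys/class/net/<iface>/statistics layout. The script name netdelta.sh and its "run" argument are hypothetical. It complements the swconfig dev switch0 show output, which carries the per-port switch counters; it does not replace it.

```sh
#!/bin/sh
# Sketch: compare per-interface packet counters across an interval.
# Assumption: Linux sysfs layout; NET_SYS can be overridden (e.g. for testing).
NET_SYS="${NET_SYS:-/sys/class/net}"

# Print "<iface> <rx_packets> <tx_packets>" for every interface with statistics.
counters() {
    for dir in "$NET_SYS"/*; do
        [ -r "$dir/statistics/rx_packets" ] || continue
        printf '%s %s %s\n' "${dir##*/}" \
            "$(cat "$dir/statistics/rx_packets")" \
            "$(cat "$dir/statistics/tx_packets")"
    done
}

# Given two snapshot files, print "<iface> rx+N tx+M" for interfaces in both.
deltas() {
    awk 'NR == FNR { rx[$1] = $2; tx[$1] = $3; next }
         ($1 in rx) { printf "%s rx+%d tx+%d\n", $1, $2 - rx[$1], $3 - tx[$1] }' "$1" "$2"
}

# Hypothetical entry point: sh netdelta.sh run [seconds]
if [ "${1:-}" = run ]; then
    before=$(mktemp)
    after=$(mktemp)
    counters > "$before"
    sleep "${2:-60}"
    counters > "$after"
    deltas "$before" "$after"
    rm -f "$before" "$after"
fi
```

During an outage, "sh netdelta.sh run 60" would print one line per interface. A wired interface showing rx+0 while the ping is running would point at the switch/Ethernet path rather than the IP stack.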

aparcar commented 7 years ago

r43k3n:

How should I know? It's dead, both WAN and LAN, and I've seen more people complain about this issue.

By "cannot reach" I mean I can't log into the router via SSH or ping it while connected through an Ethernet cable. I also cannot reach other devices connected through the switch. They don't get an IP or any other info from the router. Transfers already in progress (like copying files over the Windows network) keep working even after the failure, so the basic switching capabilities still work.

sysconfig and dmesg are clean; I posted them before somewhere on the forum. No one (including me) found anything there, and I mean anything. Not a single mention of this state.

I can't ping the router while connected using Ethernet at all.

aparcar commented 7 years ago

Dm1:

-//How does the switch stopped working? Do you mean you cannot reach anymore the wr1043? Can you reach other devices throught the switch?//

In this case you can do nothing through the switch, as if it were powered down completely, and the WR1043 is not reachable through the Ethernet ports at all. BUT! You can still access it over Wi-Fi without any problem. No internet, though, because the WAN is down anyway.

-//If you have a serial could you do please: Attach a dmesg?//

As I said above, we can take the easy way and get the dmesg over Wi-Fi. I'll try that as soon as possible.

-//on the router: ping a computer on the computer: ping the router//

Timeout for both.

-//A swconfig dev switch0 show and ifconfig once and 60 sec later so we might see any difference in statistics (the outage must hold then at least 60 sec).//

I'll try this.

aparcar commented 7 years ago

lynxis:

@Dm1 it would be nice if you could do some debugging, if you have the time.

1) swconfig dev switch0 show + ifconfig
2) wait 60 sec
3) swconfig dev switch0 show + ifconfig
4) wait 10 sec
5) swconfig dev switch0 show + ifconfig
6) (on the router) start ping <some ip on the lan>
7) swconfig dev switch0 show + ifconfig (the ping is still running)
8) wait 60 sec
9) swconfig dev switch0 show
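The capture sequence above can be scripted so that every snapshot is timestamped and saved to a file that can be attached to the report. This is a sketch, not an official debug tool. The output directory, the file names, the switchdebug.sh name and the LAN address argument are all assumptions. It expects swconfig and ifconfig to be present, as on a stock LEDE/OpenWrt image.

```sh
#!/bin/sh
# Sketch: run the requested swconfig/ifconfig capture sequence and save each
# snapshot to a file. OUT_DIR and the file names are assumptions.
OUT_DIR="${OUT_DIR:-/tmp/switchdebug}"

# snap LABEL [noif]: save a timestamp, the switch state and (unless "noif") ifconfig.
snap() {
    n=$((n + 1))
    {
        date
        swconfig dev switch0 show
        [ "${2:-}" = noif ] || ifconfig
    } > "$OUT_DIR/snap-$n-$1.txt" 2>&1
}

# collect LAN_IP: the requested sequence, step by step.
collect() {
    n=0
    mkdir -p "$OUT_DIR"
    snap initial
    sleep 60
    snap after-60s
    sleep 10
    snap after-70s
    # Start pinging a LAN host in the background and keep it running.
    ping "$1" > "$OUT_DIR/ping.txt" 2>&1 &
    ping_pid=$!
    snap ping-running
    sleep 60
    snap ping-after-60s noif   # the last step asks for swconfig only
    kill "$ping_pid" 2>/dev/null || true
}

# Hypothetical entry point: sh switchdebug.sh run <lan-ip>
if [ "${1:-}" = run ]; then
    collect "${2:?usage: switchdebug.sh run <lan-ip>}"
fi
```

Run it while the outage is in progress. On OpenWrt, /tmp is in RAM, so copy the files off (e.g. over Wi-Fi with scp) before restarting or rebooting.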

aparcar commented 7 years ago

Sven:

I use the TL-WR841N as an access point, so it neither offers DHCP nor uses iptables (I disabled both). When my Icinga informs me that the access point is down, I cannot ping or ssh it from the LAN side. But I still can connect to it from the WLAN side, setting a static IP address on my notebook.

"brctl show" looks normal, "dmesg" doesn't show any recent log entries. But any traffic that should go through the LAN ports simply does not pass. The "link" LED on the switch the TL-WR841N is connected to is still on.

Unfortunately (or fortunately) the problem did not occur for the last 5 weeks, so I cannot provide "swconfig" output as requested. But I will provide it as soon as the problem occurs again.

aparcar commented 7 years ago

lynxis:

@Sven: what hardware version is your WR841N?

aparcar commented 7 years ago

Sven:

@Alexander Couzens: First of all, thanks for taking care of this problem. I was using OpenWrt for some years before switching to LEDE and it's great to see that problems are taken care of now, which (as it seems to me) wasn't the case in the old OpenWrt days.

My TL-WR841N is a V9.

Thanks! Sven

aparcar commented 7 years ago

MPW:

I've seen this bug on multiple TP-Link TL-WR1043ND V4.

Here's a link to a swconfig-dump: https://forum.freifunk.net/t/tp-link-tl-wr1043nd-spezialconfig/14577/25?u=mpw

It never occurred on a WR841N, at least not with OpenWrt. Maybe this is a problem with newer kernel versions?

aparcar commented 6 years ago

kopo:

Same issue with my brand-new router. It's somehow completely random, but I always get this issue while playing Paladins: after successfully finding a match and selecting champions, there is a loading screen right after champion select, before the match starts (I think this is when the game connects to the servers; each player has an indicator when ready). That is the point where this problem happens quite often. I have never experienced the issue in any other situation, just that loading screen freezing a few times a day.

I can no longer access LuCI or the modem GUI, and LEDE's DHCP doesn't give me an IP. (Some say it reboots itself after about 10-20 minutes.) I never waited that long, because I could reach LuCI from my phone via WiFi to reboot, which solves the problem for the time being.

It's always a lottery whether TP-Link lets me play or I get an instant loss and a penalty for disconnecting :DDD

There are always 3 wired devices and a lot of WiFi devices: about 6 phones + 2-3 laptops + 1 Xbox + 1 TV, so around 10-12 devices connected wirelessly.

I also have the QoS and Adblock packages installed and active.

ISP: UPC, bandwidth: 120/10, via a modem/router combo called CONNECTBOX set to bridge mode
Model: TP-Link TL-WR1043N/ND v4
Firmware Version: LEDE Reboot 17.01.4 r3560-79f57e422d / LuCI lede-17.01 branch (git-17.290.79498-d3f0685)
Kernel Version: 4.4.92

Please help! kopo

aparcar commented 6 years ago

MPW:

I guess the real question is whether this happens with the original firmware as well.

@kopo, could you flash your device back to the stock firmware and test whether those crashes happen there, too?

This way we could determine whether it's a hardware problem or a driver problem in LEDE.

Regards, Matthias

aparcar commented 6 years ago

kopo:

I haven't flashed back to the stock FW yet, because I read somewhere that it didn't solve the problem for someone. The device is about 2 weeks old; during installation my first step was to flash LEDE, for the SQM service (which didn't really work, so I ended up using QoS, which is OK).

I don't even know which stock version it had. I will flash the latest one, then put it to the test, trying to recreate the conditions for a few days.

aparcar commented 6 years ago

r43k3n:

Someone on the forum said he's also having this issue on the original firmware, which is OpenWRT-based (AA, I guess). However, he was the only person to report this; nobody corroborated it.

Interestingly enough, when you contact TP-Link support in Poland, they just replace your unit under warranty or approve a refund. It's as if they are aware of this issue but just don't admit it. It also happens only on V4.

aparcar commented 6 years ago

MPW:

Well, I guess, someone has to test it. Some rumors from the internet aren't helpful. Just sitting tight doesn't fix this here ;)

aparcar commented 6 years ago

TVTVTV:

Hello guys, I have just purchased a 1043ND v4 in Romania. My ISP is RDS, bandwidth is 300/150 Mbit. My unit is also 100% affected by this bug, and I can reproduce it very reliably. All I have to do is load my ISP's speedtest page (http://www.rcs-rds.ro/internet-digi-net/testeaza-ti-viteza) and run a test. The download part of the test always passes, but whenever the upload test starts, the router's copper ports (WAN, LAN) freeze completely. The heartbeat LED keeps "beating", I can see my Samba share with no issues, and wireless works (no internet, but I can still access LuCI from any device connected to the Wi-Fi network). The WAN loses its connection, and my desktop PC (the only device connected to the router by wire) shows the connection as "Identifying...". The status LEDs for both WAN and LAN keep working as intended: if I remove the cable from either LAN 1 or WAN, the corresponding LED turns off, and reinserting the cable turns it green again.

I have had this issue happen in two other contexts: downloading a large file over torrent at ~30 MB/s, and switching QoS on in LuCI while a 15 MB/s download was running.

Attempts to fix:

1) Enabling QoS and limiting the upload to 140 Mbit: this works perfectly; I can no longer "crash" the router via the speedtest page. The router still crashes under HEAVY download traffic (~300 Mbit, wired). Fix fail;

2) Reverting to stock firmware (via TFTP). This fixes the problem 100% - the router no longer hangs in either one of the two above scenarios at all. "Fix" works.

My conclusion, then, is that this is an issue with LEDE. Please let me know what you need me to do (capture logs etc.) and I will gladly help.

3rd-party FWs tested: SuperWRT r5275 (https://superwrt.download/), LEDE 17.01.4.

I have attached two screenshots, one of a failed test that locks up the router and a successful one using the stock FW.

Have a nice evening! :)

aparcar commented 6 years ago

Dm1:

It's not a hardware bug, because it can't be reproduced on OpenWRT; it's LEDE-related for sure. My TP-Link 1043ND v4 has been running OpenWrt Chaos Calmer 15.05.1 for about a year without any issue. If I flash LEDE on it, it fails in about an hour.

aparcar commented 6 years ago

MPW:

@mihnea: Thanks for your report. That leaves hope :)

aparcar commented 6 years ago

MPW:

I can reproduce this on Gluon (the software of the German Freifunk open WiFi communities), which is based on OpenWrt but has a lot of LEDE patches in it.

So one of these patches probably causes this issue, if it's really not reproducible in OpenWrt.

aparcar commented 6 years ago

rotanid:

Alexander asked for some debugging infos months ago - and no one cared to provide them. There's also still no one showing how to reliably reproduce the issue.

Also, Dmitry claims he has had the device running for a year with OpenWrt CC, which is almost impossible, since support for this device was only added in February 2017 according to the commit date.

Please keep in mind that a bug/issue tracker is not a forum. Unless you can provide useful technical information, you only make this thread longer to read for everyone trying to help or searching for information.

aparcar commented 6 years ago

TVTVTV:

Hello rotanid, I can reliably reproduce the issue with both LEDE and SuperWRT (a LEDE-based 3rd-party FW), as mentioned in my post yesterday (see above). I have also been in contact with Daniel, the creator of SuperWRT, and he can reproduce the issue as well on his 1043ND V4 test unit. Matthias has stated above that he can also reproduce the issue on his unit running Gluon, a "fork" of LEDE. So it's safe to assume that the issue is easily reproducible in some environments.

Regarding logs, I am a total newbie when it comes to networking and Linux, so I have little knowledge of which logs are needed. If someone can give me a set of commands to run via SSH that will produce all the logs the devs need, I will reinstall LEDE today after work and help. Matthias has also provided a swconfig dump a few posts up.

Concerning OpenWRT, I haven't tested that FW myself, so I cannot say whether it suffers from the same issues as LEDE. As stated, I am a newbie, and seeing that Chaos Calmer was only available for the 1043ND up to v2, I thought OpenWRT did not support v4. I have just noticed that the development snapshots support v4. I will try to take the router out of production tonight after work (or ASAP), install the latest snapshot and report back on whether it suffers from the same issue as LEDE or not.

Have a nice one! :)

aparcar commented 6 years ago

Dm1:

Also, Dmitry claims he has the device running for a year with OpenWrt CC - which is almost impossible, since the support for this device was only added in Februar 2017 according to the commit date.

It's my own build based on 1043 v2 with this patch:

--- a/target/linux/ar71xx/image/Makefile
+++ b/target/linux/ar71xx/image/Makefile
@@ -2057,6 +2057,9 @@ $(eval $(call SingleProfile,TPLINK,64kraw,TLWR1043V1,tl-wr1043nd-v1,TL-WR1043ND,
 
 $(eval $(call SingleProfile,TPLINK-LZMA,64kraw,TLWR1043V2,tl-wr1043nd-v2,TL-WR1043ND-v2,ttyS0,115200,0x10430002,1,8M))
 $(eval $(call SingleProfile,TPLINK-LZMA,64kraw,TLWR1043V3,tl-wr1043nd-v3,TL-WR1043ND-v2,ttyS0,115200,0x10430003,1,8M))
+
+$(eval $(call SingleProfile,TPLINK-LZMA,64kraw,TLWR1045V2,tl-wr1045nd-v2,TL-WR1043ND-v2,ttyS0,115200,0x10450002,1,8M))
+
 $(eval $(call SingleProfile,TPLINK-LZMA,64kraw,TLWR2543,tl-wr2543-v1,TL-WR2543N,ttyS0,115200,0x25430001,1,8Mlzma,-v 3.13.99))
 
 $(eval $(call SingleProfile,TPLINK-SAFELOADER,64kraw,CPE510,cpe210-220-510-520,CPE510,ttyS0,115200,$$(cpe510_mtdlayout),CPE510))
@@ -2121,7 +2124,7 @@ $(eval $(call MultiProfile,TLWR743,TLWR743NV1))
 $(eval $(call MultiProfile,TLWR841,TLWR841NV15 TLWR841NV3 TLWR841NV5 TLWR841NV7))
 $(eval $(call MultiProfile,TLWR842,TLWR842V1))
 $(eval $(call MultiProfile,TLWR941,TLWR941NV2 TLWR941NV3 TLWR941NV4))
-$(eval $(call MultiProfile,TLWR1043,TLWR1043V1 TLWR1043V2 TLWR1043V3))
+$(eval $(call MultiProfile,TLWR1043,TLWR1043V1 TLWR1043V2 TLWR1043V3 TLWR1045V2))
 $(eval $(call MultiProfile,TLWDR4300,TLWDR3500V1 TLWDR3600V1 TLWDR4300V1 TLWDR4300V1IL TLWDR4310V1 MW4530RV1))
 $(eval $(call MultiProfile,TUBE2H,TUBE2H8M TUBE2H16M))
 $(eval $(call MultiProfile,UBNT,UBNTAIRROUTER UBNTRS UBNTRSPRO UBNTLSSR71 UBNTBULLETM UBNTROCKETM UBNTROCKETMXW UBNTNANOM UBNTNANOMXW UBNTLOCOXW UBNTUNIFI UBNTUNIFIOUTDOOR UBNTUNIFIOUTDOORPLUS UAPPRO UBNTAIRGW))

WR1045v2 is a local version of WR1043v4 with no differences other than name.

aparcar commented 6 years ago

TVTVTV:

Hello guys, as promised I have just tested both the latest OpenWrt snapshot and, as a bonus, LibreCMC, with exactly the same results as LEDE: the copper ports crash on high-speed upload during a speedtest. /etc/init.d/network restart fixes things; otherwise the ports do not come back up, even after several minutes. I have grabbed all the logs I knew how to take; see the attached files.

@Dmitry - The stock firmware is killing me. I'd give an arm and a leg for an OpenWRT CC image that can run on the 1043ND v4. Is there any way I could get hold of the one you're running? I have no way of compiling my own, so I'd be very grateful. Sorry for the short hijack; I can't see any PM system here.

Let me know how i can assist further.

aparcar commented 6 years ago

Dm1:

@Mihnea B.

Since my patch changes nothing but the device name, you can just force-flash this openwrt-15.05-ar71xx-generic-tl-wr1043nd-v2-squashfs-factory.bin image (https://downloads.openwrt.org/chaos_calmer/15.05/ar71xx/generic/openwrt-15.05-ar71xx-generic-tl-wr1043nd-v2-squashfs-factory.bin) with "sysupgrade -n -F ..." from any OpenWrt/LEDE you are already using, and tell us the result. For me it's working like a charm.

But if you are completely unfamiliar with recovery techniques in case something goes wrong, you had better think twice before trying. The WR1043NDv2 image is fully compatible only with WR1043NDv2, WR1043NDv3, WR1043NDv4 and WR1045NDv2, and in this case you are skipping the compatibility check, which would let you flash it on any device, even a completely different one that will be bricked afterwards.

aparcar commented 6 years ago

TVTVTV:

Thank you Dmitry, I will attempt to force-flash the OpenWRT CC image for v2 tonight, if time allows. If it works for me as well, we have a starting point for finding out when this bug was introduced, since we will have two confirmations that OpenWRT CC was working fine. Will report back once done.

P.S. - I can flash the 1043ND back to stock via TFTP at any time of day or night; I've done it so many times already... :(

aparcar commented 6 years ago

TVTVTV:

Hello guys, force flashing the v2-specific OpenWRT CC bricks the router. Recovery is no longer possible via TFTP/LAN port (uboot pulls the recovery image but does not flash it, apparently). The router can only be unbricked via serial. :) So DO NOT try to flash the v2 image on v4 unless you're up for a session of "serial unbricking".

I did all i could, now it's up to the devs.

Have a nice evening!

aparcar commented 6 years ago

r43k3n:

I don't know why any of you even thought this would work. The V2 and V4 have different SoCs. They might be similar, but they are still different.

aparcar commented 6 years ago

rotanid:

Adrian, it's because Dmitry keeps saying that he has a TL-WR1043ND v4, although he has a v2-based TL-WR1045ND v2.

On topic: I tried to reproduce the issue on a v4, but I couldn't. I ran iperf3 at ~900 Mbit/s over the switched network, at ~300 Mbit/s over the routed LAN<->WAN network, and at ~300 Mbit/s through NAT. I also tested with qos-scripts or sqm-scripts enabled. No crash or switch/network failure occurred.

aparcar commented 6 years ago

TVTVTV:

Hello Adrian, rotanid, I got the unit dirt cheap, so I thought I'd take the risk of bricking it (in case Dmitry was wrong), since it is useless to me running the stock firmware anyway (unstable PPPoE, frequent disconnects).

On topic: maybe the speedtest applet works by creating multiple concurrent connections and that causes the crash. Maybe that's why iperf runs smoothly, as I gather it only uses one thread.

aparcar commented 6 years ago

Dm1:

Sorry, my bad, I was really wrong about the 1043v4: it uses a QCA9563, while the 1043v2, 1043v3 and 1045v2 use a QCA9558. I missed this even though it was in plain text in the patch above. They still all share the same switch and this bug, though. And this bug is not present in OpenWrt 15.05. Serial recovery is actually quite easy if you are already familiar with TFTP, so I urge you to try it on your device.

aparcar commented 6 years ago

rotanid:

Mihnea, I tried with 8 concurrent connections in iperf3. The router's CPU load was high, but no problems...
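Anyone wanting to repeat rotanid's comparison might use iperf3 runs like the ones sketched below: one stream, then 8 parallel streams in each direction. The server address, the duration and the script name are placeholders. The exact options rotanid used are not stated in the thread. Start "iperf3 -s" on a host on the far side of the router (the WAN side for the routed/NAT cases), then run this from a wired LAN client.

```sh
#!/bin/sh
# Sketch: single-stream vs. 8-stream iperf3 runs through the router.
# Assumption: an "iperf3 -s" server is reachable at $1; durations are illustrative.
run_iperf_suite() {
    server="$1"
    duration="${2:-60}"
    # 1 TCP stream, client -> server (the client's upload direction)
    iperf3 -c "$server" -t "$duration" || return 1
    # 8 parallel TCP streams, upload direction
    iperf3 -c "$server" -t "$duration" -P 8 || return 1
    # 8 parallel TCP streams, reverse mode: the server sends (download direction)
    iperf3 -c "$server" -t "$duration" -P 8 -R || return 1
}

# Hypothetical entry point: sh iperf-suite.sh run <server> [seconds]
if [ "${1:-}" = run ]; then
    run_iperf_suite "${2:?usage: iperf-suite.sh run <server> [seconds]}" "${3:-60}"
fi
```

The runs stop at the first failure. That is useful here, because a failed connection mid-suite is exactly the symptom being hunted.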

aparcar commented 6 years ago

TVTVTV:

@Dmitry - No problem, I knew very well what the risks were before flashing CC for the v2 (although I was expecting the TFTP unbrick method to work like it did before). I sent the unit in for repairs this morning. The repair will be either free or paid, depending on the service center's judgement, but I will have the 1043 back in a couple of weeks at most. I don't want to mess with serial/JTAG, as I have neither the necessary HW nor the experience.

@rotanid - Again, I am not very versed in network products and Linux, but I work in sales/support for a very large company selling enterprise HW & SW, so I have become accustomed to looking at a problem from all possible angles. I was just thinking that your test is missing one key element: PPPoE & PPPoEv6. My link to the ISP is done via PPPoE and PPPoEv6, and I gather the tests you've made did not involve PPPoE at all. If your connection is over PPPoE and you have at least a 300/150 line, try using the 1043 as your main router, connect a wired client to switch port 1 and do a speedtest run against a local (in-country, I mean) server.

I state again that when I got this router I only had a 100 Mbit line, and it ran fine for a whole month. It's only when I upgraded to 300/150 that the issue became obvious.

One more thing to consider: I have a pretty solid laptop connected to the router via a wireless connection, 300/300 Mbit (40 MHz channel). On this laptop I get ~150 Mbit up and ~140 down on the speedtest site. The router has never crashed when the test was done via wireless; I must have done over 30 runs.

Hope this helps more than it adds confusion into the equation.

P.S. - I will perform the same test with iperf when i get the unit back from the service.

aparcar commented 6 years ago

MPW:

The crashes on my devices happened without PPPoE. Just an L2TP VPN, a typical guest WiFi setup (Freifunk).

aparcar commented 6 years ago

Sven:

I can confirm that in my case (TL-WR841N v9, see very first comment) PPPoE is also NOT used. I use the device as a pure access point, bridging my wired and wireless networks. The services "firewall", "odhcpd" and "dnsmasq" are all disabled.

Although the number of concurrent IP connections shouldn't be a problem here because no connection tracking is needed, I face the same problem.

aparcar commented 6 years ago

WernerSlabon:

I have a lot of TL-WR1043ND units running in a large network (all versions up to v5). On 06.12. all v4 and v5 units (the first and only v5 was installed on 11.12.) began to show the described problems. I'm not aware of anything that changed on that day; the v4 units had been running since April/May without any problems (they are monitored with PRTG). Yesterday I added a small ping script (on two devices) that restarts the network if the ping fails. The log shows that the failure occurs 5-6 times a day: on one device 3-4 times within 1 hour, on the other spread over several hours.

All devices are located in the "server" VLAN, another VLAN is configured for the switch ports, and then 2-3 WLANs/VLANs are configured for WiFi (internal employees, guests and youth/children). The devices only have an IP in the server network; all other VLANs have an unmanaged bridge interface or no interface (VLAN on the switch only). They operate as pure access points: no WAN, no DHCP server, and they are DHCP clients.

They are running OpenWrt and LEDE (the latter because of v5 support, and on the two I'm currently observing).

Is there anything I can collect? E.g. capture packets in a ring buffer?
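A ping script like the one Werner mentions might look roughly like the minimal sketch below. If the monitored host does not answer, it logs the event to syslog and restarts the network. The target address, the syslog tag "netwatch" and the script name are hypothetical. The restart uses /etc/init.d/network restart, the command reported in this thread to bring the ports back.

```sh
#!/bin/sh
# Sketch of a cron-driven ping watchdog. TARGET is a hypothetical address:
# use a host that is only reachable through the switch ports.
TARGET="${TARGET:-192.168.1.1}"

# Wrapped in a function so the restart command is easy to change (or stub out).
network_restart() {
    /etc/init.d/network restart
}

# Returns 0 if TARGET answers; otherwise logs to syslog, restarts, returns 1.
check_once() {
    if ping -c 3 -W 2 "$TARGET" >/dev/null 2>&1; then
        return 0
    fi
    logger -t netwatch "ping to $TARGET failed, restarting network"
    network_restart
    return 1
}

# Hypothetical entry point: sh netwatch.sh run
if [ "${1:-}" = run ]; then
    check_once
fi
```

To run it every 3 minutes, a line such as "*/3 * * * * /root/netwatch.sh run" in /etc/crontabs/root would do, with the cron service enabled. The timestamps of the restarts can then be read back with "logread | grep netwatch", which makes it easy to compare outage times across several APs.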

aparcar commented 6 years ago

Dm1:

Interesting: I was using the 1045v2 with LEDE to manage VLANs too, and that's when it started to fail. Maybe this bug is related to tagged port usage only? Can other people in the thread confirm this?

aparcar commented 6 years ago

WernerSlabon:

But at least all v4 devices worked for up to 6 months without any problems, until last week, and WITHOUT any firmware change. And ALL of them started having the problem within 1-2 days...

aparcar commented 6 years ago

fihufil:

@Alexander Couzens You asked for debug information and here it is: script used for gathering information and script output. If you need any more information I can provide them.

edit: For me the switch keeps working during the outage: when I connect two PCs with static addresses, they can ping each other with no problem. However:
wifi <-> lan ping doesn't work
wifi <-> wan ping doesn't work
wifi <-> router ping works

aparcar commented 6 years ago

Sven:

@Dmitry Chigiryov: I have the same problem on a TL-WR841Nv9, but I use neither PPPoE nor VLANs. So at least in my case, the problem is not related to VLAN tagging.

Unfortunately, both of my TL-WR841Nv9 units are used as access points and do not provide DHCP services. Furthermore, I'm running a cron job on them that switches off WiFi when the default gateway can't be pinged and brings WiFi back up when the default gateway comes back online.

I've now disabled the cron job and added a separate SSID with separate network that has DHCP enabled, so the next time the LAN becomes inaccessible because of this bug, I can connect to the newly created SSID, get an IP via DHCP, SSH to the TL-WR841Nv9 and start collecting logs as requested by @Alexander Couzens.

So hold on... :-)
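The gateway-check cron job Sven describes could look roughly like the sketch below. WiFi is taken down while the default gateway does not answer and is brought back when it does. The state-file path and the script name are assumptions. "wifi up" / "wifi down" are the standard OpenWrt commands, and the gateway is read from the "default via …" line of the routing table.

```sh
#!/bin/sh
# Sketch: toggle WiFi according to default-gateway reachability.
# STATE_FILE (hypothetical path) remembers the last state so WiFi is only
# touched when reachability changes.
STATE_FILE="${STATE_FILE:-/tmp/wifi-gw-state}"

# Print the default gateway address from the routing table ("default via X ...").
default_gw() {
    ip route | awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

check_gateway() {
    gw=$(default_gw)
    last=$(cat "$STATE_FILE" 2>/dev/null || echo up)
    if [ -n "$gw" ] && ping -c 3 -W 2 "$gw" >/dev/null 2>&1; then
        now=up
    else
        now=down
    fi
    if [ "$now" != "$last" ]; then
        wifi "$now"          # OpenWrt: "wifi up" / "wifi down"
        echo "$now" > "$STATE_FILE"
    fi
}

# Hypothetical entry point, e.g. from cron: sh wifi-gw.sh run
if [ "${1:-}" = run ]; then
    check_gateway
fi
```

As Sven notes, a job like this hides the wired outage from WiFi clients too. That is why disabling it (and keeping a separate DHCP-enabled SSID) is what makes it possible to log in and collect data during the failure.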

aparcar commented 6 years ago

rotanid:

@Sven, as you are the only one reporting a problem with the WR841Nv9 here, I doubt this is the same issue that all the others have with the WR1043NDv4...

aparcar commented 6 years ago

Sven:

@rotanid, you might be right, but the behaviour is exactly the same and they share the same platform, so I thought this could be the case.

Anyway, I'll provide debug information as soon as the problem occurs again.

aparcar commented 6 years ago

rotanid:

@Sven, they are using a different switch chip (QCA9533 vs. QCA8337N) and a different CPU (QCA9533 vs. QCA9563) so no, they aren't very similar.

also, the recent question was about VLAN usage and two people so far confirmed they have those issues when using VLANs. your comment about not using VLANs might be misleading, as you aren't using the same device and not the same chips. the topic title is about WR1043, too - so it would be best to open a separate Bug Report for your device.

aparcar commented 6 years ago

Sven:

@rotanid: my bad, seems I was completely wrong. Sorry for the noise.

aparcar commented 6 years ago

WernerSlabon:

@fihufil: I collected the debug.log as you requested.

One important thing I observed: all four APs (3x v4, 1x v5) are configured to ping a server every 3 minutes, collect the debug information (once) and then restart the network. Result: all APs stopped their network/switch at the SAME moment (at least within the same three minutes).

One idea I had (perhaps you're on the same track): could there be a problem with the MAC table? The primary Layer 2 switch has about 350 MAC addresses in its cache (on 8 VLANs, of which the VLANs on the APs account for about two thirds).

The other APs (v1, v2, v3) don't make any problems. Today (before the APs locked out) I replaced a v4 by a v1 device with the equivalent configuration (same VLANs, WLANs) - no problems.
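To test the MAC-table theory, one could compare how many addresses the switch chip has learned per port before and during an outage. The sketch below assumes the switch driver exposes an "arl_table" attribute whose lines start with "Port N:" and contain "MAC". That is an assumption about the driver output: check the attribute list with "swconfig dev switch0 help" first, and adjust the pattern to what your build actually prints.

```sh
#!/bin/sh
# Sketch: count learned MAC addresses per switch port from swconfig's ARL dump.
# Assumption: lines look like "Port 1: MAC 00:11:22:33:44:55" (possibly with
# extra fields); adjust the pattern to what your driver actually prints.
count_macs_per_port() {
    awk '/^Port [0-9]+:/ && / MAC / {
             port = $2; sub(/:$/, "", port); n[port]++
         }
         END { for (p in n) printf "port %s %d\n", p, n[p] }' | sort
}

# Hypothetical entry point: sh arlcount.sh run
if [ "${1:-}" = run ]; then
    swconfig dev switch0 get arl_table | count_macs_per_port
fi
```

If the table really fills up, the port facing the primary switch would show a count close to the chip's limit right before the APs lose connectivity.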

aparcar commented 6 years ago

lucize:

So I have a 1043ND v4, also using PPPoE on RDS. As soon as I run the upload part of a speed test, the PPPoE session crashes: PADO packets time out and there is no more connectivity. Sadly, there is nothing in dmesg.

I tried dialing a PPPoE connection on every switch port (each port on its own VLAN), but after the first crash the connection will not dial again until a reboot. However, if a DHCP (static) WAN is also defined (multiwan), that one keeps working.
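pppd logs to syslog rather than to the kernel ring buffer, so PADO timeouts show up in logread even when dmesg is empty. The small sketch below lists and counts them. The exact message text (here "Timeout waiting for PADO packets", as printed by the PPPoE plugin) is an assumption to check against your own log, and the script name is hypothetical.

```sh
#!/bin/sh
# Sketch: list PPPoE discovery timeouts from the system log and count them.
# Assumption: pppd's PPPoE plugin logs "Timeout waiting for PADO packets".
pado_report() {
    awk 'tolower($0) ~ /timeout waiting for pado/ { n++; print }
         END { printf "PADO timeouts: %d\n", n }'
}

# Hypothetical entry point: sh pado.sh run
if [ "${1:-}" = run ]; then
    logread | pado_report
fi
```

While reproducing the crash, "logread -f" shows new entries live, so the first timeout can be matched against the moment the upload test starts.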

aparcar commented 6 years ago

lucize:

Can someone try these patches:

https://patchwork.ozlabs.org/patch/743498/ (this could be the fix)
https://patchwork.ozlabs.org/patch/845962/
https://patchwork.ozlabs.org/patch/852079/

and ar71xx from https://git.lede-project.org/?p=lede/nbd/staging.git;a=shortlog

With these I could run several speed test sessions without crashing. I'll report on long-term stability.

aparcar commented 6 years ago

WernerSlabon:

I want to report some additional information: I'm running a cron job every 3 minutes on my four TL-WR1043ND v4/v5 units. The jobs restart the network (and log it) when a ping to the server fails.

On the 21st/22nd the APs logged failures at the same time (within the same 3-minute window), about 6 times in those 2 days. Since the 22nd nothing has happened, but nobody is working... I'll check when the problems come back (most people start work again on 9 Jan).

@Lucian: I can try it on one AP if you can provide a firmware file. I don't have an environment to compile in, nor much experience in this area.

aparcar commented 6 years ago

lucize:

@Werner: please use it on 1043nd v4 only ! https://drive.google.com/open?id=1Ml1E6RLOzlLRmhEn3LSl0I4iTbdq5D0d

Regards