Ernillew / wl500g

Automatically exported from code.google.com/p/wl500g

RT-N16: random memory corruption when wireless radio is turned off #186

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
* What steps will reproduce the problem?

1. Set wireless radio to "off" in the configuration panel
2. Compile the memtester program (http://pyropus.ca/software/memtester/) and put it 
on a USB flash drive
3. Boot the device, log in via telnet and run memtester so that it tests as much 
of the available memory as possible

* What is the expected output? What do you see instead?

Memory test should pass, but it fails, see the log at the end of this post.

* What version of the product are you using?

Tried both RT-N16-1.9.2.7-rtn-r2274.trx and the original ASUS firmware 
FW_RT_N16_1019.trx (version 1.0.1.9)

Right now I'm not totally sure whether it is a kernel bug somewhere in the WLAN 
driver, or whether my device simply has defective RAM. So an independent 
test would be very much welcome.

* Please provide any additional information below.

# /tmp/harddisk/part0/memtester 93 
memtester version 4.1.2 (32-bit)
Copyright (C) 2009 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 93MB (97517568 bytes)
got  93MB (97517568 bytes), trying mlock ...locked.
Loop 1:
  Stuck Address       : ok         
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok         
  Block Sequential    : ok         
  Checkerboard        : ok         
  Bit Spread          : testing  40FAILURE: 0x00800000 != 0x02800000 at offset 0x003a9812.
  Bit Flip            : testing 206FAILURE: 0x02000000 != 0x00000000 at offset 0x0039c201.
  Walking Ones        : ok         
  Walking Zeroes      : ok
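
One way to read the two FAILURE lines above is to XOR the mismatched values, which leaves only the bits that differ. The following is a minimal sketch for illustration; the helper name `differing_bits` is hypothetical and not part of memtester, and the value pairs are copied from the log.

```python
def differing_bits(a, b):
    """Return the positions (0 = least significant) of the bits that differ between a and b."""
    diff = a ^ b
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]

# Value pairs copied from the two FAILURE lines in the log above.
failures = [
    ("Bit Spread", 0x00800000, 0x02800000),
    ("Bit Flip", 0x02000000, 0x00000000),
]

for test_name, left, right in failures:
    bits = differing_bits(left, right)
    print(f"{test_name}: 0x{left:08x} ^ 0x{right:08x} = 0x{left ^ right:08x}, bits {bits}")
```

Both pairs differ only in bit 25 (mask 0x02000000), so each failure is a single-bit error in the same bit position.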

Original issue reported on code.google.com by Siarhei....@gmail.com on 11 Dec 2010 at 10:39

GoogleCodeExporter commented 9 years ago
It is not a FW bug. The WiFi driver probably allows writes to unlocked regions, but 
this theory needs deep reverse engineering, since there are no sources for it.

Original comment by lly.dev on 12 Dec 2010 at 6:26

GoogleCodeExporter commented 9 years ago
Sorry for not silently walking away, but could you please:
1. First confirm whether this issue is reproducible for you or not.
2. If it is reproducible, then it probably makes sense to
a) Simply work around it by disabling the "wifi radio off" option in *your* FW for 
now
b) Somehow communicate the problem to ASUS in the hope that it might be fixed 
(I can do it myself, but first I need a confirmation)

I wasted a good half a day trying to figure out what was wrong and why the 
system was unstable. Would it not be nice to spare other users 
this experience? And if it is not reproducible for anybody else, 
then I probably just need to return the device to the shop as defective.

Thanks.

Original comment by Siarhei....@gmail.com on 12 Dec 2010 at 11:50

GoogleCodeExporter commented 9 years ago
I have attached the memtester program binary in the previous comment for the 
convenience of those who might want to try to reproduce the problem. It's just 
an unmodified memtester 4.1.2 compiled with mipsel-unknown-linux-uclibc 
toolchain.

Original comment by Siarhei....@gmail.com on 12 Dec 2010 at 11:56

GoogleCodeExporter commented 9 years ago
Once more: there are no sources for the Broadcom WiFi driver. You are free to write 
a complaint to Broadcom yourself.

Original comment by lly.dev on 13 Dec 2010 at 7:44

GoogleCodeExporter commented 9 years ago
Leonid, can you reproduce it?

Original comment by v...@orient-96.ru on 13 Dec 2010 at 8:04

GoogleCodeExporter commented 9 years ago
themiron:
1) first of all, memtester can give errors on sizes > 1/2 RAM (64M)
2) and is it reproducible for you?

Original comment by lly.dev on 13 Dec 2010 at 8:58

GoogleCodeExporter commented 9 years ago
OK, now I'm about 95% sure that "wireless radio turned off" is the real cause. 
I had the system compiling various packages (including rebuilding gcc itself) 
during the last day, and everything worked great without a single fail. 
Occasional runs of memtester (now checking 110MB, that was possible due to 
enabled swap) also did not reveal any problems. With a confirmation from 
somebody else, the confidence level will just rise up to 100% :)

I really don't know what to do with you guys. I primarily came here because of 
the positive feedback about your firmware found on the Internet, and it indeed 
seems to suck a bit less than the ASUS one. So thanks anyway, you made my life 
a little bit easier. If you don't want to be notified about the bugs and 
problems, then so be it, it's your choice and I respect it.

> Once more: there are no sources for the Broadcom WiFi driver.

Just to make it clear: I'm not really interested in a working WiFi on this 
device (I have another WiFi access point which works fine), so I have little 
need for the Broadcom driver. I'm just a bit surprised that it is even loaded 
(is it?) when WiFi is supposed to be turned off. That's a waste of RAM and a 
source of potential bugs.
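
Whether the driver is actually loaded can be checked on a Linux system by looking at /proc/modules. The following is a rough sketch of that check, assuming the module is named `wl` (this thread refers to the driver file as wl.ko); the helper `is_module_loaded` and the sample text are hypothetical and only illustrate the file format.

```python
def is_module_loaded(proc_modules_text, name):
    """Return True if `name` appears as a module name in /proc/modules-style text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0] == name:
            return True
    return False

# Made-up sample in the /proc/modules format, for illustration only.
sample = (
    "wl 1234567 0 - Live 0xc0000000\n"
    "usb_storage 34567 1 - Live 0xc0100000\n"
)
print(is_module_loaded(sample, "wl"))

# On a real system one would read the file itself:
# with open("/proc/modules") as f:
#     print(is_module_loaded(f.read(), "wl"))
```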

As I mentioned before, I don't mind filing a bug to ASUS myself, as they are a 
direct upstream and it's up to them to bother Broadcom next. I just need a 
confirmation of the issue from somebody else to be absolutely sure. And if 
nobody else cares, then this is not moving anywhere. Sorry.

On the positive side, Broadcom seems to be opening some stuff; you guys probably 
have already heard about this anyway. I don't know how it affects this RT-N16 
device though, but it would be nice if any improvements are coming: 
http://thread.gmane.org/gmane.linux.kernel.wireless.general/55418

> 1) first of all, memtester can give errors on sizes > 1/2 RAM (64M)

This sounds like a really serious kernel bug. If you have some clear 
description of the steps needed to reproduce it, please let me know. You seem 
to be using the following patch already:
http://code.google.com/p/wl500g/source/browse/branches/rt-n/kernel-2.6/050-mvista-mem.patch
Are there any other known memory-related stability problems that I'm not aware 
of yet?

Original comment by Siarhei....@gmail.com on 14 Dec 2010 at 10:03

GoogleCodeExporter commented 9 years ago
Neither I nor theMIROn can reproduce your problem.

If you don't want to use Broadcom driver at all, you have to use OpenWRT.

If you read the wl500g.info & OpenWRT forums, you will discover that Broadcom 
did not open the sources for the embedded WiFi cards used in the RT-N16, for example.

Anyway, we are simply unable to provide personal support.

Original comment by lly.dev on 15 Dec 2010 at 8:44

GoogleCodeExporter commented 9 years ago
> Neither I nor theMIROn can reproduce your problem.

Thanks for your feedback. That's really interesting. Could it be some 
difference in the bootloader then? If I understand correctly, you never tried 
to replace the CFE.

> If you don't want to use Broadcom driver at all, you have to use OpenWRT.

Please don't jump to any baseless conclusions. I definitely don't have to use 
OpenWRT or anything else. In the end I'm only interested in just a properly 
working and reliable kernel for this device without any kind of userspace 
stuff. But relax, you don't have to worry about this, and I'm actually not 
interested in your opinion on this matter either.

> Anyway, we are simply unable to provide personal support.

Well, just don't waste your time and don't bother replying then. Seriously.

Original comment by Siarhei....@gmail.com on 15 Dec 2010 at 9:38

GoogleCodeExporter commented 9 years ago
To sum it up, the facts are:

1. There is an easily reproducible problem involving the use of your FW in one 
particular configuration on my RT-N16 device. If this problem gets triggered, 
the consequences are quite serious, as stability goes south. As a side note, 
I don't need any kind of help or support, because I had already narrowed 
it down (though without investigating it further yet) and worked around the 
problem even before reporting this issue here. So there is no need to patronize 
me or give any otherwise useless advice that is not directly related 
to the issue.

2. The project maintainers are pissed off by Broadcom and clearly are not going 
to waste any time investigating the problem. Hence the immediate change of bug 
status to 'invalid'.

3. Two other persons tried to reproduce the problem with no luck. My guess is 
that it might have been caused by:
a) some differences in CFE
b) some differences in WLAN neighborhood (I have lots of other wireless access 
points around)
c) some differences in the hardware (my device was bought less than 1 week ago, 
for what it's worth)
d) the persons who tried to reproduce the bug did not try hard enough

One last remaining question: if the bug ever gets confirmed, are you going 
to reject any patches/workarounds? I understand that it may be a political 
issue and an important act of protest against Broadcom.

Original comment by Siarhei....@gmail.com on 15 Dec 2010 at 10:29

GoogleCodeExporter commented 9 years ago
We have to use the Broadcom SDK, and Broadcom rejects any requests from anyone who does not 
own it. If you don't agree with this situation, the only thing you can 
do is not use the SDK. Period.

If you (or someone else) fix something in the Broadcom SDK, patches (not 
dirty hacks) will be accepted. I am really not sure that anyone will produce a workable 
binary patch against the wl.ko WiFi driver.

Original comment by lly.dev on 15 Dec 2010 at 10:57