EFForg / OpenWireless

The official home of the EFF OpenWireless Project

Content length mismatches bricking router #237

Closed: davidstrauss closed this issue 10 years ago

davidstrauss commented 10 years ago

After rebooting the router, I cannot load the login page. If I load /settings.html, a bunch of other resources fail:

[Screenshot: selection_005]

davidstrauss commented 10 years ago

Firefox doesn't seem to see any length mismatches, but I still get a WSOD on the login page.

davidstrauss commented 10 years ago

Here's the Firefox output. It just thinks it's all zero-length: [Screenshot: selection_006]

jsha commented 10 years ago

What an odd bug! Were you able to successfully log in before?

If you curl -i one of those URLs, is the Content-Length header present at all?

Do you have SSH access in order to debug? Even if not, I think it should be possible to recover from this state without having to failsafe, either by disabling Content-Length checking in browser (not positive this is possible), or by scripting the site using curl or wget (tedious, I know).

I notice in the Chrome console a CSP violation message. Do you have any extensions that could be trying to insert script?

davidstrauss commented 10 years ago

> What an odd bug! Were you able to successfully log in before?

Yes, I was able to go through normal setup after flashing away from the Netgear firmware. After seeing issues detecting the WAN IP, I held down the factory reset (which didn't show any signs of working other than the reboot) and then got where I am now.

> If you curl -i one of those URLs, is the Content-Length header present at all?

I wouldn't use curl -i for this sort of diagnostic; it causes it to send a HEAD request, which inherently has no expected body. curl -v did, indeed, show a header:

[straussd@zeus ~]$ curl -kv https://gw.home.lan/
* Adding handle: conn: 0xc87940
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0xc87940) send_pipe: 1, recv_pipe: 0
* About to connect() to gw.home.lan port 443 (#0)
*   Trying 172.30.42.1...
* Connected to gw.home.lan (172.30.42.1) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_DHE_RSA_WITH_AES_128_CBC_SHA
* Server certificate:
*   subject: CN=cerowrt,L=Erewhon,ST=California,C=US
*   start date: May 27 17:05:55 2014 GMT
*   expire date: May 12 17:05:55 2074 GMT
*   common name: cerowrt
*   issuer: CN=cerowrt,L=Erewhon,ST=California,C=US
> GET / HTTP/1.1
> User-Agent: curl/7.32.0
> Host: gw.home.lan
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Security-Policy: default-src 'self'; img-src 'self' data:
< X-Frame-Options: SAMEORIGIN
< X-Content-Type-Options: nosniff
< Content-Type: text/html; charset=utf-8
< Accept-Ranges: bytes
< ETag: "53977791"
< Last-Modified: Thu, 31 Jul 2014 01:03:11 GMT
< Content-Length: 1029
< Date: Mon, 25 Aug 2014 19:36:21 GMT
* Server lighttpd/1.4.35 is not blacklisted
< Server: lighttpd/1.4.35
< 
* transfer closed with 1029 bytes remaining to read
* Closing connection 0
curl: (18) transfer closed with 1029 bytes remaining to read

> Do you have SSH access in order to debug? Even if not, I think it should be possible to recover from this state without having to failsafe, either by disabling Content-Length checking in browser (not positive this is possible), or by scripting the site using curl or wget (tedious, I know).

I don't have SSH enabled. I tried the failsafe, which worked for getting into telnet and running firstboot. I'm getting the same issue after it comes back up.

> I notice in the Chrome console a CSP violation message. Do you have any extensions that could be trying to insert script?

I see the same issue in Incognito (Chrome)/Private Browsing (Firefox) with all extensions/add-ons disabled.

davidstrauss commented 10 years ago

Just checked in with team cURL, and they say the message from the CLI means that (1) the server specified that the client should expect 1029 bytes in the body and (2) the client received zero body bytes before the connection closed.
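To make that failure mode concrete, here is a minimal local sketch in Python that reproduces the same symptom without the router. Everything in it is an assumption for illustration: the helper names (`serve_truncated`, `fetch_and_compare`, `demo`) are hypothetical, and the tiny socket server only stands in for a server that promises a body and then closes the connection. It says nothing about why lighttpd behaved that way.

```python
import http.client
import socket
import threading


def serve_truncated(listener, declared_length):
    """Accept one connection, promise a body via Content-Length, send none."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(65536)  # consume the small GET request
        head = (
            "HTTP/1.1 200 OK\r\n"
            f"Content-Length: {declared_length}\r\n"
            "Content-Type: text/html\r\n"
            "\r\n"
        )
        conn.sendall(head.encode("ascii"))
    # Leaving the with-block closes the socket before any body byte is sent.


def fetch_and_compare(host, port):
    """Return (declared Content-Length, body bytes actually received)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        declared = int(resp.getheader("Content-Length"))
        try:
            body = resp.read()
        except http.client.IncompleteRead as exc:
            # Raised when the connection closes before the promised length.
            body = exc.partial
        return declared, len(body)
    finally:
        conn.close()


def demo(declared_length=1029):
    """Run the stand-in server on a free loopback port and probe it once."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(
        target=serve_truncated, args=(listener, declared_length), daemon=True
    )
    worker.start()
    try:
        return fetch_and_compare("127.0.0.1", port)
    finally:
        worker.join(timeout=5)
        listener.close()


if __name__ == "__main__":
    declared, received = demo()
    print(f"declared={declared} received={received}")
```

The pair it returns corresponds to the two numbers in the curl message: the length the server declared and how much of the body arrived before the close.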

jsha commented 10 years ago

> it causes it to send a HEAD request, which inherently has no expected body.

curl -I (capital i) sends a HEAD request; curl -i sends a regular GET and prints the response headers along with the body. -v also works well, of course.
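The HEAD versus GET distinction can be checked locally with a short Python sketch. This is only an illustration under stated assumptions: `BODY`, `Handler`, `probe`, and `compare_head_and_get` are hypothetical names, and the standard-library `http.server` stands in for the router's web server. It shows that a HEAD response carries the same Content-Length header as a GET but no body, which is why a HEAD request cannot reveal a truncated body.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical placeholder page, not the router's real login page.
BODY = b"<html>login page placeholder</html>"


class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_HEAD(self):
        # Same headers as GET, but no body follows.
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def probe(method, port):
    """Return (declared Content-Length, body bytes received) for one request."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/")
        resp = conn.getresponse()
        declared = int(resp.getheader("Content-Length"))
        body = resp.read()  # empty for HEAD by definition
        return declared, len(body)
    finally:
        conn.close()


def compare_head_and_get():
    """Serve BODY on a free loopback port and probe it with HEAD, then GET."""
    server = HTTPServer(("127.0.0.1", 0), Handler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return probe("HEAD", port), probe("GET", port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    head, get = compare_head_and_get()
    print("HEAD:", head, "GET:", get)
```

With a healthy server, only the GET result has a received length equal to the declared one; the HEAD result always reports zero body bytes, so it cannot distinguish a working server from the broken state in this issue.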

> After seeing issues detecting the WAN IP, I held down the factory reset (which didn't show any signs of working other than the reboot)

Unfortunately, the factory reset has to be handled by the firmware, and neither OpenWRT nor the OpenWireless distribution of OpenWRT does that handling. We should document and/or implement that, but the best it could do is restore the OW firmware to its default state. It would not be able to restore the factory firmware - you have to do that from failsafe mode.

Based on the output you pasted, the issue is the opposite of what I thought. Lighttpd is sending the correct Content-Length, but then closing the connection before sending the body. I'm not sure what could cause that. It can't be a permissions issue, or lighttpd would not know the correct Content-Length.

The next thing I would try: in failsafe mode, log in to the router, start up lighttpd, and curl from localhost to see whether you get the same result.

You can also make further debugging easier by, in failsafe mode, copying your SSH pubkey into /etc/dropbear/authorized_keys. After that you should be able to log in in normal mode.

davidstrauss commented 10 years ago

Based on troubleshooting in IRC, the root cause appears to be bad blocks in the flash memory.