SeattleMeshnet / meshbox

The Hyperboria peering device
https://github.com/hyperboria/cjdns
GNU General Public License v3.0

Router stops responding after leaving peers page open for a while. #3

Closed WeirdCarrotMonster closed 10 years ago

WeirdCarrotMonster commented 10 years ago

I've built OpenWrt with the meshbox package for my D-Link DIR-320 NRU (B1) and noticed the following behavior: the router works well under any load until I open the cjdns peers page (the one where you see ping responses) and leave it open for a minute or two. After that, the router hangs.

I'm not sure how to find the reason behind this (since I'm new to OpenWrt), but I can reproduce it reliably.

ghost commented 10 years ago

This device [1] has 8 MB flash (enough) and 32 MB RAM. I wonder whether we're hitting the 32 MB limit. Could you check memory usage before, during, and after opening the peers page? logread -f might show something useful as well.

Also, what exactly does hang mean? Does it still switch your LAN traffic, and can you SSH into it? I guess if it hangs completely, only a reboot helps?

[1] http://wiki.openwrt.org/toh/d-link/dir-320_revb1

ghost commented 10 years ago

My idea behind these questions is the following: the peers page requests a list of all peers every 5 seconds, and then requests a switchPing for each of them. The more peers, the more ping requests, and if we're leaking memory somewhere within the LuCI model or the cjdns library, these 32 MB might fill up quickly.
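To get a feel for how quickly the request count grows with the number of peers, here is a rough back-of-the-envelope sketch. It assumes exactly one peer-list request plus one switchPing per peer on every refresh, as described above; the function name and its parameters are made up for illustration, and the real LuCI code may batch or page its requests differently.

```shell
# Rough estimate of admin requests per minute issued by the peers page.
# Assumption: each refresh sends 1 peer-list request + 1 switchPing per peer.
# $1 = number of peers, $2 = refresh interval in seconds (default 5).
requests_per_minute() {
    peers="$1"
    refresh_seconds="${2:-5}"
    echo $(( (60 / refresh_seconds) * (1 + peers) ))
}
```

For example, requests_per_minute 10 gives 132 with the default 5-second refresh, so even a small per-request leak would add up within a few minutes on a 32 MB device.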

WeirdCarrotMonster commented 10 years ago

I did some testing and here are the results. First of all, logread -f shows nothing suspicious (wlan0 handshakes, ssh notifications). Second, I launched "while true; do free; sleep 1; done" in a separate ssh connection and watched free memory over time:

1) Opening the web interface made it go down from about 5000 to 3000.
2) Opening the peers page made it slowly go down to about 200, until the router stopped responding.

At this point I thought it had died, but it seems I was too impatient previously. After about a minute it started responding again, with the following results:

3) Free memory went up to about 7000.
4) Opening the web interface made it go down to 5000 (and lower) again.

After that, I checked the peers page and saw that the interface is empty (nothing appears after "Querying cjdns admin interface"), Hyperboria sites are unavailable, and my computer can't ping the router via cjdns.

So yeah, it seems to hit the memory limit. I think I can build the firmware with zRam support (or never check the peers list again). Watching memory usage with any other LuCI page open made me think that it leaks memory by itself, but those actions cause a little more leaking.
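A variant of the watch loop above that records one timestamped number per sample may be easier to compare across runs. This is only a sketch: the function names are hypothetical, and it assumes the kernel exposes a MemFree line in /proc/meminfo (the usual case on Linux) and that awk is available in the firmware's BusyBox build.

```shell
# Print the MemFree value (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo; the optional argument allows parsing a saved copy.
mem_free_kb() {
    awk '/^MemFree:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Print "<epoch seconds> <free kB>" once per interval, for a fixed number of samples.
# $1 = interval in seconds (default 1), $2 = number of samples (default 60).
log_mem_free() {
    interval="${1:-1}"
    count="${2:-60}"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(date +%s) $(mem_free_kb)"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Running something like log_mem_free 1 300 in a separate ssh session while the peers page is open would capture five minutes of samples, which could then be compared against a run with the page closed.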

ghost commented 10 years ago

It might be as simple as triggering GC by hand, or declaring variables more wisely. I'm new to Lua, so there might be quite a few memory leaks in the code.

wfleurant commented 10 years ago

I have not been able to reproduce this error on: LuCI Trunk (svn-r10276) OpenWrt Barrier Breaker r40695. I did a 25-hour test and will continue for another 25. Done on a 4 MB/32 MB AP.

Note that this issue has never been seen before. The DIR-320 NRU router does look interesting. Have you tried running the router with cjdns disabled? Can you trigger the hang after waiting for 20 min by connecting to the LuCI admin?

WeirdCarrotMonster commented 10 years ago

I can try it later today. How can I disable the cjdns service? Will it respawn if I just kill it?

ghost commented 10 years ago

Yes, it should respawn immediately, with a kind of exponential backoff. To disable it: /etc/init.d/cjdns disable, or use the System -> Startup UI.
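For a test run without cjdns, the disable step could be combined with stopping the running instance. The sketch below assumes the init script supports the standard OpenWrt stop and disable actions and that stop also prevents the respawn mentioned above; the wrapper function and its name are hypothetical.

```shell
# Stop a running service and keep it from starting at boot.
# $1 = path to the init script (defaults to the one named above).
disable_service() {
    init_script="${1:-/etc/init.d/cjdns}"
    "$init_script" stop && "$init_script" disable
}
```

Calling disable_service with no arguments targets /etc/init.d/cjdns; running /etc/init.d/cjdns enable followed by /etc/init.d/cjdns start should restore the service after the test.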

wfleurant commented 10 years ago

OK to close this issue? I've never been able to reproduce this.

[Screenshots: cjdns_uptime, cjdns_peers]

WeirdCarrotMonster commented 10 years ago

I did some testing with and without cjdns, then compiled a newer OpenWrt version, and it seems like it was some kind of bug in my revision, svn-r10263. Now I have at least 12 MB of free memory, so even if a minor leak occurs, the router has enough memory to stay alive. The issue should probably be closed.

wfleurant commented 10 years ago

I hope you are interested in testing additional features of the meshbox firmware. If so, visit the openwrt channel on HypeIRC using an IRC client.