Closed jukkar closed 10 years ago
well that's a new one. what does the steward output say?
@mrose17
The log is below. The log are from image that I am running in the VirtualBox. I also tried running TTS in Intel NUC device and I currently do not see the same error there, the results are quite confusing. I am doing more tests for this.
The log from steward looks pretty normal to my eyes:
debug: [steward] begin
debug: [devices] begin
debug: [server] begin
*** WARNING *** The program 'node' uses the Apple Bonjour compatibility layer of Avahi.
*** WARNING *** Please fix your application to use the native API of Avahi!
*** WARNING *** For more information see <http://0pointer.de/avahi-compat?s=libdns_sd&e=node>
*** WARNING *** The program 'node' called 'DNSServiceRegister()' which is not supported (or only supported partially) in the Apple Bonjour compatibility layer of Avahi.
*** WARNING *** Please fix your application to use the native API of Avahi!
*** WARNING *** For more information see <http://0pointer.de/avahi-compat?s=libdns_sd&e=node&f=DNSServiceRegister>
debug: [discovery] begin
debug: [manage] begin
info: [steward] scanning enp0s3
info: [steward] scanning tether
info: [steward] loading place actor
warning: [steward] place/1 event=read .git/logs/HEAD, diagnostic=ENOENT, open '/opt/TheThingSystem/steward/actors/../../.git/logs/HEAD'
info: [devices] loading climate driver
debug: [climate] begin
info: [devices] loading gateway driver
debug: [gateway] begin
info: [devices] loading indicator driver
debug: [indicator] begin
info: [devices] loading lighting driver
debug: [lighting] begin
info: [devices] loading media driver
debug: [media] begin
info: [devices] loading motive driver
debug: [motive] begin
info: [devices] loading presence driver
debug: [presence] begin
info: [devices] loading sensor driver
debug: [sensor] begin
info: [devices] loading switch driver
debug: [switch] begin
info: [devices] loading wearable driver
debug: [wearable] begin
info: [climate] loading ecobee-control driver
info: [climate] loading flower-power-sensor driver
info: [climate] loading koubachi-sensor driver
info: [climate] loading nest-control driver
info: [climate] loading netatmo-sensor driver
info: [climate] loading oregon-scientific-sensor driver
info: [climate] loading samsung-control driver
debug: [discovery] listening network_interface=enp0s3, ipaddr=10.237.67.149, portno=1900
debug: [discovery] listening network_interface=tether, ipaddr=192.168.0.1, portno=1900
info: [climate] loading yoctopuce-meteo driver
info: [discovery] SSDP listening on http://0.0.0.0:1900
debug: [discovery] multicasting network_interface=enp0s3, ipaddr=10.255.255.255, portno=1900
info: [discovery] SSDP listening on http://0.0.0.0:1900
debug: [discovery] multicasting network_interface=tether, ipaddr=192.168.0.255, portno=1900
info: [gateway] loading automatic-cloud gateway
info: [gateway] loading ecobee-cloud gateway
info: [devices] failing ecobee-cloud gateway (continuing) diagnostic=Cannot find module 'ecobee-api'
info: [gateway] loading flower-power-cloud gateway
info: [gateway] loading insteon-9761 gateway
info: [gateway] loading koubachi-cloud gateway
info: [gateway] loading lockitron-cloud gateway
info: [gateway] loading nest-cloud gateway
info: [gateway] loading netatmo-cloud gateway
info: [gateway] loading openzwave-usb gateway
info: [gateway] loading reelyactive-reel gateway
info: [gateway] loading rfxcom-usb gateway
info: [gateway] loading tesla-cloud gateway
info: [gateway] loading wink-cloud gateway
info: [gateway] loading yoctopuce-hub gateway
info: [gateway] loading zigbee-gmo gateway
info: [discovery] reelyactive-reel driver listening on udp://*:7018
info: [indicator] loading cassandra-nosql driver
info: [indicator] loading dweetio-sensor driver
info: [indicator] loading grovestreams-sensor driver
info: [indicator] loading mqtt-text driver
info: [indicator] loading nma-text driver
info: [indicator] loading prowl-text driver
info: [indicator] loading wink-dial driver
info: [indicator] loading wink-nimbus driver
info: [indicator] loading xively-sensor driver
info: [lighting] loading blink1-led driver
info: [lighting] loading heroicrobotics driver
info: [lighting] loading hue driver
info: [lighting] loading insteon-led driver
info: [lighting] loading lifx-bulb driver
info: [lighting] loading robosmart-led driver
Mon, 07 Apr 2014 09:44:19 GMT hci-ble hciBle = /opt/TheThingSystem/steward/node_modules/noble/lib/linux/../../build/Release/hci-ble
info: [lighting] loading tabu-lumen driver
info: [lighting] loading tcpi-led driver
*** WARNING *** The program 'node' called 'DNSServiceRegister()' which is not supported (or only supported partially) in the Apple Bonjour compatibility layer of Avahi.
*** WARNING *** Please fix your application to use the native API of Avahi!
*** WARNING *** For more information see <http://0pointer.de/avahi-compat?s=libdns_sd&e=node&f=DNSServiceRegister>
info: [lighting] loading yoctopuce-color driver
info: [lighting] loading yoctopuce-powercolor driver
info: [discovery] PixelPusher listening on udp://*:7331
info: [media] loading audio-sonos driver
info: [media] loading video-appletv driver
info: [media] loading video-chromecast driver
info: [media] loading video-roku driver
info: [motive] loading automatic-vehicle driver
info: [motive] loading lockitron-lock driver
info: [motive] loading tesla-model-s driver
info: [presence] loading fob driver
info: [presence] loading mobile-mqtt driver
info: [presence] loading reelyactive-tag driver
info: [switch] loading insteon-dimmer driver
info: [switch] loading insteon-onoff driver
info: [switch] loading wemo-onoff driver
info: [switch] loading wink-onoff driver
info: [switch] loading wink-pivot-power-genius driver
info: [switch] loading zwave-dimmer driver
info: [switch] loading zwave-onoff driver
info: [wearable] loading watch driver
debug: [discovery] multicasting network_interface=enp0s3, ipaddr=10.255.255.255, portno=1900
debug: [discovery] multicasting network_interface=tether, ipaddr=192.168.0.255, portno=1900
Mon, 07 Apr 2014 09:45:53 GMT hci-ble buffer = "adapterState poweredOn\n"
Mon, 07 Apr 2014 09:45:53 GMT hci-ble line = adapterState poweredOn
Mon, 07 Apr 2014 09:45:53 GMT hci-ble adapterState = poweredOn
debug: [discovery] SSDP response address=192.168.0.1, family=IPv4, port=1900, size=189
debug: [discovery] SSDP response address=192.168.0.1, family=IPv4, port=1900, size=189
debug: [discovery] multicasting network_interface=enp0s3, ipaddr=10.255.255.255, portno=1900
debug: [discovery] multicasting network_interface=tether, ipaddr=192.168.0.255, portno=1900
info: [sensor] loading nest-smoke driver
info: [sensor] loading texas-instruments-sensortag driver
./start-steward.debug: line 21: 584 Killed DEBUG=hci-ble node $DIR/server.js $@
For some reason the steward went crazy as I see this in dmesg:
[ 262.141277] device wlan0 entered promiscuous mode
[ 262.229413] tether: port 1(wlan0) entered forwarding state
[ 262.229625] tether: port 1(wlan0) entered forwarding state
[ 324.378180] device enp0s3 entered promiscuous mode
[ 324.420286] device tether entered promiscuous mode
[ 453.426491] systemd-journal invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[ 453.427260] systemd-journal cpuset=/ mems_allowed=0
[ 453.555625] CPU: 0 PID: 255 Comm: systemd-journal Tainted: G W 3.14.0-rc2-bt6lowpan #2
[ 453.557062] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 453.558242] 00000000 00000000 c71f7ca4 c180e9a8 c4e7c140 c71f7d04 c180b918 c1a53900
[ 453.560409] c4e7c500 000201da 00000000 00000000 00000206 c1b70160 c71f7cd0 c108e1eb
[ 453.562172] c71f7ce0 c18166e1 c1b70160 00000001 c71f7d04 c1380d9a c79f2b80 000201da
[ 453.563807] Call Trace:
[ 453.564221] [<c180e9a8>] dump_stack+0x4b/0x75
[ 453.564650] [<c180b918>] dump_header.isra.9+0x76/0x269
[ 453.565155] [<c108e1eb>] ? trace_hardirqs_on+0xb/0x10
[ 453.565537] [<c18166e1>] ? _raw_spin_unlock_irqrestore+0x41/0x70
[ 453.565996] [<c1380d9a>] ? ___ratelimit+0x8a/0x110
[ 453.566409] [<c110c132>] oom_kill_process+0x1c2/0x380
[ 453.566788] [<c104d6b4>] ? has_capability_noaudit+0x14/0x20
[ 453.567169] [<c110c753>] out_of_memory+0x2c3/0x310
[ 453.567575] [<c11106ab>] __alloc_pages_nodemask+0x94b/0x9b0
[ 453.567962] [<c110a7c9>] filemap_fault+0x2d9/0x510
[ 453.568373] [<c112ab7a>] ? __do_fault+0x5a/0x4c0
[ 453.568805] [<c112ab7a>] __do_fault+0x5a/0x4c0
[ 453.569120] [<c11479d0>] ? sock_update_memcg+0x220/0x220
[ 453.569499] [<c112de66>] handle_mm_fault+0x156/0x8f0
[ 453.569870] [<c181a280>] ? __do_page_fault+0x110/0x570
[ 453.570105] [<c181a2dc>] __do_page_fault+0x16c/0x570
[ 453.570466] [<c116dc2a>] ? mntput_no_expire+0x6a/0x220
[ 453.570848] [<c1152b91>] ? __fput+0x121/0x200
[ 453.571352] [<c1817451>] ? restore_all+0xf/0xf
[ 453.571714] [<c1061a97>] ? task_work_run+0x97/0xb0
[ 453.572085] [<c181a6e0>] ? __do_page_fault+0x570/0x570
[ 453.572529] [<c181a6ed>] do_page_fault+0xd/0x10
[ 453.572896] [<c1817beb>] error_code+0x5f/0x64
[ 453.574092] [<c1810000>] ? e1000_regdump+0x34b/0x381
[ 453.574482] [<c181a6e0>] ? __do_page_fault+0x570/0x570
[ 453.706181] Mem-Info:
[ 453.706529] DMA per-cpu:
[ 453.706862] CPU 0: hi: 0, btch: 1 usd: 0
[ 453.791091] Normal per-cpu:
[ 453.791432] CPU 0: hi: 18, btch: 3 usd: 0
[ 453.791804] active_anon:14943 inactive_anon:797 isolated_anon:0
[ 453.791804] active_file:36 inactive_file:124 isolated_file:0
[ 453.791804] unevictable:0 dirty:0 writeback:0 unstable:0
[ 453.791804] free:413 slab_reclaimable:2505 slab_unreclaimable:2160
[ 453.791804] mapped:5563 shmem:1092 pagetables:197 bounce:0
[ 453.791804] free_cma:0
[ 454.081175] DMA free:548kB min:192kB low:240kB high:288kB active_anon:11368kB inactive_anon:496kB active_file:104kB inactive_file:288kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15916kB mlocked:0kB dirty:0kB writeback:0kB mapped:552kB shmem:700kB slab_reclaimable:572kB slab_unreclaimable:1068kB kernel_stack:88kB pagetables:136kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1155 all_unreclaimable? yes
[ 454.308911] lowmem_reserve[]: 0 89 89 89
[ 454.328974] Normal free:1108kB min:1112kB low:1388kB high:1668kB active_anon:48404kB inactive_anon:2692kB active_file:48kB inactive_file:208kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:114624kB managed:93500kB mlocked:0kB dirty:0kB writeback:0kB mapped:21692kB shmem:3668kB slab_reclaimable:9448kB slab_unreclaimable:7572kB kernel_stack:672kB pagetables:652kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1325 all_unreclaimable? yes
[ 454.605629] lowmem_reserve[]: 0 0 0 0
[ 454.649197] DMA: 37*4kB (EMR) 2*8kB (R) 22*16kB (R) 1*32kB (R) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 548kB
[ 454.835953] Normal: 277*4kB (UE) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1108kB
[ 455.083528] 1258 total pagecache pages
[ 455.083870] 0 pages in swap cache
[ 455.171160] Swap cache stats: add 0, delete 0, find 0/0
[ 455.171645] Free swap = 0kB
[ 455.171969] Total swap = 0kB
[ 455.277845] 32654 pages RAM
[ 455.309081] 0 pages HighMem/MovableOnly
[ 455.309445] 5281 pages reserved
[ 455.391165] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[ 455.391847] [ 255] 0 255 2735 664 7 0 0 systemd-journal
[ 455.477368] [ 278] 0 278 2301 84 6 0 -1000 systemd-udevd
[ 455.562152] [ 314] 0 314 622 13 5 0 0 syslogd
[ 455.650065] [ 316] 0 316 1371 55 6 0 0 connman-vpnd
[ 455.650548] [ 318] 0 318 1301 81 5 0 0 ofonod
[ 455.837121] [ 320] 0 320 606 15 5 0 0 klogd
[ 455.837833] [ 322] 0 322 763 37 5 0 0 start-eca-web
[ 455.928057] [ 327] 0 327 24330 1370 16 0 0 python
[ 455.928731] [ 334] 998 334 730 78 5 0 -900 dbus-daemon
[ 456.035061] [ 364] 0 364 736 45 5 0 0 systemd-logind
[ 456.035743] [ 366] 0 366 1511 70 6 0 0 connmand
[ 456.207744] [ 367] 999 367 819 54 5 0 0 avahi-daemon
[ 456.340193] [ 368] 999 368 788 38 4 0 0 avahi-daemon
[ 456.340823] [ 369] 0 369 1066 47 5 0 0 bluetoothd
[ 456.426548] [ 379] 0 379 1336 68 6 0 0 wpa_supplicant
[ 456.617146] [ 387] 0 387 789 69 5 0 0 sh
[ 456.617811] [ 581] 0 581 726 32 5 0 0 start-steward.d
[ 456.794684] [ 582] 0 582 607 15 5 0 0 tee
[ 456.836164] [ 584] 0 584 36160 17006 67 0 0 node
[ 456.924057] [ 591] 0 591 537 21 5 0 0 hci-ble
[ 456.924673] [ 609] 0 609 600 21 5 0 0 systemd-cgroups
[ 457.026127] [ 610] 0 610 600 22 5 0 0 systemd-cgroups
[ 457.026753] [ 611] 0 611 1185 258 4 0 0 (agetty)
[ 457.144358] Out of memory: Kill process 584 (node) score 605 or sacrifice child
[ 457.208414] Killed process 591 (hci-ble) total-vm:2148kB, anon-rss:84kB, file-rss:0kB
I was too optimistic. After running fine for some time in Intel NUC, steward started to behave badly there also. OOM-killer kills steward, it is then restarted automatically by systemd but because of this restart cycle, the web UI is pretty much unusable.
Interesting, so I'm guessing here, but I'm presuming that one of the network interfaces really doesn't like dropping into promiscuous mode,
[ 262.141277] device wlan0 entered promiscuous mode
[ 262.229413] tether: port 1(wlan0) entered forwarding state
[ 262.229625] tether: port 1(wlan0) entered forwarding state
[ 324.378180] device enp0s3 entered promiscuous mode
[ 324.420286] device tether entered promiscuous mode
what can you tell us about enp0s3 and tether ..?
enp0s3 is normal ethernet interface tether is bridged wlan interface
Okay, hmm… a bridged wlan interface. A lot of wireless adaptors don't support promiscuous mode, it's possible that, since it's a bridged interface the kernel isn't spotting it, trying to throw it into promiscuous mode, and getting trapped into a 100% CPU utilisation loop..?
Can you try throwing it into promiscuous mode manually form the command line and seeing if you get the same result?
The wlan adapter (Intel Centrino Advanced-N 6235) I am using supports promiscuous mode just fine.
I tried with two ethernet interfaces (one pci based and the other usb-ethernet dongle) and disabled the wlan and I see the same results i.e., steward process uses 100% cpu and the memory consumption increases all the time.
Okay, if that's ruled out, the next thing to do then is to look at the fact it's killing BLE related processes. Could you go to steward/steward/discovery/discovery-ble.js and disable the module by adding the following two lines right at the top,
exports.start = function() {};
return;
and restart the steward.
I think the hci-ble is killed also because the parent process (steward) was killed first.
Anyway, editing discovery-ble.js causes this error
alert: [steward] exception diagnostic=Object #<Object> has no method 'register'
alert: [steward] exception stack=[{"fileName":"/opt/TheThingSystem/steward/devices/devices-lighting/lighting-robosmart-led.js","lineNumber":202,"functionName":"Object.exports.start","typeName":"Object","methodName":"exports.start","columnNumber":46,"native":false},{"fileName":"/opt/TheThingSystem/steward/core/utility.js","lineNumber":170,"functionName":null,"typeName":null,"methodName":null,"columnNumber":41,"native":false},{"fileName":"fs.js","lineNumber":107,"functionName":"Object.oncomplete","typeName":"Object","methodName":"oncomplete","columnNumber":15,"native":false}]
uncaught exception: TypeError: Object #<Object> has no method 'register'
I am not sure how useful this log is but I straced steward and the log can be found here https://docs.google.com/file/d/0B92N-GBvquUaSUV6eTZ5Y3NPVFk/edit?pli=1
Steward is doing something crazy as there is constant flow of these lines in the log
poll([{fd=37, events=POLLIN}], 1, 0) = 0 (Timeout)
epoll_wait(5, {{EPOLLIN, {u32=36, u64=36}}}, 1024, 4) = 1
@jukkar - thanks. could i test your patience a bit? what i would like to understand is whether the issue is related to a particular interface or a combination of interfaces.
would it be possible for you to run a few tests to narrow this down? i.e., run it with tether only, then enps03 only. and see if it gets killed or not. thanks!
finally: i forgot to mention earlier, the fix to https://github.com/TheThingSystem/steward/issues/185 occurs when the steward starts, once it starts listening and "doing things" any code associated with that fix is no longer running...
@mrose17 - Some test results below
Host: Intel NUC device OS: Linux 3.10.19 TTS: 1.6+ (commit id bbc9e0c9ae) node.js: 0.10.26
No network interfaces available when steward starts:
One interface up and having IP address:
Two interfaces up and having IP address:
Three interface up and having IP address:
Four interface up and having IP address:
It seems that having multiple IP addresses in the device is the triggering condition. Multiple network interfaces without IP address did not trigger the memory issue.
Just by accident I used TTS v1.5 in Intel NUC and still got the same memory issue. For some reason I did not see the "two interface up" issue fixed by #185 this time in Intel NUC hw. So the fix to #185 is clearly not causing this issue described in this bug, just as you suggested in your previous comment.
thanks very much for running the tests.
tomorrow evening (thursday) i'll run some tests here on a system with two interfaces active, perhaps i can see some pattern going on…
thanks for your patience. hopefully we can get to the bottom of this very soon!
ps: i'm not sure about the steward waiting for an interface to come up before starting, but that's another issue. (-;
well, i can reproduce it, which is good news. so now i'm going to try to figure out where it's going sideways...
got it. had to move to the current pcap module (which i've wanted to do anyway for a while). i'm running with three interfaces and seeing < 1.2% CPU on average
to get the fix you need to "get the latest", viz.,
cd steward
git pull
cd steward
rm -rf node_modules
npm -l install
and then restart the steward. thanks for reporting this and for your patience!
I have this system with multiple network interfaces. If those interfaces have an IP address when steward is started, then the steward starts to consume 100% CPU and eventually it runs out of memory and is killed by the OS. If there is only one interface with IP address when steward starts, I see no problems.
Perhaps this is a side effect of the fix to this bug https://github.com/TheThingSystem/steward/issues/185
My steward is at this commit id bbc9e0c9aec8f3087e67f2093f7884340644e4c1