Open mStirner opened 1 month ago
[root@open-haus ~]# ps -T -p 28664
PID SPID TTY TIME CMD
28664 28664 ? 00:06:06 node
28664 28668 ? 00:00:00 node
28664 28674 ? 00:00:14 node
28664 28675 ? 00:00:13 node
28664 28676 ? 00:00:13 node
28664 28677 ? 00:00:13 node
28664 28678 ? 00:00:00 node
28664 28682 ? 00:00:05 node
28664 28683 ? 00:00:05 node
28664 28684 ? 00:00:05 node
28664 28685 ? 00:00:05 node
28664 45607 ? 00:01:31 node
28664 45608 ? 00:00:17 node
28664 45609 ? 00:00:13 node
28664 45610 ? 00:00:02 node
28664 45611 ? 00:00:03 node
28664 45612 ? 00:00:02 node
28664 45613 ? 00:00:02 node
28664 45614 ? 00:00:02 node
28664 45615 ? 00:00:02 node
28664 45616 ? 00:00:02 node
28664 45617 ? 00:00:02 node
28664 45618 ? 00:00:02 node
28664 45619 ? 00:00:02 node
28664 45620 ? 00:00:02 node
28664 45621 ? 00:00:02 node
28664 45622 ? 00:00:02 node
28664 45623 ? 00:00:02 node
28664 45624 ? 00:00:02 node
28664 45627 ? 00:00:03 node
28664 45691 ? 00:00:02 node
28664 45873 ? 00:00:02 node
28664 46298 ? 00:00:02 node
28664 46435 ? 00:00:01 node
28664 46438 ? 00:00:01 node
28664 46440 ? 00:00:02 node
28664 46682 ? 00:00:01 node
28664 46968 ? 00:00:01 node
28664 46973 ? 00:00:01 node
28664 47063 ? 00:00:01 node
28664 47066 ? 00:00:01 node
28664 47106 ? 00:00:01 node
28664 47110 ? 00:00:01 node
28664 47112 ? 00:00:01 node
28664 47307 ? 00:00:01 node
28664 47361 ? 00:00:01 node
28664 47364 ? 00:00:01 node
28664 47513 ? 00:00:01 node
28664 47592 ? 00:00:01 node
28664 47595 ? 00:00:01 node
28664 47632 ? 00:00:00 node
28664 47634 ? 00:00:00 node
28664 47637 ? 00:00:00 node
28664 47717 ? 00:00:00 node
28664 47817 ? 00:00:00 node
28664 47819 ? 00:00:00 node
28664 47821 ? 00:00:00 node
28664 47824 ? 00:00:00 node
28664 47826 ? 00:00:00 node
28664 47830 ? 00:00:00 node
28664 47833 ? 00:00:00 node
28664 47921 ? 00:00:00 node
28664 47923 ? 00:00:00 node
28664 47925 ? 00:00:00 node
28664 47927 ? 00:00:00 node
28664 47931 ? 00:00:00 node
28664 47957 ? 00:00:00 node
28664 47962 ? 00:00:00 node
28664 47964 ? 00:00:00 node
28664 47966 ? 00:00:00 node
28664 47968 ? 00:00:00 node
28664 47975 ? 00:00:00 node
28664 47978 ? 00:00:00 node
28664 47980 ? 00:00:00 node
28664 47982 ? 00:00:00 node
28664 47985 ? 00:00:00 node
28664 47988 ? 00:00:00 node
28664 47992 ? 00:00:00 node
28664 48135 ? 00:00:00 node
28664 48137 ? 00:00:00 node
28664 48541 ? 00:00:00 node
[root@open-haus ~]# ps -T -p 28664 | wc -l
81
[root@open-haus ~]# systemctl status connector
● connector.service - OpenHaus SmartHome/IoT connector
Loaded: loaded (/usr/lib/systemd/system/connector.service; enabled; preset: disabled)
Active: active (running) since Wed 2024-10-23 20:50:24 CEST; 2 days ago
Docs: https://open-haus.io
Main PID: 28664 (node)
Tasks: 80 (limit: 11107)
Memory: 754.6M
CPU: 21min 41.849s
CGroup: /system.slice/connector.service
└─28664 /usr/bin/node index.js
Oct 25 23:18:28 open-haus.lan node[28664]: args: [
Oct 25 23:18:28 open-haus.lan node[28664]: {
Oct 25 23:18:28 open-haus.lan node[28664]: _id: '65b2874b98c30b71f603a07a',
Oct 25 23:18:28 open-haus.lan node[28664]: description: 'ESP8266/ESPHome IKEA VINDSTYRKA Particle sensor',
Oct 25 23:18:28 open-haus.lan node[28664]: topic: 'air-sensor/sensor/particulate_matter_25m_concentration/state',
Oct 25 23:18:28 open-haus.lan node[28664]: timestamps: [Object],
Oct 25 23:18:28 open-haus.lan node[28664]: labels: []
Oct 25 23:18:28 open-haus.lan node[28664]: }
Oct 25 23:18:28 open-haus.lan node[28664]: ]
Oct 25 23:18:28 open-haus.lan node[28664]: }
[root@open-haus ~]# systemctl status connector
● connector.service - OpenHaus SmartHome/IoT connector
Loaded: loaded (/usr/lib/systemd/system/connector.service; enabled; preset: disabled)
Active: active (running) since Sat 2024-10-26 12:46:59 CEST; 11h ago
Docs: https://open-haus.io
Main PID: 51700 (node)
Tasks: 60 (limit: 11107)
Memory: 418.3M
CPU: 3min 28.653s
CGroup: /system.slice/connector.service
└─51700 /usr/bin/node index.js
Oct 26 22:42:11 open-haus.lan node[51700]: 'led-strip=ws2801',
Oct 26 22:42:11 open-haus.lan node[51700]: 'magic=8d89fe9e-1f86-480b-bd25-005b8cadaf19'
Oct 26 22:42:11 open-haus.lan node[51700]: ],
Oct 26 22:42:11 open-haus.lan node[51700]: timestamps: { created: 1726259422761, updated: null },
Oct 26 22:42:11 open-haus.lan node[51700]: room: null,
Oct 26 22:42:11 open-haus.lan node[51700]: enabled: true,
Oct 26 22:42:11 open-haus.lan node[51700]: icon: null
Oct 26 22:42:11 open-haus.lan node[51700]: }
Oct 26 22:42:11 open-haus.lan node[51700]: Worker spawend for url http://127.0.0.1:8080/api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 26 22:42:11 open-haus.lan node[51700]: [2024.10.26 - 22:42.11.565][info][system] [connected] tcp://192.168.2.178:8080
[root@open-haus ~]# ps -T -p 51700 | wc -l
61
May be backend realted?! After restarting backend & disableing some plugins, the ammount of node processes reduced to 25/28/33 processes:
27.10.2024 - 01:58
Plugin sdeaktiviert
- onky/pioneer
- shelly integration
- smartmeter
- "worker thread demo" removed
- me-ac-mbs-1 integration
Bevor sbackend restart: 61 connector worker threads (ps -T -p 51700 | wc -l)
Nach neustart: 25/28/33 connector worker trheads
Perhaps the backed is not signaling correctly a broken/collapsed bridged. But why is this not catched with WebSocket ping/pong (Maybe because its not their purpose, and just apply to "disconnected wires" type of errors, where no ACK etc. pp. package is send)
A simple repl proceess alone starts 15 node processes:
marc@workstation:~$ ps -T aux | grep -i node | wc -l
2
marc@workstation:~$ ps -T aux | grep -i node | wc -l
17
2 before issuing node
, 17 = after
[root@open-haus ~]# systemctl status connector
● connector.service - OpenHaus SmartHome/IoT connector
Loaded: loaded (/usr/lib/systemd/system/connector.service; enabled; preset: disabled)
Active: active (running) since Sat 2024-10-26 12:46:59 CEST; 1 day 11h ago
Docs: https://open-haus.io
Main PID: 51700 (node)
Tasks: 68 (limit: 11107)
Memory: 486.4M
CPU: 11min 12.142s
CGroup: /system.slice/connector.service
└─51700 /usr/bin/node index.js
Oct 27 22:17:01 open-haus.lan node[51700]: 'led-strip=ws2801',
Oct 27 22:17:01 open-haus.lan node[51700]: 'magic=8d89fe9e-1f86-480b-bd25-005b8cadaf19'
Oct 27 22:17:01 open-haus.lan node[51700]: ],
Oct 27 22:17:01 open-haus.lan node[51700]: timestamps: { created: 1726259422761, updated: null },
Oct 27 22:17:01 open-haus.lan node[51700]: room: null,
Oct 27 22:17:01 open-haus.lan node[51700]: enabled: true,
Oct 27 22:17:01 open-haus.lan node[51700]: icon: null
Oct 27 22:17:01 open-haus.lan node[51700]: }
Oct 27 22:17:01 open-haus.lan node[51700]: Worker spawend for url http://127.0.0.1:8080/api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 27 22:17:02 open-haus.lan node[51700]: [2024.10.27 - 22:17.02.039][info][system] [connected] tcp://192.168.2.178:8080
[root@open-haus ~]# ps -T -p 51700 | wc -l
69
Added some debug messages in the backend to see if the ws is closed properly: In router.api.device.js:
["end", "close"].forEach((event) => {
stream.once(event, () => {
console.log(`>> stream fired "${event}" event in router.api.devices.js`);
ws.close(0, `event "${event}" fired on stream`);
});
});
In class.interface.js .pipe(..., {end: true})
, was added.
Oct 28 15:50:02 open-haus.lan node[61996]: [2024.10.28 - 15:50.02.263][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:51:02 open-haus.lan node[61996]: [2024.10.28 - 15:51.02.438][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:52:02 open-haus.lan node[61996]: [2024.10.28 - 15:52.02.464][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:53:02 open-haus.lan node[61996]: [2024.10.28 - 15:53.02.495][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:54:02 open-haus.lan node[61996]: [2024.10.28 - 15:54.02.521][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:55:02 open-haus.lan node[61996]: [2024.10.28 - 15:55.02.533][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:55:56 open-haus.lan node[61996]: [2024.10.28 - 15:55.56.242][warn][plugins/32ae791a-a073-4c2c-a2e9-1a9ca0d572ed] WebSocket is dead, reconnect ws://192.168.2.178:8080/
Oct 28 15:55:56 open-haus.lan node[61996]: [2024.10.28 - 15:55.56.243][debug][devices] Bridge closed: iface 66e4a0de72e58e7085f2a2dc <-> tcp://192.168.2.178:8080 (b6c74864-9c1b-4cf3-88cf-f3110356e5a5)
Oct 28 15:55:56 open-haus.lan node[61996]: [2024.10.28 - 15:55.56.244][error][plugins/32ae791a-a073-4c2c-a2e9-1a9ca0d572ed] WebSocket closed from "ws://192.168.2.178:8080/"
Oct 28 15:56:01 open-haus.lan node[61996]: [2024.10.28 - 15:56.01.346][debug][plugins/32ae791a-a073-4c2c-a2e9-1a9ca0d572ed] Try to connecto to WebSocket "ws://192.168.2.178:8080/"
Oct 28 15:56:01 open-haus.lan node[61996]: [2024.10.28 - 15:56.01.432][debug][http] 127.0.0.1 - [GET] /api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 28 15:56:01 open-haus.lan node[61996]: [2024.10.28 - 15:56.01.436][debug][devices] Bridge created for interface "66e4a0de72e58e7085f2a2dc" (76a6bf95-f503-4d1b-91e2-461da9ffc420)
Oct 28 15:56:02 open-haus.lan node[61996]: [2024.10.28 - 15:56.02.604][debug][http] 192.168.2.195 - [POST] /api/endpoints/66fe7afdcb05fb46b4c69502/states/66fe7afdcb05fb46b4c69503
Oct 28 15:56:31 open-haus.lan node[61996]: [2024.10.28 - 15:56.31.346][debug][plugins/32ae791a-a073-4c2c-a2e9-1a9ca0d572ed] Could not create WebSocket connection in 30s to 192.168.2.178:8080; Redo...
Oct 28 15:56:36 open-haus.lan node[61996]: [2024.10.28 - 15:56.36.448][debug][plugins/32ae791a-a073-4c2c-a2e9-1a9ca0d572ed] Try to connecto to WebSocket "ws://192.168.2.178:8080/"
Oct 28 15:56:36 open-haus.lan node[61996]: [2024.10.28 - 15:56.36.535][debug][http] 127.0.0.1 - [GET] /api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 28 15:56:36 open-haus.lan node[61996]: [2024.10.28 - 15:56.36.540][debug][devices] Bridge created for interface "66e4a0de72e58e7085f2a2dc" (a9f009be-16c4-4c71-959a-555e9830e910)
Oct 28 15:56:36 open-haus.lan node[61996]: [2024.10.28 - 15:56.36.676][info][plugins/32ae791a-a073-4c2c-a2e9-1a9ca0d572ed] WebSocket connected to: ws://192.168.2.178:8080/
"Websocket is dead": But no debug message ">> stream fired "end" event in router.api.devices.js" is printed.
Is the websocket between backend & connector keept alive and just a new bridged worker spawenen in the connector?
Before this message, there where 30 tasks shown systemctl status connector
, now 31...
.178 is the lowboard custom LED Strip. Due to a cheap PSU the Pi is undervoltaged and random restarts.
Oct 28 16:10:07 open-haus.lan node[51700]: Worker spawend for url http://127.0.0.1:8080/api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 28 16:10:07 open-haus.lan node[51700]: [2024.10.28 - 16:10.07.746][info][system] [connected] tcp://192.168.2.178:8080
...
Oct 28 16:13:18 open-haus.lan node[51700]: Message from backend for bidrigin interface {"iface":"66e4a0de72e58e7085f2a2dc","type":"request","uuid":"3ddf6291-64f8-4139-b705-824dbf31048f","socket":true}
Oct 28 16:13:18 open-haus.lan node[51700]: bridge request
...
Oct 28 16:13:18 open-haus.lan node[51700]: Worker spawend for url http://127.0.0.1:8080/api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
...
Oct 28 16:15:38 open-haus.lan node[51700]: Message from backend for bidrigin interface {"iface":"66e4a0de72e58e7085f2a2dc","type":"request","uuid":"502a72d9-4102-4fa2-bdf9-c10a0e538c5f","socket":true}
Oct 28 16:15:38 open-haus.lan node[51700]: bridge request
...
Oct 28 16:15:38 open-haus.lan node[51700]: Worker spawend for url http://127.0.0.1:8080/api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 28 16:15:38 open-haus.lan node[51700]: [2024.10.28 - 16:15.38.544][info][system] [connected] tcp://192.168.2.178:8080
...
Oct 28 16:18:59 open-haus.lan node[51700]: Message from backend for bidrigin interface {"iface":"66e4a0de72e58e7085f2a2dc","type":"request","uuid":"e3c3b57c-c022-421c-876f-b3b538acc1ae","socket":true}
Oct 28 16:18:59 open-haus.lan node[51700]: bridge request
...
Oct 28 16:18:59 open-haus.lan node[51700]: Worker spawend for url http://127.0.0.1:8080/api/devices/66e4a0de72e58e7085f2a2db/interfaces/66e4a0de72e58e7085f2a2dc
Oct 28 16:18:59 open-haus.lan node[51700]: [2024.10.28 - 16:18.59.225][info][system] [connected] tcp://192.168.2.178:8080
Turning on/off the raspberry, leads to broken websocket connection. So far so clear. But it seems the underlaying bridiging websocket is not closed properley which leads to spawning worker over and over again when a retry is attempt to establish a ws connection from the plugin to the pi.
Is this a issue in the plugin, backend or connector. Or a combination of all of them?
The backend/connector should handle that edge case. When a bridge is no longer need it should be teared down.
This could be a wider problem that thought. When a attempt is made do a non existing tcp socket/server, this error raises.
I think the backend should handle this and terminate the websocket connection between connector & backend, when no data is transmitted for a certain time.
Local "production" system produces OOM, killing the connector:
valgrind
heaptrack