Achronite / energenie-ener314rt

Node module for RaspberryPi energenie ener314-rt board, used by node-red-contrib-energenie-ener314rt
MIT License
4 stars 2 forks source link

Crash when adding new devices #15

Closed shaw-iot closed 3 years ago

shaw-iot commented 3 years ago

After running for a few hours, adding new devices will cause the error message "Something went wrong discovering OpenThings devices" to appear. Node-RED will then restart and devices can be added as normal. Running with Node-RED v1.2.9 and node-red-contrib-energenie-ener314rt v0.3.9. Running on a Pi2 with 1GB RAM, plenty free at the time of the crashes.

Feb 17 18:38:31 ntp0 systemd[1]: nodered.service: Main process exited, code=killed, status=11/SEGV Feb 17 18:38:31 ntp0 systemd[1]: nodered.service: Failed with result 'signal'. Feb 17 18:38:51 ntp0 systemd[1]: nodered.service: Service RestartSec=20s expired, scheduling restart. Feb 17 18:38:51 ntp0 systemd[1]: nodered.service: Scheduled restart job, restart counter is at 1. Feb 17 18:38:51 ntp0 systemd[1]: Stopped Node-RED graphical event wiring tool. Feb 17 18:38:51 ntp0 systemd[1]: Started Node-RED graphical event wiring tool.

Achronite commented 3 years ago

What triggered the crash? Was it pressing the 'learn' button?

When you were adding the new devices can you tell me what state they were in (are they in discover mode?), and did you have any other 'receive' devices already added at the time?

shaw-iot commented 3 years ago

Hello, it actually happens before the learn stage. Add a new node, or select an existing node, Double click the node to show it's configuration. From the drop down list of devices, select "add new OpenThings-device" and then click the pencil to edit (add). Once it's crashed and restarted you can actually then repeat that step without seeing crashes. This is in a running configuration so other devices (door sensors, energy monitor and adapter-plus) were in the waiting state.

Achronite commented 3 years ago

I think I'll need more log data to solve this one. As you seem happy recompiling, please un-comment line 17 in energenie-ener314rt/C/energenie/trace.h so it reads:

define TRACE_POSIX

And then recompile the C code. There is also FULLTRACE mode too, please try to reproduce the SEGV and then paste the logs.

I forgot to ask, which versions are you using of my node-red and node modules?

shaw-iot commented 3 years ago

Below is the TRACE_POSIX log just before and after the crash. I'll try and get a FULLTRACE next.


,"APPARENT_POWER":2677,"VOLTAGE":3.812500,"CURRENT":9.542969}
Feb 18 21:15:32 ntp0 Node-RED[18709]: ***********************************************************************************************************************************************************************************************************@openThings_receive: Returning: {"deviceId":6409,"mfrId":4,"productId":5,"timestamp":1613682932,"APPARENT_POWER":2697,"VOLTAGE":3.812500,"CURRENT":9.613281}
Feb 18 21:15:42 ntp0 Node-RED[18709]: ***********************************************************************************************************************************************************************************************************@openThings_receive: Returning: {"deviceId":6409,"mfrId":4,"productId":5,"timestamp":1613682942,"APPARENT_POWER":2677,"VOLTAGE":3.812500,"CURRENT":9.542969}
Feb 18 21:15:52 ntp0 Node-RED[18709]: **************************************************************************************************************************************************************************************************openthings_deviceList(): called
Feb 18 21:15:52 ntp0 Node-RED[18709]: openthings_deviceList(): Returning: {"numDevices":6, "devices":[
Feb 18 21:15:52 ntp0 Node-RED[18709]: {"mfrId":4,"productId":5,"deviceId":6409,"control":0,"product":"House Monitor","joined":1},
Feb 18 21:15:52 ntp0 Node-RED[18709]: {"mfrId":4,"productId":13,"deviceId":7318,"control":0,"product":"Open Sensor","joined":1},
Feb 18 21:15:52 ntp0 Node-RED[18709]: {"mfrId":4,"productId":13,"deviceId":4293,"control":0,"product":"Open Sensor","joined":1},
Feb 18 21:15:52 ntp0 Node-RED[18709]: {"mfrId":4,"productId":13,"deviceId":7314,"control":0,"product":"Open Sensor","joined":1},
Feb 18 21:15:52 ntp0 Node-RED[18709]: {"mfrId":4,"productId":13,"deviceId":7395,"control":0,"product":"Open Sensor","joined":1},
Feb 18 21:15:52 ntp0 Node-RED[18709]: {"mfrId":4,"productId":13,"deviceId":5281,"control":0,"product":"Open Sensor","joined":1}]}
Feb 18 21:15:52 ntp0 systemd[1]: nodered.service: Main process exited, code=killed, status=11/SEGV
Feb 18 21:15:52 ntp0 systemd[1]: nodered.service: Failed with result 'signal'.
Feb 18 21:16:12 ntp0 systemd[1]: nodered.service: Service RestartSec=20s expired, scheduling restart.
Feb 18 21:16:12 ntp0 systemd[1]: nodered.service: Scheduled restart job, restart counter is at 1.
Feb 18 21:16:12 ntp0 systemd[1]: Stopped Node-RED graphical event wiring tool.
Feb 18 21:16:12 ntp0 systemd[1]: Started Node-RED graphical event wiring tool.
Feb 18 21:16:16 ntp0 Node-RED[19442]: 18 Feb 21:16:16 - [info]
Feb 18 21:16:16 ntp0 Node-RED[19442]: Welcome to Node-RED
Feb 18 21:16:16 ntp0 Node-RED[19442]: ===================
Feb 18 21:16:16 ntp0 Node-RED[19442]: 18 Feb 21:16:16 - [info] Node-RED version: v1.2.9
Feb 18 21:16:16 ntp0 Node-RED[19442]: 18 Feb 21:16:16 - [info] Node.js  version: v12.20.2
Feb 18 21:16:16 ntp0 Node-RED[19442]: 18 Feb 21:16:16 - [info] Linux 5.10.11-v7+ arm LE
Feb 18 21:16:18 ntp0 Node-RED[19442]: 18 Feb 21:16:18 - [info] Loading palette nodes
Feb 18 21:16:22 ntp0 Node-RED[19442]: napi_energenie.Init() called
Feb 18 21:16:23 ntp0 Node-RED[19442]: 18 Feb 21:16:23 - [info] Settings file  : /home/pi/.node-red/settings.js
Feb 18 21:16:23 ntp0 Node-RED[19442]: 18 Feb 21:16:23 - [info] Context store  : 'default' [module=memory]
Feb 18 21:16:23 ntp0 Node-RED[19442]: 18 Feb 21:16:23 - [info] User directory : /home/pi/.node-red
Feb 18 21:16:23 ntp0 Node-RED[19442]: 18 Feb 21:16:23 - [warn] Projects disabled : editorTheme.projects.enabled=false
Feb 18 21:16:23 ntp0 Node-RED[19442]: 18 Feb 21:16:23 - [info] Flows file     : /home/pi/.node-red/flows_raspberrypi.json
Feb 18 21:16:23 ntp0 Node-RED[19442]: 18 Feb 21:16:23 - [info] Server now running at http://127.0.0.1:1880/
Feb 18 21:16:23 ntp0 Node-RED[19442]: 18 Feb 21:16:23 - [info] Starting flows
Feb 18 21:16:23 ntp0 Node-RED[19442]: init_ener314(): initialising
Feb 18 21:16:23 ntp0 Node-RED[19442]: init_ener314(): mutex created & locked
Feb 18 21:16:23 ntp0 Node-RED[19442]: radio_init
Feb 18 21:16:23 ntp0 Node-RED[19442]: radio_ver=36
Feb 18 21:16:23 ntp0 Node-RED[19442]: radio_standby
Feb 18 21:16:23 ntp0 Node-RED[19442]: _wait_ready
Feb 18 21:16:23 ntp0 Node-RED[19442]: *radio_standby
Feb 18 21:16:23 ntp0 Node-RED[19442]: _wait_ready
Feb 18 21:16:23 ntp0 Node-RED[19442]: 18 Feb 21:16:23 - [info] ener314rt: radio initialised
Feb 18 21:16:23 ntp0 Node-RED[19442]: 18 Feb 21:16:23 - [info] [energenie-board:ENER314-RT] starting monitor thread, timeout=500
Feb 18 21:16:23 ntp0 Node-RED[19442]: tf_ napi_create_threadsafe_function done
Feb 18 21:16:23 ntp0 Node-RED[19442]: tf_ monitor thread started, timeout=500
Feb 18 21:16:23 ntp0 Node-RED[19442]: tx_openThings_receive_thread starting
Feb 18 21:16:23 ntp0 Node-RED[19442]: _wait_ready
shaw-iot commented 3 years ago

Unfortunately no more more detail with FULLTRACE

Feb 18 22:11:49 ntp0 Node-RED[20020]: openThings_receive(): rec:1 {"name":"VOLTAGE","id":118,"type":2,"str":"","int":976,"float":3.812500}
Feb 18 22:11:49 ntp0 Node-RED[20020]: openThings_receive(): rec:2 {"name":"CURRENT","id":105,"type":2,"str":"","int":2525,"float":9.863281}
Feb 18 22:11:49 ntp0 Node-RED[20020]: openThings_receive: Returning: {"deviceId":6409,"mfrId":4,"productId":5,"timestamp":1613686309,"APPARENT_POWER":2767,"VOLTAGE":3.812500,"CURRENT":9.863281}
Feb 18 22:11:57 ntp0 Node-RED[20020]: *******************************************************************************************************************************************************openthings_deviceList(): called
Feb 18 22:11:57 ntp0 Node-RED[20020]: openthings_deviceList(): Returning: {"numDevices":6, "devices":[
Feb 18 22:11:57 ntp0 Node-RED[20020]: {"mfrId":4,"productId":5,"deviceId":6409,"control":0,"product":"House Monitor","joined":1},
Feb 18 22:11:57 ntp0 Node-RED[20020]: {"mfrId":4,"productId":13,"deviceId":7395,"control":0,"product":"Open Sensor","joined":1},
Feb 18 22:11:57 ntp0 Node-RED[20020]: {"mfrId":4,"productId":13,"deviceId":7314,"control":0,"product":"Open Sensor","joined":1},
Feb 18 22:11:57 ntp0 Node-RED[20020]: {"mfrId":4,"productId":13,"deviceId":5281,"control":0,"product":"Open Sensor","joined":1},
Feb 18 22:11:57 ntp0 Node-RED[20020]: {"mfrId":4,"productId":13,"deviceId":7318,"control":0,"product":"Open Sensor","joined":1},
Feb 18 22:11:57 ntp0 Node-RED[20020]: {"mfrId":4,"productId":13,"deviceId":4293,"control":0,"product":"Open Sensor","joined":1}]}
Feb 18 22:11:57 ntp0 systemd[1]: nodered.service: Main process exited, code=killed, status=11/SEGV
Feb 18 22:11:57 ntp0 systemd[1]: nodered.service: Failed with result 'signal'.
Feb 18 22:12:17 ntp0 systemd[1]: nodered.service: Service RestartSec=20s expired, scheduling restart.
Feb 18 22:12:17 ntp0 systemd[1]: nodered.service: Scheduled restart job, restart counter is at 1.
Feb 18 22:12:17 ntp0 systemd[1]: Stopped Node-RED graphical event wiring tool.
Feb 18 22:12:17 ntp0 systemd[1]: Started Node-RED graphical event wiring tool.
Feb 18 22:12:22 ntp0 Node-RED[21063]: 18 Feb 22:12:22 - [info]
Feb 18 22:12:22 ntp0 Node-RED[21063]: Welcome to Node-RED
Feb 18 22:12:22 ntp0 Node-RED[21063]: ===================
Feb 18 22:12:22 ntp0 Node-RED[21063]: 18 Feb 22:12:22 - [info] Node-RED version: v1.2.9
Feb 18 22:12:22 ntp0 Node-RED[21063]: 18 Feb 22:12:22 - [info] Node.js  version: v12.20.2
Feb 18 22:12:22 ntp0 Node-RED[21063]: 18 Feb 22:12:22 - [info] Linux 5.10.11-v7+ arm LE
Feb 18 22:12:24 ntp0 Node-RED[21063]: 18 Feb 22:12:24 - [info] Loading palette nodes
Feb 18 22:12:28 ntp0 Node-RED[21063]: napi_energenie.Init() called
Achronite commented 3 years ago

OK - I think I may have a suspect, the message returned from deviceList is longer than the buffer (500 chars), which causes a memory fault when trying to add an additional device (I do not current check the length, when building the return message in openThings_deviceList).

I'll work on a patch, what version are you using? v0.3.x or v0.4.x?

shaw-iot commented 3 years ago

That would certainly explain why it takes a while before it starts to crash. Straight after loading the number of devices is small and it takes a while for them to build up. I'm using 0.3.9 from the built-in "Manage palette", but I can either manually apply the patch or figure out how to run with v4.

Achronite commented 3 years ago

I've applied the fix to the 'develop' branch; but the signature of the function has changed, so I'm not planning on back-porting it to v0.3.x.

By the way, v0.4.x has new nodes for specific devices - I've done ones for the PIR and Themostat so far; I'm planning on one for the door sensor too now as I have your messages. Does it send '0' for Open and '1' for closed in DOOR_SENSOR?

shaw-iot commented 3 years ago

Yes, that's what it appears to send (1=OPEN, 0=CLOSED). Below is a trace of opening and closing the door. I've left the home energy data in there in case it's useful and so you can see the timestamps of events as they are received. (btw, this is with the 50ms sleep I left in the code from testing yesterday)

openThings_receive(): rec:1 {"name":"VOLTAGE","id":118,"type":2,"str":"","int":976,"float":3.812500}
openThings_receive(): rec:2 {"name":"CURRENT","id":105,"type":2,"str":"","int":407,"float":1.589844}
openThings_receive: Returning: {"deviceId":6409,"mfrId":4,"productId":5,"timestamp":1613753223,"APPARENT_POWER":446,"VOLTAGE":3.812500,"CURRENT":1.589844}
********************************************************************************************************************************************@
openThings_receive(): rec:0 {"name":"DOOR_SENSOR","id":100,"type":1,"str":"","int":1,"float":1.000000}
openThings_receive: Returning: {"deviceId":7314,"mfrId":4,"productId":13,"timestamp":1613753230,"DOOR_SENSOR":1}
*@
openThings_receive(): rec:0 {"name":"DOOR_SENSOR","id":100,"type":1,"str":"","int":1,"float":1.000000}
openThings_receive: Returning: {"deviceId":7314,"mfrId":4,"productId":13,"timestamp":1613753230,"DOOR_SENSOR":1}
*****************************************************@
openThings_receive(): rec:0 {"name":"DOOR_SENSOR","id":100,"type":1,"str":"","int":0,"float":0.000000}
openThings_receive: Returning: {"deviceId":7314,"mfrId":4,"productId":13,"timestamp":1613753233,"DOOR_SENSOR":0}
*@
openThings_receive(): rec:0 {"name":"DOOR_SENSOR","id":100,"type":1,"str":"","int":0,"float":0.000000}
openThings_receive: Returning: {"deviceId":7314,"mfrId":4,"productId":13,"timestamp":1613753233,"DOOR_SENSOR":0}
*****@
openThings_receive(): rec:0 {"name":"APPARENT_POWER","id":80,"type":1,"str":"","int":477,"float":477.000000}
openThings_receive(): rec:1 {"name":"VOLTAGE","id":118,"type":2,"str":"","int":976,"float":3.812500}
openThings_receive(): rec:2 {"name":"CURRENT","id":105,"type":2,"str":"","int":435,"float":1.699219}
openThings_receive: Returning: {"deviceId":6409,"mfrId":4,"productId":5,"timestamp":1613753233,"APPARENT_POWER":477,"VOLTAGE":3.812500,"CURRENT":1.699219}
************************************************************************************************************************************************************************************************************@
openThings_receive(): rec:0 {"name":"APPARENT_POWER","id":80,"type":1,"str":"","int":463,"float":463.000000}
openThings_receive(): rec:1 {"name":"VOLTAGE","id":118,"type":2,"str":"","int":976,"float":3.812500}
openThings_receive(): rec:2 {"name":"CURRENT","id":105,"type":2,"str":"","int":423,"float":1.652344}
openThings_receive: Returning: {"deviceId":6409,"mfrId":4,"productId":5,"timestamp":1613753244,"APPARENT_POWER":463,"VOLTAGE":3.812500,"CURRENT":1.652344}
Achronite commented 3 years ago

Thank you, I've added the new open sensor node to the develop branch of node-red-contrib-energenie-ener314rt.

I'll aim to release the packages for this (and the dependent node module) in the next couple of days.

shaw-iot commented 3 years ago

This is now working OK for me (7 devices), thanks.