Egyras / HeishaMon

Panasonic Aquarea air-water H, J, K and L series protocol decrypt

No restart of ruleset after Heishamon problem. #531

Closed: McMagellan closed this issue 1 month ago

McMagellan commented 2 months ago

Last night, thanks to the low temperatures, I was able to test my new ruleset for clock extension. Unfortunately there was a problem, and by morning the rules were no longer working. I recorded the data on the debug interface, and the sequence of events can be seen there. I am using version V3.8 138 Alpha "550efd7". Why is there no further attempt to start the rules engine?

Attachment: Wifi Crashes Rules.txt

In the debug file you can see that after an uptime of 22 hours 38 minutes the WiFi connection to the local network breaks down for an unknown reason and the AP becomes active.
Line 22: In stat -> ## Wifi: -1%
Line 50: WiFi lost, starting setup hotspot...
Line 51: Reconnecting to WiFi failed. Waiting a few seconds before trying again.
Line 493: WiFi (re)connected, shutting down hotspot...

Heishamon is apparently restarted after a crash.
Line 495: ets Jan 8 2013,rst cause:4, boot mode:(3,0)
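To make the boot line quoted above easier to read, here is a minimal sketch that decodes it. It assumes the line is the usual ESP8266 boot ROM banner, and the cause table only lists the values commonly documented for that chip (1, 2 and 4); anything else is reported as unknown. The function name `parse_boot_banner` is made up for this example and is not part of HeishaMon.

```python
import re

# Reset causes as commonly documented for the ESP8266 boot banner.
# Assumption: only these three values are listed; others map to "unknown".
RESET_CAUSES = {
    1: "power-on reset",
    2: "external reset or wake from deep sleep",
    4: "hardware watchdog reset",
}

BOOT_LINE = re.compile(r"rst cause:\s*(\d+),\s*boot mode:\s*\((\d+),\s*(\d+)\)")


def parse_boot_banner(line):
    """Return (cause, description, boot_mode), or None if no banner is found."""
    match = BOOT_LINE.search(line)
    if match is None:
        return None
    cause = int(match.group(1))
    boot_mode = (int(match.group(2)), int(match.group(3)))
    return cause, RESET_CAUSES.get(cause, "unknown"), boot_mode


print(parse_boot_banner("ets Jan 8 2013,rst cause:4, boot mode:(3,0)"))
```

Under that assumption, cause 4 points to a watchdog reset rather than a deliberate restart, which would fit the "crash reboot" message that follows in the log.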

The ruleset is not reloaded.
Line 562: Not loading rules due to crash reboot!

About 3 hours later I performed a manual reboot via the browser.
Line 1088: Starting debugging, version: Alpha-550efd7
Line 1141: Enabling rules..

The rules are loaded without any problems and resume operation.
Line 1455: ==== System#Boot ====

I see 2 problems here:
1. Why does a WiFi loss cause a Heishamon reset?
2. Why are the rules not restarted, when this is possible with a manual reboot? There is no reason for this.

You can see the effect of the rules no longer working in the 40 h graph between 7 a.m. and 10 a.m.
Screenshot 2024-09-13 at 19-52-40: Basisgrafik Max an Raspi 4 1 V2 - Dashboards - Grafana

Then I also saw that a set command was lost. See the Quiet Mode Fail.txt file.
Line 9: set Quiet mode to 2
Line 50: Previous read data attempt failed due to timeout!

Attachment: Quiet Mode Fail.txt

I now react to this in the rules by repeating the command after 15 seconds if the TOP has not taken on the value it should. That increases the number of rules.
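The send, wait, verify and repeat workaround described above can be sketched in a few lines. This is plain Python, not HeishaMon rules syntax, and everything in it is hypothetical: `set_with_verify`, the `FakePump` stand-in and the limit of three attempts are assumptions made for illustration. Only the 15-second wait comes from the description.

```python
import time


def set_with_verify(send, read, value, wait_s=15.0, max_attempts=3, sleep=time.sleep):
    """Send a value, wait, read it back; repeat while the readback differs.

    Returns the number of attempts needed, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        send(value)
        sleep(wait_s)
        if read() == value:
            return attempt
    return None


class FakePump:
    """Hypothetical stand-in for the heat pump that drops the first command."""

    def __init__(self, drop_first=1):
        self.quiet_mode = 0
        self._drops = drop_first

    def send(self, value):
        if self._drops > 0:
            self._drops -= 1  # simulate a lost set command
            return
        self.quiet_mode = value

    def read(self):
        return self.quiet_mode


pump = FakePump()
# The sleep is replaced by a no-op so the demo finishes immediately.
print(set_with_verify(pump.send, pump.read, 2, sleep=lambda s: None))
```

Capping the number of attempts keeps a permanently failing command from being repeated forever.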

McMagellan commented 2 months ago

Here is the next debug file with the complete last 24 hours. There were 2 crashes after which the rules no longer started. In between there was a manual reboot after which the rules were loaded again. After 5 minutes the second crash occurred.

Attachment: 1409aputty.log

Line 276479: first crash
Line 291757: manual reboot
Line 294445: second crash
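In a log of this size the relevant lines are easier to find with a small scanner. The sketch below searches for the marker strings quoted earlier in this issue and reports their line numbers. The labels and the function name `find_events` are made up for this example, and the sample lines are the ones quoted above, not an excerpt from 1409aputty.log.

```python
# Marker substrings taken from the debug output quoted in this issue.
# The labels on the right are hypothetical names chosen for this sketch.
MARKERS = {
    "rst cause:": "reset",
    "Not loading rules due to crash reboot!": "rules skipped",
    "Starting debugging, version:": "debug session started",
    "WiFi lost, starting setup hotspot": "wifi lost",
    "Enabling rules": "rules enabled",
}


def find_events(lines):
    """Return (line_number, label) for every line containing a known marker."""
    events = []
    for number, line in enumerate(lines, start=1):
        for needle, label in MARKERS.items():
            if needle in line:
                events.append((number, label))
                break  # at most one label per line
    return events


sample = [
    "WiFi lost, starting setup hotspot...",
    "ets Jan 8 2013,rst cause:4, boot mode:(3,0)",
    "Not loading rules due to crash reboot!",
    "Starting debugging, version: Alpha-550efd7",
    "Enabling rules..",
]
for number, label in find_events(sample):
    print(number, label)
```

For a real capture, the list can be replaced by an open file object, for example `open("1409aputty.log", errors="replace")`, since the function only iterates over lines.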

This time I don't see any clues as to the trigger.

I took two measures:
1. Removed all print commands and $ variables that I had included for documentation, to make the code as small as possible.
2. Set up a watchdog from the rules in conjunction with ioBroker Blockly, which reboots Heishamon and informs me about it via WhatsApp. Every 45 minutes (32x per day) the rules change an unused parameter. If there is no change within 50 minutes, a reboot is sent.
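The watchdog logic on the monitoring side can be sketched as follows. This is not the actual ioBroker Blockly script; the class `HeartbeatWatchdog` and its method names are hypothetical, and only the 50-minute timeout and the 45-minute heartbeat interval come from the description above. Time is passed in as plain seconds so the logic can be checked without waiting.

```python
class HeartbeatWatchdog:
    """Request a reboot when a heartbeat value has not changed for too long."""

    def __init__(self, timeout_s=50 * 60):
        self.timeout_s = timeout_s
        self.last_value = None
        self.last_change = None

    def observe(self, value, now):
        """Record a reading at time `now` (seconds); return True to request a reboot."""
        if self.last_change is None or value != self.last_value:
            # First reading or a changed heartbeat: the rules are alive.
            self.last_value = value
            self.last_change = now
            return False
        if now - self.last_change >= self.timeout_s:
            # Restart the timeout window so the reboot is requested only once.
            self.last_change = now
            return True
        return False


wd = HeartbeatWatchdog()
minute = 60
print(wd.observe(0, 0))             # first reading
print(wd.observe(1, 45 * minute))   # heartbeat changed after 45 minutes
print(wd.observe(1, 94 * minute))   # 49 minutes without change
print(wd.observe(1, 96 * minute))   # 51 minutes without change
```

The timeout being slightly longer than the heartbeat interval gives one heartbeat a 5-minute margin before a reboot is triggered.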

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

McMagellan commented 1 month ago

Unfortunately no one has responded to this problem. It's currently the only weak point I've seen with Heishamon in the last 4 weeks. While developing the ruleset I can help myself by looking at the log file from the debug port. Keeping the console window open for a longer period of time still causes Heishamon to crash. My ruleset (10,000 bytes long, with up to 40 rules) runs fine using SetCurves. But don't use the console window! @CurlyMoo @IgorYbema

CurlyMoo commented 1 month ago

This issue depends on the webserver issue that @IgorYbema and I were debugging. I'm awaiting some test results from him, as asked here: https://github.com/CurlyMoo/webserver/issues/3#issuecomment-2345525637

geduxas commented 1 month ago

I think we have several more related issues: something crashes when the console screen is open while trying to update the firmware, others are getting artefacts in MQTT, and someone had rules crashing while watching the console screen. They could all lead back to the same problem.

McMagellan commented 1 month ago

Thanks for the reply. I read the point you linked and can say the following. Months ago I had big problems parsing a ruleset which failed 40% of the time with error messages like "too large read". You fixed that completely and it works very well now.

I've never had any errors uploading firmware and I've probably carried out such a process more than 50 times in the last 3 months. However, I don't have a console window running at this point.

It's different when parsing a ruleset. When developing a rule, it is important to see which rule in the ruleset has errors. A console window running in parallel only works with small rulesets; otherwise it crashes. But I can still access the information via the debug port and continue working. Not many users have this option.

For me, the main problem is a console window left open during normal operation with the ruleset running. After an indefinite period of time (it can be several hours) Heishamon crashes. To me it looks like a memory conflict, because either the web server or the rules engine probably doesn't completely free the memory it has used. If you then reboot, Heishamon crashes again after about 10 seconds. After that, normal operation resumes.

The error is not reproducible on demand, but it does happen sooner or later. I only use the console rarely and for very short periods of time.

geduxas commented 1 month ago

It could be memory corruption or a stack overflow. I think it comes from an external library, maybe from the web stack or the network stack.

To reproduce the update crash, just open the console window in one browser and try to update the firmware from another. The MQTT artefacts couldn't be reproduced yet.