MortuuS closed this issue 1 year ago.
I have the exact same issue; in my case the reboot happens approximately every 2M seconds. Disabling the device in Home Assistant solves the issue as well.
Details about my device: Eve Single Pro-line, 3-phase, display, charging cable (NG910-60027) by Alfen. Firmware: 5.8.0-4120
Same experience; it seems to be a device issue, unfortunately. Once over 2M seconds, the device reboots within 10 minutes of restarting the integration.
In Alfen.py, if you increase
MIN_TIME_BETWEEN_UPDATES = timedelta(seconds=120)
it will extend the time between reboots, though the longer update interval may not be suitable for everyone.
Also, in the same file, change the formula to self.uptime = max(0, prop['value'] / 1000000 / 3.565) to give the approximate time in hours since the last reboot.
I now have it rebooting only once in more than 24 hours.
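For reference, those two edits amount to something like the following (a minimal sketch; the exact placement inside Alfen.py differs between versions of the integration, and the 3.565 divisor is purely the empirical factor described above):

from datetime import timedelta

# Poll the charger less often so whatever leaks per request accumulates more slowly.
MIN_TIME_BETWEEN_UPDATES = timedelta(seconds=120)

def uptime_hours(raw_value):
    # Approximate hours since the last reboot from the raw uptime property,
    # using the empirical scaling suggested above (not an official unit).
    return max(0, raw_value / 1000000 / 3.565)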
Since the service "Alfen Wallbox: reboot_wallbox" is available, an automated maintenance reboot might be a possible workaround as well, right?
It does work; I've just tested it. There will be a sweet spot between the refresh time and a reboot every 24 hours, though I guess you could reboot as often as you'd like.
I've not seen anyone on the v6 FW confirm whether Alfen have actually resolved this issue either.
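A minimal sketch of such a scheduled maintenance reboot, calling the reboot_wallbox service from outside HA via Home Assistant's REST API (the URL, token, and empty payload here are assumptions; depending on the integration version the call may also need an entity or device target):

import requests

# Call the integration's reboot service through Home Assistant's REST API.
# Run this from cron (or an HA automation) shortly before charging starts.
HA_URL = 'http://homeassistant.local:8123'  # placeholder
TOKEN = '<long-lived-access-token>'         # placeholder

response = requests.post(
    '{}/api/services/alfen_wallbox/reboot_wallbox'.format(HA_URL),
    headers={'Authorization': 'Bearer {}'.format(TOKEN)},
    json={},  # may need e.g. {'entity_id': '...'} depending on the integration
    timeout=10,
)
print('Received response {}'.format(response))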
My Alfen is on >v6 FW and still crashes every 8 hours.
I don't see how a reboot due to a crash versus a clean reboot on command should make a difference when solar charging with help from EVCC, especially when switched to 1-phase charging because there isn't enough surplus PV available.
Strange that you have a reboot every 8 hours. How long does a charge with EVCC take? If a normal charge takes less than 8 hours, a forced reboot could help if you initiate it before you start charging, no?
I think the memory leak is linked to the log entries created when using the HA integration to read the info, so extending the time as I suggested means fewer log entries are generated and it takes longer to crash.
I'm not savvy enough to know why the connection needs to be dropped and re-established so often, or how this integration works under the bonnet.
As noted in the OP, my wallbox seems to reboot 4 times a day, so one reboot every 8 hours.
The charge time with EVCC depends on several things, not least the amount of PV power available. When trying to use only PV surplus power, and with the current sun hours in the Netherlands, it can easily take more than 8 hours. 😑
I'm going to try gizmoy2k's suggestions as a workaround in the meantime, but I'm still hopeful a permanent solution is also possible.
Well with
MIN_TIME_BETWEEN_UPDATES = timedelta(seconds=300)
the reboots have stopped.
Currently at 83 hours uptime. I guess 300 s is long enough not to overwork the log, or there is a purge of some kind.
I still don't have my EV, so I'm not sure how practical a 300 s update time is when it comes to HA or any automations you may want to run.
Thanks for sharing this! I have changed this as well now, I'll report back with the results.
Thanks for sharing! I also changed this setting and had another reboot in 2 days :(
Hey guys, interesting use of the REST API. It was never meant for automation, but if it works... :-) I'm willing to take a look at the issue. Based on the comments above, it seems to be related to the time between requesting a bunch of properties from the REST API with GET requests. Sounds like a memory leak somewhere in the charger firmware. If anyone has a week-long log file with one or more reboots, it would help in finding the issue. The log file can be downloaded using the ACE Service Installer, and probably using MyEve as well, but I'm not 100% sure.
I'm on holiday just now, but when I return I can turn the time back down and send you the logs, no problem.
107 hours is my best without a reboot so far.
Here is a log file with 4 reboots:
Line 3579: 2023-06-17T17:26:19.558Z:INFO:application.c:305:System boot #132, cause Watchdog reset
Line 18158: 2023-06-18T14:06:58.386Z:INFO:application.c:305:System boot #133, cause Watchdog reset
Line 43619: 2023-06-19T11:27:05.613Z:INFO:application.c:305:System boot #134, cause Watchdog reset
Line 47195: 2023-06-19T18:55:07.253Z:INFO:application.c:305:System boot #135, cause Watchdog reset
If there is a dump, it is as follows:
2023-06-17T17:26:19.558Z:INFO:application.c:305:System boot #132, cause Watchdog reset
2023-06-17T17:26:19.566Z:ERROR:bspCrashDump.c:173: -- Crash details recovered.
2023-06-17T17:26:19.578Z:ERROR:bspCrashDump.c:192:Out of memory
When the integration is running, repetitive entries like the following occur; I think this is linked to the crash.
My timedelta is set to 60 seconds just now, which ties up with the logins:
2023-06-18T14:40:12.363Z:SECURITY:httpd_login.c:154:WebClient - connected (account: admin)
2023-06-18T14:40:27.527Z:WARNING:taskWebClient.c:223:Client (fd=2) disconnected (r: -0x7880)
2023-06-18T14:40:27.535Z:INFO:taskWebClient.c:155:Removing previously connected client
2023-06-18T14:41:12.417Z:SECURITY:httpd_login.c:154:WebClient - connected (account: admin)
2023-06-18T14:41:43.519Z:WARNING:taskWebClient.c:223:Client (fd=2) disconnected (r: -0x7880)
2023-06-18T14:41:43.527Z:INFO:taskWebClient.c:155:Removing previously connected client
The mobile connection is poor in my area, so the log is littered with the modem resetting while trying to get a connection. I can't work out how to get it connected to the back office via a hardwired connection, if that's even possible.
Hi, thanks for the information. Clearly some kind of memory leak. What I would like to see is whether the out-of-memory issue occurs much less frequently if you change the REST API connection interval to a higher value such as 300 seconds, like @gizmoy2k did. If you can confirm that this stops or delays the out-of-memory issue, then we can try to reproduce it internally and look for a fix.
As for connecting your station via a wired connection: as long as the back office is under your control, or reachable over a wired connection, you can configure the station to connect via Wired using the Connectivity tab in the ACE Service Installer.
Hi, I can confirm. If the connection interval is increased to 300 seconds, the reboot occurs approximately every 40M seconds. Currently I have it set to 30 seconds (using this fork: https://github.com/leeyuentuen/alfen_wallbox), and with this I get a reboot every 1.5-1.8M seconds. System Boot Reason: "Watchdog reset"
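(If those figures are the raw uptime counter rather than literal seconds, the conversion suggested earlier, value / 1000000 / 3.565 hours, would put them at roughly 11 hours between reboots for 40M at the 300-second interval, and about 25-30 minutes for 1.5-1.8M at 30 seconds.)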
I've had a local setup with 6.3.0 spamming logins/connections every 10 seconds for a while, but the memory statistics show no issues whatsoever (free memory remains stable). What actions are being done every time you connect to the wallbox? Are any settings being GET'd/POST'd? Then I can try to add that to my local setup.
If we increase the interval, won't the value then only be updated every 300 seconds, so the data is about 300 seconds old?
On my branch, every x time you have the following URLs:
* authentication (api/login)
* get data (32 id's): /api/prop?ids=2060_0,.....
* get the rest of the data: /api/prop?ids=2060_0,.....
* logout (api/logout)
I think you can get an overflow if you try to connect while another device is also trying to connect; I think there is a limit on simultaneous API calls to the device.
Yes, but it's just for a test to see if the memory leak is directly related to REST API, which it seems to be, considering all the responses.
The device allows only 1 connection at a time. I verified just now that connecting a 2nd client results in a connect/read time-out, so no overflow there.
Thanks for the API calls. I took the specifics from https://github.com/leeyuentuen/alfen_wallbox/blob/master/custom_components/alfen_wallbox/alfen.py#L74C18-L74C18 and added them to my local script. However, I'm not sure what some of them are supposed to be doing, because 2070_2 does not map to any property (2070 is an UNSIGNED8, and as such it has no sub-properties). Regardless of that, I've been running it and (sadly?) still don't see any issues with memory.
When changing the update interval in your tooling, does it only change the REST API call frequency, or also something related to Modbus TCP/IP or anything else?
To be clear, I'm using the following minimal script to try to reproduce right now:
import requests
from urllib3.exceptions import InsecureRequestWarning
import time

station_ip = '192.168.0.251'
prop_string1 = '2060_0,2056_0,2221_3,2221_4,2221_5,2221_A,2221_B,2221_C,2221_16,2201_0,2501_2,2221_22,2129_0,2126_0,2068_0,2069_0,2062_0,2064_0,212B_0,212D_0,2185_0,2053_0,2067_0,212F_1,212F_2,212F_3,2100_0,2101_0,2102_0,2104_0,2105_0'
prop_string2 = '2057_0,2112_0,2071_1,2071_2,2072_1,2073_1,2074_1,2075_1,2076_0,2078_1,2078_2,2079_1,207A_1,207B_1,207C_1,207D_1,207E_1,207F_1,2080_1,2081_0,2082_0,2110_0,3280_1,3280_2,3280_3,3280_4'
timeout_s = 10
password = '<removed>'

def main():
    # Don't worry about any TLS errors for this script
    requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)
    try:
        while True:
            # Create persistent session so we are authorized upon each request following login
            with requests.Session() as persistent_session:
                try:
                    # Login
                    print('Sending login request...')
                    response = persistent_session.post(url='https://{}/api/login'.format(station_ip), json={'username': 'admin', 'password': '{}'.format(password)}, verify=False, timeout=timeout_s)
                    print('Received response {}'.format(response))
                    # GET some properties
                    print('Getting properties..')
                    response = persistent_session.get(url='https://{}/api/prop?ids={}'.format(station_ip, prop_string1), verify=False, timeout=timeout_s)
                    print('Received response {}'.format(response))
                    # GET more properties
                    print('Getting more properties..')
                    response = persistent_session.get(url='https://{}/api/prop?ids={}'.format(station_ip, prop_string2), verify=False, timeout=timeout_s)
                    print('Received response {}'.format(response))
                    # Logout
                    print('Sending logout request...')
                    response = persistent_session.post(url='https://{}/api/logout'.format(station_ip), json={}, verify=False, timeout=timeout_s)
                    print('Received response {}'.format(response))
                except requests.exceptions.ReadTimeout:
                    print('Failed to connect/read..')
                    pass  # Just try again later
            # Sleep until next attempt
            time.sleep(10.0)
    except KeyboardInterrupt:
        pass

if __name__ == '__main__':
    main()
2070_2 is supposed to be the date/time from the device, but I haven't mapped it yet because I need to parse that date serial into a readable format. But that shouldn't be the issue; you can fetch more id's and you don't need to process them.
I think it would be better to look at the log of the device (replace the 1 with the page that you want to get); maybe there we can find the reason why the watchdog is rebooting.
In the meantime, I've updated the code and removed that 2070_2 id.
Nope, that should be 2059 (sysDateTime).
I already checked the provided logging, and it shows an out of memory error. That's why I'm trying to investigate that, since it seems to be the cause of the watchdog.
This morning I removed the mDNS part that Redmer mentioned. It's running just now; I don't know the result yet.
I did a couple of scans with Bonjour Browser on Windows, but no memory issues from that either. If it turns out that the zeroconf module of the 'alfen wallbox' repo is triggering the issue, please let me know so I can try to reproduce it.
OK, maybe not the right place to discuss this, but since the thread discussing the reboots is already here: the results below are with the release of https://github.com/leeyuentuen/alfen_wallbox
Is this maybe related? Look at the screenshot:
I'll keep an eye on it
I'm still testing, but there is a 'dev' beta version where I'm rewriting things. There are a lot of changes and it's still being tested, so there could be some issues in it. For the stable version, keep using the master branch.
Also adding controls:
Great work! I just installed via HACS after previously installing manually, and I had to manually edit const.py to re-add my model, NG910-60577, which is not included by default.
Could this be written so that it asks the user what the model is, or could more models be added to the list?
All models are listed in section 5, page 24, here: https://www.free-instruction-manuals.com/pdf/pa_2988006.pdf
Thanks again!
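For anyone hitting the same thing, the manual edit amounts to something like this (a hypothetical sketch; the actual constant name and structure inside const.py may differ):

# Hypothetical: const.py keeps a list of known model ids, and appending a
# missing one lets the integration recognise the device.
DEFAULT_MODELS = [
    'NG910-60027',
    # ...existing entries...
    'NG910-60577',  # added manually; missing from the default list
]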
I've added a list of missing models
Hi guys, unfortunately I haven't checked GitHub for a while but I'm glad that you found the issue. Thanks for the fix!
Hello,
I have been using this HA integration for some time now, but recently noticed that while using it, the wallbox reboots 4 times a day. The log of the wallbox seems to suggest this is due to a memory issue. The reboots stop when I disable the integration. I've been searching online for possible solutions, but I have only found several other people who report the same issue when using the integration.
I started noticing the reboots around the time I also started using the EVCC charge manager with the wallbox, and annoyingly the reboots seem to mess up EVCC's charge logic.
Any idea what could be causing this?
Thanks!
2023-01-29T23:44:20.960Z:RESET:application.c:353:=========================================
2023-01-29T23:44:20.972Z:INFO:application.c:361:build: Aug 15 2022 12:20:52
2023-01-29T23:44:20.980Z:INFO:application.c:362:version: 5.8.1-4123
2023-01-29T23:44:20.988Z:INFO:application.c:363:compiler: gcc 4.8.3
2023-01-29T23:44:20.996Z:INFO:application.c:216:fwu state inactive
2023-01-29T23:44:21.003Z:INFO:application.c:236:Bootloader version 0.1.6
2023-01-29T23:44:21.234Z:USER:taskCANopen.c:224:Persistent configuration retrieved, 243 items
2023-01-29T23:44:21.246Z:INFO:application.c:269:FLASH read protection 0xcc level 2 (permanent)
2023-01-29T23:44:21.269Z:WARNING:bspI2c.c:670:I2C slave not responding #-3 AUXBOARD@0x0028, attempting bus recovery
2023-01-29T23:44:21.285Z:WARNING:bspI2c.c:689:Device AUXBOARD@0x0028 not available
2023-01-29T23:44:21.292Z:USER:application.c:412:Using P1 port
2023-01-29T23:44:21.300Z:INFO:application.c:305:System boot #69, cause Watchdog reset
2023-01-29T23:44:21.308Z:ERROR:bspCrashDump.c:173: -- Crash details recovered.
2023-01-29T23:44:21.316Z:ERROR:bspCrashDump.c:192:Out of memory
2023-01-29T23:44:21.324Z:ERROR:bspCrashDump.c:192:free 3968 bytes, lowest ever free 3352 bytes
2023-01-29T23:44:21.335Z:ERROR:bspCrashDump.c:192:fragmentation: 108240048.0%
2023-01-29T23:44:21.343Z:ERROR:bspCrashDump.c:192:fragmentation: 4K blocks -3.2%
2023-01-29T23:44:21.351Z:ERROR:bspCrashDump.c:192:# free blocks: 64
2023-01-29T23:44:21.359Z:ERROR:bspCrashDump.c:192:largest free: 6312 bytes
2023-01-29T23:44:21.367Z:ERROR:bspCrashDump.c:192: 2023-01-29T23:44:17.757Z SW Watchdog (fffe7e4e) triggered.
2023-01-29T23:44:21.378Z:ERROR:bspCrashDump.c:192:Missing: Modbus NFC
2023-01-29T23:44:21.382Z:ERROR:bspCrashDump.c:192:determining cpu hog...
2023-01-29T23:44:21.390Z:ERROR:bspCrashDump.c:192:Tmr Svc 1 <1%
2023-01-29T23:44:21.402Z:ERROR:bspCrashDump.c:192:display 25155 97%
2023-01-29T23:44:21.410Z:ERROR:bspCrashDump.c:192:lwip 0 <1%
2023-01-29T23:44:21.417Z:ERROR:bspCrashDump.c:192:WebClients 0 <1%
2023-01-29T23:44:21.425Z:ERROR:bspCrashDump.c:192:NFC 0 <1%
2023-01-29T23:44:21.433Z:ERROR:bspCrashDump.c:192:Modbus 0 <1%
2023-01-29T23:44:21.445Z:ERROR:bspCrashDump.c:192:PPP 0 <1%
2023-01-29T23:44:21.453Z:ERROR:bspCrashDump.c:192:OCPP 0 <1%
2023-01-29T23:44:21.460Z:ERROR:bspCrashDump.c:192:WebServer 0 <1%
2023-01-29T23:44:21.468Z:ERROR:bspCrashDump.c:192:CommandLine 0 <1%
2023-01-29T23:44:21.476Z:ERROR:bspCrashDump.c:192:IDLE 0 <1%
2023-01-29T23:44:21.484Z:ERROR:bspCrashDump.c:192:Eth_if 5 <1%
2023-01-29T23:44:21.496Z:ERROR:bspCrashDump.c:192:Modem 2 <1%
2023-01-29T23:44:21.503Z:ERROR:bspCrashDump.c:192:ModbusTCPIPSlav 0 <1%
2023-01-29T23:44:21.511Z:ERROR:bspCrashDump.c:192:P1Meter 2 <1%
2023-01-29T23:44:21.519Z:ERROR:bspCrashDump.c:192:CANopen 427 1%
2023-01-29T23:44:21.527Z:ERROR:bspCrashDump.c:192:ModbusTCPIPMast 0 <1%
2023-01-29T23:44:21.539Z:ERROR:bspCrashDump.c:192:Main 0 <1%
2023-01-29T23:44:21.546Z:ERROR:bspCrashDump.c:192:TCP/IP 0 <1%
2023-01-29T23:44:21.554Z:ERROR:bspCrashDump.c:192:Master 0 <1%
2023-01-29T23:44:21.562Z:ERROR:bspCrashDump.c:197: -- End of crash details
2023-01-29T23:44:21.570Z:INFO:board_ng9xx.c:1475:Registered timeout getHardware
2023-01-29T23:44:21.593Z:WARNING:bspI2c.c:670:I2C slave not responding #-3 MMA8451@0x000d, attempting bus recovery
2023-01-29T23:44:21.609Z:WARNING:bspI2c.c:689:Device MMA8451@0x000d not available
2023-01-29T23:44:21.628Z:WARNING:bspI2c.c:670:I2C slave not responding #-3 BMA456Q@0x0000, attempting bus recovery
2023-01-29T23:44:21.644Z:WARNING:bspI2c.c:689:Device BMA456Q@0x0000 not available