Closed fu-zhou closed 4 years ago
As a quick and dirty fix I'll give it a try to start the service every 24 hours with:
...
[Service]
ExecStart=/home/user/essmqtt/bin/essmqtt --mqtt_server 192.168.0....
# Restart=on-failure
# RestartSec=10
Restart=always
RestartSec=2
RuntimeMaxSec=86400
...
I'm trying to reproduce this - with 38 hours until it crashes it sounds like it will be hard to debug. Could you check whether on your system the memory footprint of essmqtt grows -- e.g. compare memory use right after starting the service to memory use after a few hours?
Alright, I took the automatic restart out again and restarted the service taking a screenshot of htop
right after the start. Let's see how it develops...
24 hours are over now and I have just checked the status of the service:
● essmqtt.service - ESS MQTT Communication
Loaded: loaded (/etc/systemd/system/essmqtt.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2020-05-24 21:50:39 CEST; 24h ago
Main PID: 55465 (essmqtt)
Tasks: 1 (limit: 2290)
CGroup: /system.slice/essmqtt.service
└─55465 /home/user/essmqtt/bin/python3 /home/user/essmqtt/bin/essmqtt --mqtt_server 192.168.0....
May 24 21:50:39 iobroker-vm systemd[1]: Started ESS MQTT Communication.
May 24 21:50:43 iobroker-vm essmqtt[55465]: INFO:pyess.aio_ess:fetching auth key
May 24 21:50:45 iobroker-vm essmqtt[55465]: INFO:pyess.essmqtt:starting send loop
May 25 13:29:54 iobroker-vm essmqtt[55465]: INFO:pyess.aio_ess:seems we got logged out, retrying after 1 seconds
May 25 13:29:55 iobroker-vm essmqtt[55465]: INFO:pyess.aio_ess:fetching auth key
May 25 15:51:49 iobroker-vm systemd[1]: /etc/systemd/system/essmqtt.service:15: Unknown lvalue 'Wants' in section 'Install
May 25 15:51:59 iobroker-vm systemd[1]: /etc/systemd/system/essmqtt.service:15: Unknown lvalue 'Wants' in section 'Install
May 25 15:51:59 iobroker-vm systemd[1]: /etc/systemd/system/essmqtt.service:15: Unknown lvalue 'Wants' in section 'Install
May 25 15:52:00 iobroker-vm systemd[1]: /etc/systemd/system/essmqtt.service:15: Unknown lvalue 'Wants' in section 'Install
All of a sudden the system doesn't like
[Install]
Wants=network-online.target
any more, but the communication keeps on working and the only change in htop
so far is that "VIRT" went up from 308M to 309M - neglectable in my opinion. Let's see what happens in 14 hours ...
Service started: Sun 2020-05-24 21:50:39 Communication stopped: Tue 2020-05-26 12:04:33 According to the status, the service ist still running
● essmqtt.service - ESS MQTT Communication
Loaded: loaded (/etc/systemd/system/essmqtt.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2020-05-24 21:50:39 CEST; 1 day 15h ago
Main PID: 55465 (essmqtt)
Tasks: 1 (limit: 2290)
CGroup: /system.slice/essmqtt.service
└─55465 /home/user/essmqtt/bin/python3 /home/user/essmqtt/bin/essmqtt --mqtt_server 192.168.0....
May 25 13:29:54 iobroker-vm essmqtt[55465]: INFO:pyess.aio_ess:seems we got logged out, retrying after 1 seconds
May 25 13:29:55 iobroker-vm essmqtt[55465]: INFO:pyess.aio_ess:fetching auth key
May 25 15:51:49 iobroker-vm systemd[1]: /etc/systemd/system/essmqtt.service:15: Unknown lvalue 'Wants' in section 'Install
May 25 15:51:59 iobroker-vm systemd[1]: /etc/systemd/system/essmqtt.service:15: Unknown lvalue 'Wants' in section 'Install
May 25 15:51:59 iobroker-vm systemd[1]: /etc/systemd/system/essmqtt.service:15: Unknown lvalue 'Wants' in section 'Install
May 25 15:52:00 iobroker-vm systemd[1]: /etc/systemd/system/essmqtt.service:15: Unknown lvalue 'Wants' in section 'Install
May 25 22:10:46 iobroker-vm essmqtt[55465]: INFO:pyess.aio_ess:seems we got logged out, retrying after 1 seconds
May 25 22:10:47 iobroker-vm essmqtt[55465]: INFO:pyess.aio_ess:fetching auth key
May 26 10:56:04 iobroker-vm essmqtt[55465]: INFO:pyess.aio_ess:seems we got logged out, retrying after 1 seconds
May 26 10:56:05 iobroker-vm essmqtt[55465]: INFO:pyess.aio_ess:fetching auth key
htop right after the start of the essmqtt service:
htop after I figured ou that the communication stopped about an hour after the last sent value:
Thank you for testing that @fu-zhou . These screenshots look like there can't be any significant memory leak involved. I'm trying to reproduce the issue with my ess already, but so far essmqtt is running smoothly (I started it on may 24th 19:40 pm, so it's already running more than 38 hours). But I used the default 10 seconds poll rate, I'll increase it to your two seconds and see whether that breaks it faster. I will know on thursday whether it crashes for me after 38 hours with your poll rate.
Edit to add: I have a few ideas that I'm planning to try, but fixing the issue would be most convenient if I could reproduce the hanging process locally.
with the automatic restart after 24 hours it seems to work so far... I'll keep an eye on it!
@fu-zhou good to hear you have a workaround for the time being.
I managed to reproduce the hanging process on my system and will now try to fix the problem. I managed to debug into the process but in crashed state I don't find obvious clues why it crashed. Fixing will take some time because it takes a while to crash it. Feel free to poke me every few weeks.
Just updated to 0.1.10 more or less right before I reached 38 hours with 0.1.8 (1 day 12 hours). I meanwhile changed the poll timing to 2 sec (home) and 10 sec (common, divisor=5) as the common values go into the database only for statistics purposes while I try to do somewhat real time calculations and predictions with the home values. On top I got a strange error message from the PCS (once only a couple of days ago when running at 2 and 2 seconds):
P547 SDSP Error Kommunikationsfehler mit
angeschlossenem SDSP länger als
Since the error pointed at communication I reduced the load (divisor=5) however I have no clue what SDSP stands for and sure enough the LG manual doesn't explain it either.
I'm running with a two second interval more than 38 hours now and pyess still submits values. I'll keep checking it but the most recent submit might have fixed the issue.
Unfortunately I also don't know what what SDSP is, the manual is indeed not helpful. I'll report if I observe the same issue. In theory my ess should be pretty occupied, essmqtt is polling it every 2 seconds and on top of that it gets polled by the graphite/carbon bridge every 10 seconds. So maybe that's enough to trigger the error message.
I got a firmware update pushed on my PCS today, so I turned the timing back to 2 and 2 seconds in order to see if anything changed regarding the values in home and common.
First thing I figured out: mqtt.0.ess.common.GRID.active_power
shows positive values only, just like mqtt.0.ess.home.statistics.grid_power
while the values used to turn negative when feeding in. This sucks, hope they change it back again, best would be for both values.
@fu-zhou Thank you for mentioning the firmware upgrade, there was one available for the HOME5 version I use as well. I applied it. Strangely for my HOME5 ess ess/common/GRID/active_power
is still showing power direction via the sign. So there seems to be a discrepancy in behaviour between the 8/10 and 5 ESS models. For reference these are my new version numbers (screenshot from firmware upgrade manual but the Versions match what my ESS reports)
I have not observed any crashes of essmqtt since applying v0.1.10 so I will close this issue in a few days.
Hi @gluap: your SW versions are totally different from the ones on the ESS 8, it seems that there is a major difference in the two systems, however pyess works on both and that is important! One of the differences seems to be that YOU can pick and choose the firmware while on my ESS 8 I have no options or choices, the firmware got pushed onto the system without notification and without documentation what changed and even worse: The firmware introduced even more unwanted behaviours than the sign of the active power. I opened a service case with LG asking them to tell me how to downgrade from the buggy firmware to the working one - no reply yet.
I just checked the status of the essmqtt service and it was (re)started 12 hours ago. I don't remember that I did a reboot are anything, so the Restart=on-failure
worked, I guess. I have no clue what the failure was though, but I currently don't care as it is running. So I still wait for the system to reach 38 hours wit 0.1.10
EDIT: I strongly discourage HOME5 users from upgrading to the newest version manually. After the upgrade my ESS worked fine for a few hours. Then it started reporting "Battery overvoltage" and entered a restart loop because of it. To fix it I tried upgrading to a different firmware version with assistance from the LG hotline, that version claimed no battery and/or AC was connected. To at least get back to a state where the ESS can operate without battery I tried applying the initial firmware update again.
@fu-zhou Getting new firmware pushed forcefully is really bad style. But if it's any consolation: I can't pick and choose either- I found only one new version of the firmware available that could be applied. No older firmware version to go back to and until end of April no firmware updates at all. Also I found no changelog to judge whether the update will be worthwhile or not. The process of upgrading involves copying firmware files onto three USB sticks, sticking them in the usb port in the right order and hoping that you're not accidentally flashing the wrong firmware because there is virtually no documentation on the process. The lights flash differently from what is described in the upgrade manual during the process making me worry that I might have bricked the device while applying the update. So also the manual process is far from optimal.
My main reason for applying the update was that my ESS kept forgetting that it is supposed to "economically" use the battery and almost always charged it full as fast as it could in the morning. Later around noon it would then throw away usable energy to match the 70% feed-in and 5kW AC converter limits instead of charging the battery with the excess energy as it is supposed to. This seems to be fixed in the current FW at first glance. It was working sporadically in the past as well though, so I have to observe it for some time to be sure.
By the way if you want to avoid further forced upgrades you may be able to prevent them by disabling the data upload to enervu in the admin menu. I saw one manual claim that this would also disable automatic updates. (Caveat: I've no Idea whether that affects manufacturer warranty).
To be honest: I eagerly waited for the firmware as it was supposed to fix a couple of unwanted behaviours, which it didn't, it introduced new ones. To be really sure regarding the internet connection, I typically block the PCS in my FritzBox.
Just before I reached 38 hours the service was restarted. Do you know if there is a logfile stored somewhere which contains details what caused the restart? Do you recommend that I take out the restart option from the service in order to get details why the service fails from time to time?
The log is printed out to the console, so it should end up in the systemd logs. Systemd logs from essmqtt can be accessed via journalctl -a|grep essmqtt
. On my system I can still see the logs from a few days ago in there.
Edit: so there is no need to take out the restart option.
Unfortunately I can't leave my ess running for 38 hours at the moment because it enters the restart loop as soon as the battery is full. At that point I have to disable the battery for it to at least feed the grid. I hope to fully drain the battery tonight in the hopes of the system being able to learn the charge curve again if it can observe a full battery cycle. If that doesn't fix it I'm opening a support call tomorrow. I am really annoyed by the buggy firmware.
Man, their quality control sucks! Here's the relevant extract:
Jun 07 13:35:39 iobroker-vm essmqtt[16903]: ERROR:pyess.essmqtt:exception while publishing 3826 to ess/home/statistics/pcs_pv_total_power, please report bug
Jun 07 13:35:39 iobroker-vm essmqtt[16903]: Traceback (most recent call last):
Jun 07 13:35:39 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/pyess/essmqtt.py", line 38, in recursive_publish_dict
Jun 07 13:35:39 iobroker-vm essmqtt[16903]: await mqtt_client.publish(f"{publish_root}/{key}", value)
Jun 07 13:35:39 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/asyncio_mqtt/client.py", line 77, in publish
Jun 07 13:35:39 iobroker-vm essmqtt[16903]: info = self._client.publish(*args, **kwargs) # [2]
Jun 07 13:35:39 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/paho/mqtt/client.py", line 1269, in publish
Jun 07 13:35:39 iobroker-vm essmqtt[16903]: rc = self._send_publish(
Jun 07 13:35:39 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/paho/mqtt/client.py", line 2580, in _send_publish
Jun 07 13:35:39 iobroker-vm essmqtt[16903]: return self._packet_queue(PUBLISH, packet, mid, qos, info)
Jun 07 13:35:39 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/paho/mqtt/client.py", line 2927, in _packet_queue
Jun 07 13:35:39 iobroker-vm essmqtt[16903]: self._sockpairW.send(sockpair_data)
Jun 07 13:35:39 iobroker-vm essmqtt[16903]: TimeoutError: [Errno 110] Connection timed out
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: Traceback (most recent call last):
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/pyess/essmqtt.py", line 202, in _main
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: await send_loop(ess, client, once=args.once, interval_seconds=args.interval_seconds,
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/pyess/essmqtt.py", line 58, in send_loop
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: await recursive_publish_dict(mqtt_client, "ess/home", home)
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/pyess/essmqtt.py", line 35, in recursive_publish_dict
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: await recursive_publish_dict(mqtt_client, f"{publish_root}/{key}", value)
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/pyess/essmqtt.py", line 38, in recursive_publish_dict
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: await mqtt_client.publish(f"{publish_root}/{key}", value)
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/asyncio_mqtt/client.py", line 77, in publish
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: info = self._client.publish(*args, **kwargs) # [2]
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/paho/mqtt/client.py", line 1269, in publish
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: rc = self._send_publish(
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/paho/mqtt/client.py", line 2580, in _send_publish
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: return self._packet_queue(PUBLISH, packet, mid, qos, info)
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/paho/mqtt/client.py", line 2927, in _packet_queue
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: self._sockpairW.send(sockpair_data)
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: TimeoutError: [Errno 110] Connection timed out
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: During handling of the above exception, another exception occurred:
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: Traceback (most recent call last):
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/bin/essmqtt", line 8, in <module>
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: sys.exit(main())
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/pyess/essmqtt.py", line 115, in main
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: asyncio.run(_main(arguments))
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: return loop.run_until_complete(main)
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: return future.result()
Jun 07 13:35:40 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/pyess/essmqtt.py", line 202, in _main
Jun 07 13:35:41 iobroker-vm essmqtt[16903]: await send_loop(ess, client, once=args.once, interval_seconds=args.interval_seconds,
Jun 07 13:35:41 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/asyncio_mqtt/client.py", line 236, in __aexit__
Jun 07 13:35:41 iobroker-vm essmqtt[16903]: await self.disconnect()
Jun 07 13:35:41 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/asyncio_mqtt/client.py", line 51, in disconnect
Jun 07 13:35:41 iobroker-vm essmqtt[16903]: self._client.disconnect()
Jun 07 13:35:41 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/paho/mqtt/client.py", line 1370, in disconnect
Jun 07 13:35:41 iobroker-vm essmqtt[16903]: return self._send_disconnect(reasoncode, properties)
Jun 07 13:35:41 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/paho/mqtt/client.py", line 2740, in _send_disconnect
Jun 07 13:35:41 iobroker-vm essmqtt[16903]: return self._packet_queue(command, packet, 0, 0)
Jun 07 13:35:41 iobroker-vm essmqtt[16903]: File "/home/user/essmqtt/lib/python3.8/site-packages/paho/mqtt/client.py", line 2927, in _packet_queue
Jun 07 13:35:41 iobroker-vm essmqtt[16903]: self._sockpairW.send(sockpair_data)
Jun 07 13:35:41 iobroker-vm essmqtt[16903]: BrokenPipeError: [Errno 32] Broken pipe
Jun 07 13:35:44 iobroker-vm essmqtt[16903]: /home/user/essmqtt/lib/python3.8/site-packages/pyess/aio_ess.py:205: RuntimeWarning: coroutine 'ClientSession.close' was never awaited
Jun 07 13:35:44 iobroker-vm essmqtt[16903]: ERROR:asyncio:Unclosed client session
Jun 07 13:35:44 iobroker-vm essmqtt[16903]: client_session: <aiohttp.client.ClientSession object at 0x7fd248267ca0>
Jun 07 13:35:44 iobroker-vm essmqtt[16903]: ERROR:asyncio:Unclosed connector
Jun 07 13:35:44 iobroker-vm essmqtt[16903]: connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x7fd247183040>, 354421.876384167)]']
Jun 07 13:35:44 iobroker-vm essmqtt[16903]: connector: <aiohttp.connector.TCPConnector object at 0x7fd248267d00>
Jun 07 13:35:45 iobroker-vm systemd[1]: essmqtt.service: Main process exited, code=exited, status=1/FAILURE
Jun 07 13:35:45 iobroker-vm systemd[1]: essmqtt.service: Failed with result 'exit-code'.
Jun 07 13:35:55 iobroker-vm systemd[1]: essmqtt.service: Service hold-off time over, scheduling restart.
Jun 07 13:35:55 iobroker-vm systemd[1]: essmqtt.service: Scheduled restart job, restart counter is at 17.
Jun 07 13:36:01 iobroker-vm essmqtt[131001]: INFO:pyess.aio_ess:fetching auth key
Jun 07 13:36:02 iobroker-vm essmqtt[131001]: INFO:pyess.essmqtt:starting send loop
Can you read anything from that? This sequence shows up multiple times over the last days, I have attached an ASCII file generated with journalctl -a|grep essmqtt > essmqtt.service.log
essmqtt.service.log
In your longer log I see two different errors - One is the one that you also pasted above, the other seems to indicate that some time around 0:47 or so the mqtt server was down. I have changed the error handling such that whenever mqtt is down the whole communication is re-initiated. It should now be able to deal with both errors.
Nevertheless I'm not sure whether it wouldn't strictly be more correct to just exit with an error code when one remote party doesn't respond and leave it to systemd to decide whether or not communication should be re-initiated.
The MQTT Server can be down for a couple of minutes once per day as a system backup is starting at 02:00 AM in the morning, but not 0:47, that was probably indeed a system fault or network error and the essmqtt service did the right thing: it restarted and the communication didn't stop while having the service running (the 38 hours behaviour you fixed). Anyways I would follow your recommendation regarding error handling and relying on system functions rather than trying to build work arounds. I mean, again, the service restarted and kept the communication up.
Small update: essmqtt didn't restart automatically yet, runtime is 1 day and 19 hours, however the communication was re-established after 36 hours and 50 minutes, message-log is attached. To prevent misunderstandings: I posted the log in case it shows valuable hints for you, not to build a work around ;-) 19563.log
@fu-zhou great, the log indicates that the automatic reconnect added with 0.1.11 works. The code is printing out a stacktrace to keep information on where a problem is happening, but the fix I added on sunday fixes the issue by starting over without crashing, so in this case the stacktrace and especieally the warning messages around it are a good sign: The broken connection detection works now.
The current approach ist to automatically reastart on all connection errors that we can be caused by known by MQTT or ESS connection errors. When an unknown error type is encountered essmqtt will still exit and leave the mess to be dealt with by systemd.
I will close this issue now, thank you again for your extensive testing!
I'm still running essmqtt at a 2 seconds update cycle for both "common" and "home". 3rd day in a row essmqtt stops updating the values after 38 hours of runtime (1 day, 14 hours). The MQTT server keeps on running as far as I can tell and restarting the server doesn't fix the problem. The service needs to be restarted (
systemctl restart essmqtt.service
). Immediatley after restarting the values are being updated in the server.systemctl status essmqtt.service
doesn't deliver any hint regarding the issue.: