CJNE / ha-sunspec

Home Assistant custom component for SunSpec Modbus devices
MIT License

Unavailable inverter slows down HA experience #165

Open shurli opened 1 year ago

shurli commented 1 year ago

Version of the custom_component

0.0.22 on HA 2023.2.3

Describe the bug

I use the sunspec integration to read the values of 4 Fronius inverters. 3 of them are connected via LAN cable, 1 via wifi. The wifi connection is terrible, so it is sometimes lost.

Since I installed the integration, HA has sometimes been laggy, or even unavailable for a short period of time, but I was not able to figure out the reason.

Yesterday I disabled the wifi connection, so that 1 inverter has been unavailable since then. The HA user experience got worse.

Now I have disabled the 1 unavailable inverter in the integration settings, and HA responds normally again.

It seems to me that the attempts to reach the unavailable inverter slow down HA and lead to unavailability of the UI and even to restarts of HA.

Maybe it exhausts the maximum number of TCP connections or something?

Claudio1L commented 1 year ago

Same here. Very bad experience after the sun goes down and the inverter shuts down.

Any advice on how to solve? Thanks

goberhammer commented 1 year ago

I can also confirm this: when the sun goes down and my inverter shuts off, my HA instance becomes almost unusable, taking forever to connect to and to respond. Disabling the component solves the issue, but it's cumbersome to disable it every night and re-enable it in the morning.

janoschbatschi commented 1 year ago

Same here. Also my HA instance restarts on a regular basis (every few minutes) when the inverter (Fronius Symo Gen24) is not reachable (meaning every evening).

shurli commented 1 year ago

Same here. Also my HA instance restarts on a regular basis (every few minutes) when the inverter (Fronius Symo Gen24) is not reachable (meaning every evening).

On the Fronius Symo Gen24 there is a setting that keeps it alive after the sun goes down.

codyc1515 commented 1 year ago

I found the issue. Hopefully @CJNE can take a look at it. For reference, when this happens HA blocks for such a long time that it requires restarting. Note the integration blocks HA, even if it is disabled entirely. I could not recommend using this integration under any circumstance until this issue is fixed, because it blocks your entire HA instance.

The issue is easy to reproduce - just take your inverter offline.

Logger: homeassistant.util.async_
Source: util/async_.py:164
First occurred: 21:44:31 (3 occurrences)
Last logged: 21:51:15

Detected blocking call to sleep inside the event loop. This is causing stability issues. Please report issue to the custom integration author for sunspec doing blocking calls at custom_components/sunspec/api.py, line 159: client.scan(

https://github.com/CJNE/ha-sunspec/blob/a40293a77fc40c19160952094bfb944c66f0bd03/custom_components/sunspec/api.py#L159

goberhammer commented 1 year ago

Thanks @codyc1515 for this precise information. I've started looking into it, but it's more complex than I expected. I'm familiar with PyModbus, but the code uses the class ModbusClientDeviceTCP from PySunspec which has a custom modbus TCP client implementation. Maybe we can wrap the blocking connect call into an async call, but I'm not familiar enough with the HA APIs to tell.
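
As a rough illustration of that idea (not taken from ha-sunspec or pysunspec2), a blocking call can be pushed to a worker thread with plain asyncio so the event loop keeps running. `blocking_scan` below is a hypothetical stand-in for `client.scan()`; inside an integration, Home Assistant's `hass.async_add_executor_job` helper plays a similar role.

```python
import asyncio
import time


def blocking_scan(delay: float = 0.2) -> str:
    """Hypothetical stand-in for a blocking call such as client.scan()."""
    time.sleep(delay)  # simulates a slow or timing-out device
    return "scan done"


async def scan_without_blocking(delay: float = 0.2) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_scan, delay)


async def main() -> tuple[str, int]:
    """Run the scan while a ticker task proves the loop is still responsive."""
    ticks = 0

    async def ticker() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    result = await scan_without_blocking()
    task.cancel()
    return result, ticks
```

If `blocking_scan` were called directly inside `main`, the ticker would not advance at all while the scan sleeps, which is the kind of stall the log message above complains about.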

alexdelprete commented 1 year ago

Note the integration blocks HA, even if it is disabled entirely

So the integration is disabled, but it is running? Could you clarify please? If an integration is disabled, none of its code is running.

Or did you mean you disabled it after it "locked"?

codyc1515 commented 1 year ago

I can’t really explain this but even the startup on HA will be stuck on waiting for sunspec to be loaded (even if the configuration entry is disabled).

alexdelprete commented 1 year ago

I can’t really explain this but even the startup on HA will be stuck on waiting for sunspec to be loaded (even if the configuration entry is disabled).

That is not technically possible: if an integration is disabled, it's not even loaded, so its code is not running. I think you are probably confusing different issues.

Enable debug log and when the integration is disabled, check the full HA logs: you shouldn't see any entry regarding ha-sunspec. If you do, then the integration is enabled.

I used ha-sunspec in the past, before developing my custom component for ABB inverters, and I didn't have any issues when the inverter went off. But the log error you reported is indeed a possible problem, one that can be overcome by making the call asynchronous, and I'm sure @CJNE will find the best solution. When I started developing my component, he helped me a lot, since I'm not a real dev and I just wanted to learn a little bit of Python and how HA components work. He's a skillful developer and a very kind person. My custom component is structured based on ha-sunspec, even though the specific code is different (it uses pymodbus).

goberhammer commented 1 year ago

Hi @alexdelprete , I see you've developed your custom integration for ABB/Sunspec. I would like to try to switch from ha-sunspec to yours, but I've two questions:

  1. is the ABB/Fimer with integrated wifi card UNO-DM-3.0-TL-PLUS supported ?
  2. do you think it is possible to keep all the historic energy and data consumption when migrating ?

Thanks in advance.

alexdelprete commented 1 year ago

  1. is the ABB/Fimer with integrated wifi card UNO-DM-3.0-TL-PLUS supported ?

Your card should be a VSN300, you can check it yourself, it has an embedded webserver, and it is supported.

  2. do you think it is possible to keep all the historic energy and data consumption when migrating ?

You are not migrating, you're installing a new component, there's no data migration in place. But if you fed the Energy integration with the correct sensors, you have long-term data in the statistics anyway. It depends on what is the historical data that you want to keep.

Also take into account that my integration has no support for integrated meters, batteries, etc.; it's just the basic inverter stuff. If you have those extra modules, I would advise staying with ha-sunspec.

If you want to simply take a look: just disable ha-sunspec, install mine, and check sensors, etc.

goberhammer commented 1 year ago

If you want to simply take a look: just disable ha-sunspec, install mine, and check sensors, etc.

Thanks a lot, will do.

goberhammer commented 12 months ago

For anyone still struggling with this bug, I've tested the custom integration made by @alexdelprete ( https://github.com/alexdelprete/ha-abb-powerone-pvi-sunspec ) and I can confirm that it works perfectly with my ABB/Fimer inverter with built-in wifi adapter board.

Configuration:
Model: Fimer (ex ABB) UNO-DM-3.0-TL-PLUS
Firmware: 2201A
Port: 502
Modbus address: 1
Register base: 40000

This integration does not manifest any problem when the inverter shuts down at dusk. Thanks Alex for your work.

alexdelprete commented 12 months ago

This integration does not manifest any problem when the inverter shuts down at dusk.

What problems were you having with ha-sunspec? My component is very similar to it; the dev helped me start the development of my component. :)

goberhammer commented 12 months ago

The problem described in the original bug: some inverters, like mine, are directly powered by the array, so when the sun goes down the inverter shuts off. The original ha-sunspec integration then starts to block the main HA loop, making it impossible to work (dashboards loading and working only for a few seconds every few minutes - probably correlated to the connect timeout -, automations delayed, etc). As another user noted in https://github.com/CJNE/ha-sunspec/issues/165#issuecomment-1740546172 the problem is due to blocking calls. Your extension instead works correctly when the inverter is not reachable. I've tried to debug/fix it myself, but I'm not experienced enough with the HA internal API.

alexdelprete commented 12 months ago

some inverters, like mine, are directly powered by the array, so when the sun goes down the inverter shuts off. The original ha-sunspec integration then starts to block the main HA loop, making it impossible to work (dashboards loading and working only for a few seconds every few minutes - probably correlated to the connect timeout -, automations delayed, etc).

My inverter does the same. But I never had issues. Some time ago some users (two actually) of my component reported that it was slowing their HA when inverter was off. Initially I didn't pay much attention to that. But I had an issue with pymodbus library that was chatting too much when connect failed, and couldn't turn off the logging no matter what I did, so I decided to take a look at how I could solve both issues.

First, I made the ModbusTCP timeout dependent on scan_interval when I found this issue (it's a good component I learned a lot from while developing mine). I'm not sure that really changed much, but it seemed like a good idea anyway. I also added a thread-safe lock for the connect/disconnect/read functions, in case some strange race condition happened during those calls.

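    # Signature reconstructed from the assignments below; the original post omitted it
    def __init__(self, hass, name, host, port, slave_id, base_addr, scan_interval):
        """Initialize the Modbus hub"""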
        self._hass = hass
        self._name = name
        self._host = host
        self._port = port
        self._slave_id = slave_id
        self._base_addr = base_addr
        self._scan_interval = scan_interval
        # Min. scan_interval is 30s, ensure min. timeout is 29s
        self._timeout = max(29, (scan_interval - 1))
        self._client = ModbusTcpClient(host=self._host, port=self._port, timeout=self._timeout)
        self._lock = threading.Lock()

But the main thing I modified is that before doing a ModbusTCP connect to the inverter, since it was throwing out errors despite having disabled all logging, I decided to first check if the socket was available (so if the inverter was ready for connection). So I implemented a check_port() method that is verified before ANY connect, so the pymodbus library wouldn't throw connection errors anymore.

    def check_port(self) -> bool:
        """Check if port is available"""
        with self._lock:
            sock_timeout = float(3)
            _LOGGER.debug(f"Check_Port: opening socket on {self._host}:{self._port} with a {sock_timeout}s timeout.")
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            # Set the timeout on this socket only; socket.setdefaulttimeout() would change it process-wide
            sock.settimeout(sock_timeout)
            sock_res = sock.connect_ex((self._host, self._port))
            is_open = sock_res == 0  # True if open, False if not
            if is_open:
                sock.shutdown(socket.SHUT_RDWR)
                _LOGGER.debug(f"Check_Port (SUCCESS): port open on {self._host}:{self._port}")
            else:
                _LOGGER.debug(f"Check_Port (ERROR): port not available on {self._host}:{self._port} - error: {sock_res}")
            sock.close()
        return is_open

Here's the connect() method, verifying check_port() and managing the thread-safe lock:

    def connect(self):
        """Connect client"""
        _LOGGER.debug(
            f"Hub connect to IP: {self._host} port: {self._port} slave id: {self._slave_id} timeout: {self._timeout}"
        )
        if self.check_port():
            _LOGGER.debug("Inverter ready for Modbus TCP connection")
            try:
                with self._lock:
                    self._client.connect()
                if not self._client.connected:
                    raise ConnectionError(
                        f"Failed to connect to {self._host}:{self._port} slave id {self._slave_id} timeout: {self._timeout}"
                    )
                else:
                    _LOGGER.debug("Modbus TCP Client connected")
                    return True
            except ModbusException:
                raise ConnectionError(
                    f"Failed to connect to {self._host}:{self._port} slave id {self._slave_id} timeout: {self._timeout}"
                )
        else:
            _LOGGER.debug("Inverter not ready for Modbus TCP connection")
            raise ConnectionError(
                f"Inverter not active on {self._host}:{self._port}"
            )

The users who were complaining told me the latest version is working fine. I can't tell you exactly which of the changes solved the problem. I think avoiding the connection when the inverter is not ready was the key, but I'm not 100% sure. Maybe the lock also helped.

ha-sunspec uses the pysunspec2 library, I don't really know if it's based on pymodbus but maybe avoiding connections and implementing the thread lock could be helpful.
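
For comparison, the same reachability check can be sketched without a blocking socket, using asyncio streams. This is only an illustration under assumptions, not code from either integration: `port_is_open` is a hypothetical helper name, and the 3 s default simply mirrors the timeout used in check_port() above.

```python
import asyncio


async def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout=timeout
        )
    except (OSError, asyncio.TimeoutError):
        # Refused, unreachable, or timed out: treat the device as not ready
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # the peer may already have dropped the connection
    return True
```

A caller could `await port_is_open(host, port)` before attempting the Modbus connect and skip the poll cycle when it returns False, so an offline inverter costs at most the probe timeout and never stalls the event loop.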

Like I said, Johan is a good developer and he helped me a lot when I started developing mine (I'm not a dev), I'm sure he'll solve it in some way.

Good luck. :)

goberhammer commented 11 months ago

First of all... thanks a lot for taking the time to write this :-) As far as I can tell, ha-sunspec does not use pymodbus but a custom module called client.py in pysunspec. I was planning to disable the extension in my main Docker container and set up another Home Assistant Docker container with only ha-sunspec for debugging, but I haven't had the time to do it yet.

Like I said, Johan is a good developer and he helped me a lot when I started developing mine (I'm not a dev), I'm sure he'll solve it in some way.

I apologize if there was any misunderstanding. It was never my intention to criticize Johan; my intention was to update this thread with a possible solution in case other people stumble here with the same problem :-D

alexdelprete commented 11 months ago

As far as I can tell, ha-sunspec does not use pymodbus but a custom module called client.py in pysunspec.

If you check my post, towards the end, I said that this integration uses pysunspec2 library. :) Someone actually told me that pysunspec is based on an old version of pymodbus, but I don't know if that is true.

I apologize if there was any misunderstanding. It was never my intention to criticize Johan

No worries, I didn't mean to say you offended anyone, I just wanted to make a clear statement about the developer, who is not only good but also a very kind person. :)

Ciao,

Alessandro

dragonnn commented 11 months ago

Just got the same issue: my inverter got stuck on something and stopped responding to anything, including refreshing the SolarEdge app. A restart fixed it, and I changed the TIMEOUT in ha-sunspec to 5. Since my connection is over LAN, 5s should be more than enough, and as far as I understand that should work around the blocking to some extent.

gszigethy commented 9 months ago

This is impacting me too. When my Fronius inverter becomes unreachable for whatever reason, the SunSpec integration kills Home Assistant and puts it into an infinite reboot loop, which goes on until the inverter is reachable again.

This has happened every day for the past few days, which basically makes it unusable.

CJNE commented 2 months ago

I'm very sorry for the lack of updates; I haven't had the time to do much besides work for a long time. I'm currently testing the connection probing suggested by @alexdelprete (thanks for the excellent suggestion!). I think it should help with this problem; I just need to keep it running for a while to see how it behaves. It's looking promising though :)

A new version will be released soon with this fix included.

CJNE commented 2 months ago

Version 0.0.26 should hopefully fix this, please give it a try and let me know if you still have any issues.