WillCodeForCats / solaredge-modbus-multi

Modbus/TCP local polling to one or more SolarEdge inverters. For single inverters, multiple inverters, meters, and batteries.
Apache License 2.0
188 stars 16 forks source link

Diagnosing pymodbus connection issues #514

Closed thargy closed 5 months ago

thargy commented 9 months ago

Describe the feature Allow easy access to pymodbus debug logs in HAOS. The documentation notes a 'download diagnostics' option that no longer appears.

Home Assistant (please complete the following information):

Additional context I am talking to SolarEdge to diagnose issues with multi-inverter site's new home backup functionality - which isn't currently working or supported. (Apparently it's setting parameter 1404 which can be changed from 1 to 2 to indicate Multi-Inverter Backup Mode).

However, when they made that change on my system this TCP modbus integration failed. Here is the current logging with debug enabled:

2024-01-23 16:44:19.606 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] solaredge_modbus_multi configuration: number_of_inverters=2, start_device_id=1, detect_meters=True, detect_batteries=True, detect_extras=True, keep_modbus_open=False, adv_storage_control=True, adv_site_limit_control=True, allow_battery_energy_reset=False, sleep_after_write=3, battery_rating_adjust=0, 
2024-01-23 16:44:19.606 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] coordinator timeout is 31.2
2024-01-23 16:44:19.610 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Storage Control is enabled. Use at your own risk!
2024-01-23 16:44:19.610 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Site Limit Control is enabled. Use at your own risk!
2024-01-23 16:44:31.617 ERROR (MainThread) [custom_components.solaredge_modbus_multi] Unexpected error fetching SolarEdge Coordinator data: Modbus Error: [Input/Output] ERROR: No response received after 3 retries
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 300, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 194, in _async_update_data
    return await self._refresh_modbus_data_with_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 232, in _refresh_modbus_data_with_retry
    raise ex
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 229, in _refresh_modbus_data_with_retry
    return await self._hub.async_refresh_modbus_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 319, in async_refresh_modbus_data
    await self._async_init_solaredge()
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 209, in _async_init_solaredge
    await new_inverter.init_device()
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 671, in init_device
    inverter_data = await self.hub.modbus_read_holding_registers(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 414, in modbus_read_holding_registers
    result = await self._client.read_holding_registers(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymodbus/client/base.py", line 207, in async_execute
    raise ModbusIOException(
pymodbus.exceptions.ModbusIOException: Modbus Error: [Input/Output] ERROR: No response received after 3 retries
2024-01-23 16:44:31.622 DEBUG (MainThread) [custom_components.solaredge_modbus_multi] Finished fetching SolarEdge Coordinator data in 12.016 seconds (success: False)

I would like an easy way to enable debug logging on pymodbus and download logs so that I can see what communications are occurring to hopefully help SolarEdge diagnose the issue. Obviously they don't officially support Pymodbus or this integration so I would like to provide as much useful information to them to hopefully get this working again.

I looked up pymodbus and the debug logs show raw frame data which I feel could be invaluable - i.e. it would be great to show SolarEdge what data is sent on the modbus and to see if there's some kind of unexpected response. Obviously, I'd share the logs here too.

Appreciate your help.

WillCodeForCats commented 9 months ago

configuration.yaml

logger:
    logs:
        pymodbus: debug
thargy commented 9 months ago

Firstly, I apologise for being so lazy. I thought that might work, but I'm so stressed talking to SolarEdge and trying to resolve things I panicked.

As promised, I got the following logs:

2024-01-23 17:37:14.112 DEBUG (MainThread) [pymodbus.logging] Connecting to 192.168.89.177:1502.
2024-01-23 17:37:14.112 DEBUG (MainThread) [pymodbus.logging] Connecting comm
2024-01-23 17:37:14.130 DEBUG (MainThread) [pymodbus.logging] Connected to comm
2024-01-23 17:37:14.130 DEBUG (MainThread) [pymodbus.logging] callback_connected called
2024-01-23 17:37:14.147 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Storage Control is enabled. Use at your own risk!
2024-01-23 17:37:14.147 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Site Limit Control is enabled. Use at your own risk!
2024-01-23 17:37:14.147 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:37:14.147 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:37:17.161 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:37:17.161 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:37:20.162 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:37:20.163 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:37:23.166 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:37:23.167 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:37:26.170 ERROR (MainThread) [custom_components.solaredge_modbus_multi] Unexpected error fetching SolarEdge Coordinator data: Modbus Error: [Input/Output] ERROR: No response received after 3 retries
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 300, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 194, in _async_update_data
    return await self._refresh_modbus_data_with_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 232, in _refresh_modbus_data_with_retry
    raise ex
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 229, in _refresh_modbus_data_with_retry
    return await self._hub.async_refresh_modbus_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 319, in async_refresh_modbus_data
    await self._async_init_solaredge()
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 209, in _async_init_solaredge
    await new_inverter.init_device()
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 671, in init_device
    inverter_data = await self.hub.modbus_read_holding_registers(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 414, in modbus_read_holding_registers
    result = await self._client.read_holding_registers(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymodbus/client/base.py", line 207, in async_execute
    raise ModbusIOException(
pymodbus.exceptions.ModbusIOException: Modbus Error: [Input/Output] ERROR: No response received after 3 retries
2024-01-23 17:37:26.777 DEBUG (MainThread) [pymodbus.logging] Connecting to 192.168.89.177:1502.
2024-01-23 17:37:26.777 DEBUG (MainThread) [pymodbus.logging] Connecting comm
2024-01-23 17:37:26.844 DEBUG (MainThread) [pymodbus.logging] Connected to comm
2024-01-23 17:37:26.844 DEBUG (MainThread) [pymodbus.logging] callback_connected called
2024-01-23 17:37:26.856 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Storage Control is enabled. Use at your own risk!
2024-01-23 17:37:26.856 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Site Limit Control is enabled. Use at your own risk!
2024-01-23 17:37:26.856 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:37:26.856 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:37:28.170 DEBUG (MainThread) [pymodbus.logging] Connection lost comm due to [Errno 104] Connection reset by peer
2024-01-23 17:37:28.170 DEBUG (MainThread) [pymodbus.logging] Getting transaction 1
2024-01-23 17:37:38.617 DEBUG (MainThread) [pymodbus.logging] Connecting to 192.168.89.177:1502.
2024-01-23 17:37:38.617 DEBUG (MainThread) [pymodbus.logging] Connecting comm
2024-01-23 17:37:38.619 DEBUG (MainThread) [pymodbus.logging] Connected to comm
2024-01-23 17:37:38.619 DEBUG (MainThread) [pymodbus.logging] callback_connected called
2024-01-23 17:37:38.619 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Storage Control is enabled. Use at your own risk!
2024-01-23 17:37:38.619 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Site Limit Control is enabled. Use at your own risk!
2024-01-23 17:37:38.619 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:37:38.620 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:37:41.621 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:37:41.623 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:37:44.624 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:37:44.625 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:37:47.627 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:37:47.627 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:37:50.628 DEBUG (MainThread) [pymodbus.logging] Connection lost comm due to Server not responding
2024-01-23 17:37:50.629 DEBUG (MainThread) [pymodbus.logging] Getting transaction 1
2024-01-23 17:37:50.629 ERROR (MainThread) [custom_components.solaredge_modbus_multi] Unexpected error fetching SolarEdge Coordinator data: Modbus Error: [Input/Output] ERROR: No response received after 3 retries
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 300, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 194, in _async_update_data
    return await self._refresh_modbus_data_with_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 232, in _refresh_modbus_data_with_retry
    raise ex
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 229, in _refresh_modbus_data_with_retry
    return await self._hub.async_refresh_modbus_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 319, in async_refresh_modbus_data
    await self._async_init_solaredge()
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 209, in _async_init_solaredge
    await new_inverter.init_device()
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 671, in init_device
    inverter_data = await self.hub.modbus_read_holding_registers(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 414, in modbus_read_holding_registers
    result = await self._client.read_holding_registers(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymodbus/client/base.py", line 207, in async_execute
    raise ModbusIOException(
pymodbus.exceptions.ModbusIOException: Modbus Error: [Input/Output] ERROR: No response received after 3 retries
2024-01-23 17:38:10.831 DEBUG (MainThread) [pymodbus.logging] Connecting to 192.168.89.177:1502.
2024-01-23 17:38:10.831 DEBUG (MainThread) [pymodbus.logging] Connecting comm
2024-01-23 17:38:10.833 DEBUG (MainThread) [pymodbus.logging] Connected to comm
2024-01-23 17:38:10.834 DEBUG (MainThread) [pymodbus.logging] callback_connected called
2024-01-23 17:38:10.834 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Storage Control is enabled. Use at your own risk!
2024-01-23 17:38:10.834 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Site Limit Control is enabled. Use at your own risk!
2024-01-23 17:38:10.834 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:38:10.835 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:38:13.837 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:38:13.837 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:38:16.839 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:38:16.839 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:38:19.841 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:38:19.841 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:38:22.842 DEBUG (MainThread) [pymodbus.logging] Connection lost comm due to Server not responding
2024-01-23 17:38:22.843 DEBUG (MainThread) [pymodbus.logging] Getting transaction 1
2024-01-23 17:38:22.843 ERROR (MainThread) [custom_components.solaredge_modbus_multi] Unexpected error fetching SolarEdge Coordinator data: Modbus Error: [Input/Output] ERROR: No response received after 3 retries
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 300, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 194, in _async_update_data
    return await self._refresh_modbus_data_with_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 232, in _refresh_modbus_data_with_retry
    raise ex
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 229, in _refresh_modbus_data_with_retry
    return await self._hub.async_refresh_modbus_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 319, in async_refresh_modbus_data
    await self._async_init_solaredge()
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 209, in _async_init_solaredge
    await new_inverter.init_device()
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 671, in init_device
    inverter_data = await self.hub.modbus_read_holding_registers(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 414, in modbus_read_holding_registers
    result = await self._client.read_holding_registers(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymodbus/client/base.py", line 207, in async_execute
    raise ModbusIOException(
pymodbus.exceptions.ModbusIOException: Modbus Error: [Input/Output] ERROR: No response received after 3 retries
2024-01-23 17:39:03.004 DEBUG (MainThread) [pymodbus.logging] Connecting to 192.168.89.177:1502.
2024-01-23 17:39:03.004 DEBUG (MainThread) [pymodbus.logging] Connecting comm
2024-01-23 17:39:03.007 DEBUG (MainThread) [pymodbus.logging] Connected to comm
2024-01-23 17:39:03.007 DEBUG (MainThread) [pymodbus.logging] callback_connected called
2024-01-23 17:39:03.007 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Storage Control is enabled. Use at your own risk!
2024-01-23 17:39:03.008 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Site Limit Control is enabled. Use at your own risk!
2024-01-23 17:39:03.008 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:39:03.008 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:39:06.010 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:39:06.011 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:39:09.013 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:39:09.014 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:39:12.015 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 17:39:12.015 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 17:39:15.017 DEBUG (MainThread) [pymodbus.logging] Connection lost comm due to Server not responding
2024-01-23 17:39:15.017 DEBUG (MainThread) [pymodbus.logging] Getting transaction 1
2024-01-23 17:39:15.017 ERROR (MainThread) [custom_components.solaredge_modbus_multi] Unexpected error fetching SolarEdge Coordinator data: Modbus Error: [Input/Output] ERROR: No response received after 3 retries
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 300, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 194, in _async_update_data
    return await self._refresh_modbus_data_with_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 232, in _refresh_modbus_data_with_retry
    raise ex
  File "/config/custom_components/solaredge_modbus_multi/__init__.py", line 229, in _refresh_modbus_data_with_retry
    return await self._hub.async_refresh_modbus_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 319, in async_refresh_modbus_data
    await self._async_init_solaredge()
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 209, in _async_init_solaredge
    await new_inverter.init_device()
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 671, in init_device
    inverter_data = await self.hub.modbus_read_holding_registers(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/solaredge_modbus_multi/hub.py", line 414, in modbus_read_holding_registers
    result = await self._client.read_holding_registers(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pymodbus/client/base.py", line 207, in async_execute
    raise ModbusIOException(
pymodbus.exceptions.ModbusIOException: Modbus Error: [Input/Output] ERROR: No response received after 3 retries

Which are illuminating. They confirm my telnet test that the port is open but there's nothing coming back.

I found this line interesting:

2024-01-23 17:37:28.170 DEBUG (MainThread) [pymodbus.logging] Connection lost comm due to [Errno 104] Connection reset by peer

But it only occurred once.

Is there any other illumination you can provide before I pass on my interpretation to SolarEdge. Again, I'm conscious this is not formally supported by them so the more insight I can pass on to them the more likely there are to be able to help rectify the issue.

To be clear, the communications were working fine until SolarEdge made some parameter changes to my inverters, so I strongly suspect the issue is at their end (particular as the time the communication stopped coincides with the time they made the changes).

WillCodeForCats commented 9 months ago

I would try giving the inverter a cold boot if you haven't already done that. Turn the red switch to 0, wait for it to shut down, then open the DC disconnect, then open the AC disconnect. The three light tower should turn off. Wait 30 seconds then turn it back on starting with the AC disconnect.

thargy commented 9 months ago

Yeah, we tried resetting the new mode flag followed by a comms board reboot; and I previously tried a cold boot whilst in the new mode. I've suggested the next step is to try changing the new mode back and doing a cold boot - but they have to make the change first and will want to watch what happens. I'll bounce the log back to them now.

WillCodeForCats commented 9 months ago

This might also be a connection race from an uncaught exception during setup; I've corrected that in PR #515 and I'll push a pre release shortly for you to try.

WillCodeForCats commented 9 months ago

Release v2.4.9-pre.4 contains the change.

thargy commented 9 months ago

OK, I'll hold on to try that first. I don't want it to be our issue as I've got a really good line to their engineers right now, and they're interested in helping resolve issues, so I don't want to lose that goodwill. I'll keep you informed of any info I get as I'm quite a complex site for them, and they're interested in debugging some of their new functionality that is currently being extended to multi-inverter sites. That includes proper HomeBackup support for multi-inverter multi-battery sites, that I tested today and allowed both my inverters to power the home in a full backup scenario (grid off), I was running ~9kW of load with no grid connection.

WillCodeForCats commented 9 months ago

Basically what wasn't happening is when an IO error occurred during setup it wasn't calling disconnect. It would then loop back around holding onto a possibly dead connection. What's supposed to happen if we get an IO error is force a disconnect and open a new connection on the next retry.

Of course the inverter could still not be responding, but at least I can say for sure it's not because an old connection is being held onto.

If they happen to tell you about any modbus registers that aren't documented, let me know and I can add them to the integration.

thargy commented 9 months ago

OK, I ran using that and I didn't see any ModbusIOException in the output.

2024-01-23 18:09:33.247 DEBUG (MainThread) [pymodbus.logging] Connecting to 192.168.89.177:1502.
2024-01-23 18:09:33.247 DEBUG (MainThread) [pymodbus.logging] Connecting comm
2024-01-23 18:09:33.279 DEBUG (MainThread) [pymodbus.logging] Connected to comm
2024-01-23 18:09:33.279 DEBUG (MainThread) [pymodbus.logging] callback_connected called
2024-01-23 18:09:33.287 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Storage Control is enabled. Use at your own risk!
2024-01-23 18:09:33.288 WARNING (MainThread) [custom_components.solaredge_modbus_multi.hub] Power Control Options: Site Limit Control is enabled. Use at your own risk!
2024-01-23 18:09:33.288 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 18:09:33.288 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 18:09:36.290 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 18:09:36.291 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 18:09:39.297 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 18:09:39.297 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 18:09:42.298 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x1 0x3 0x9c 0x40 0x0 0x45
2024-01-23 18:09:42.299 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-23 18:09:45.300 DEBUG (MainThread) [pymodbus.logging] Connection lost comm due to Server not responding
2024-01-23 18:09:45.301 DEBUG (MainThread) [pymodbus.logging] Getting transaction 1
2024-01-23 18:09:55.620 DEBUG (MainThread) [pymodbus.logging] Connecting to 192.168.89.177:1502.
...

As you can see, you are now correctly catching the error and reconnecting. I recommend you dump the error to the log as an error as well as raising the HubInitFailed exception as that doesn't appear in the logs and is easy to miss (I only see it when I hover over the hover failure message in the hub).

However, I'm still not getting a response.

The only new register may be the Multi-Inverter Backup Mode which I think I heard 1404 for, with 1 indicating existing functionality (off/disabled maybe?) and 2 enabling the mode (on/enbaled?). I can't be 100% sure though. If you want to add it as a beta, I can try and spot if it changes when they make changes? If it reports as 2 right now then I guess that's a pretty good bet.

WillCodeForCats commented 9 months ago

During setup when the configuration is first loaded (or reloaded), Home Assistant will turn all errors into ConfigEntryNotReady which signals HA that setup failed for that integration and to try again later.

Older versions of Home Assistant would show this error more prominently in the integration panel, but at some point it was decided to hide it the hover tooltip. I don't remember if it would also log or not, but it was a decision from the HA core team to make setup errors more hidden.

WillCodeForCats commented 9 months ago

I checked the core code, they are disabling failure logging at that stage:

https://github.com/home-assistant/core/blob/dev/homeassistant/helpers/update_coordinator.py#L279

thargy commented 9 months ago

Yeah, it doesn't also log it. I appreciate that is the HA Devs choice. Is it considered anti-pattern to also put the error in the log? Personally, I'd put the full error in the log and put a friendly error in the Hub like 'A modbus communication error occurred'. However it's whatever you prefer.

If you want to put a feature branch with an option for the new mode, rather than a beta that would affect everyone, I can patch my code locally?

WillCodeForCats commented 9 months ago

I would need to know the modbus register address, size, and type. 1404 is way outside the holding register range that starts at 40000.

thargy commented 9 months ago

Yeah I’m just starting to read up on the Sun Spec. I’ll see if I can get more info, but I’m not hugely optimistic. Ofc I’ll share it if I do.

thargy commented 9 months ago

The plot thickens, I sent raw data to the port and could see that it responded with zero-length data. I changed the unit identifier to '02' and voila, I got data back. Checking the SetApp, sure enough their engineers have changed the device ID from 1 to 2.

However, I can't find how to easily change the Device ID/IP Address/Port of the existing hub? I've seen a repair ticket appear after many hours which allows you do that, but there's nowhere I can find to do it now.

The configuration menu does not expose those settings, and I can't temporarily disable the existing hub and add a new one to test. I have a large system and am loathe to delete the existing hub and have to go through and ensure all the entity ids are correct later if I can just change the Device ID somewhere.

Possibly I'm being blind, but if you can point me in the right direction that would be great.

thargy commented 9 months ago

OK, I just deleted the hub and added a new one with Device ID 2. The setup still fails, but I'm getting a lot more logs now:

2024-01-24 09:40:19.205 DEBUG (MainThread) [pymodbus.logging] Connecting to 192.168.89.177:1502.
2024-01-24 09:40:19.205 DEBUG (MainThread) [pymodbus.logging] Connecting comm
2024-01-24 09:40:19.208 DEBUG (MainThread) [pymodbus.logging] Connected to comm
2024-01-24 09:40:19.208 DEBUG (MainThread) [pymodbus.logging] callback_connected called
2024-01-24 09:40:19.208 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x1 0x0 0x0 0x0 0x6 0x2 0x3 0x9c 0x40 0x0 0x45
2024-01-24 09:40:19.209 DEBUG (MainThread) [pymodbus.logging] Adding transaction 1
2024-01-24 09:40:19.238 DEBUG (MainThread) [pymodbus.logging] recv: 0x0 0x1 0x0 0x0 0x0 0x8d 0x2 0x3 0x8a 0x53 0x75 0x6e 0x53 0x0 0x1 0x0 0x41 0x53 0x6f 0x6c 0x61 0x72 0x45 0x64 0x67 0x65 0x20 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x53 0x45 0x35 0x30 0x30 0x30 0x48 0x2d 0x52 0x57 0x42 0x4d 0x4e 0x42 0x46 0x35 0x34 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x30 0x30 0x30 0x34 0x2e 0x30 0x30 0x31 0x39 0x2e 0x30 0x30 0x33 0x39 0x0 0x0 0x37 0x35 0x30 0x32 0x42 0x32 0x46 0x41 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x2 old_data:  addr=None
2024-01-24 09:40:19.239 DEBUG (MainThread) [pymodbus.logging] Processing: 0x0 0x1 0x0 0x0 0x0 0x8d 0x2 0x3 0x8a 0x53 0x75 0x6e 0x53 0x0 0x1 0x0 0x41 0x53 0x6f 0x6c 0x61 0x72 0x45 0x64 0x67 0x65 0x20 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x53 0x45 0x35 0x30 0x30 0x30 0x48 0x2d 0x52 0x57 0x42 0x4d 0x4e 0x42 0x46 0x35 0x34 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x30 0x30 0x30 0x34 0x2e 0x30 0x30 0x31 0x39 0x2e 0x30 0x30 0x33 0x39 0x0 0x0 0x37 0x35 0x30 0x32 0x42 0x32 0x46 0x41 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x2
2024-01-24 09:40:19.239 DEBUG (MainThread) [pymodbus.logging] Factory Response[ReadHoldingRegistersResponse': 3]
2024-01-24 09:40:19.239 DEBUG (MainThread) [pymodbus.logging] Getting transaction 1
2024-01-24 09:40:19.239 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] Registers received requested : 69 69
2024-01-24 09:40:19.239 DEBUG (MainThread) [pymodbus.logging] [21365, 28243, 1, 65, 21359, 27745, 29253, 25703, 25888, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21317, 13616, 12336, 18477, 21079, 16973, 20034, 17973, 13312, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12336, 12340, 11824, 12337, 14638, 12336, 13113, 0, 14133, 12338, 16946, 17985, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
2024-01-24 09:40:19.240 DEBUG (MainThread) [pymodbus.logging] handle: [b'Su', b'nS']
2024-01-24 09:40:19.240 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2: C_SunSpec_ID 0x53756e53<class 'int'>
2024-01-24 09:40:19.240 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2: C_SunSpec_DID 0x1<class 'int'>
2024-01-24 09:40:19.240 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2: C_SunSpec_Length 0x41<class 'int'>
2024-01-24 09:40:19.240 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2: C_Manufacturer SolarEdge<class 'str'>
2024-01-24 09:40:19.240 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2: C_Model SE5000H-RWBMNBF54<class 'str'>
2024-01-24 09:40:19.240 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2: C_Option <class 'str'>
2024-01-24 09:40:19.240 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2: C_Version 0004.0019.0039<class 'str'>
2024-01-24 09:40:19.240 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2: C_SerialNumber 7502B2FA<class 'str'>
2024-01-24 09:40:19.240 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2: C_Device_address 0x2<class 'int'>
2024-01-24 09:40:19.241 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x2 0x0 0x0 0x0 0x6 0x2 0x3 0x9c 0xb9 0x0 0x9
2024-01-24 09:40:19.241 DEBUG (MainThread) [pymodbus.logging] Adding transaction 2
2024-01-24 09:40:19.289 DEBUG (MainThread) [pymodbus.logging] recv: 0x0 0x2 0x0 0x0 0x0 0x15 0x2 0x3 0x12 0x0 0x1 0x0 0x41 0x57 0x61 0x74 0x74 0x4e 0x6f 0x64 0x65 0x0 0x0 0x0 0x0 0x0 0x0 old_data:  addr=None
2024-01-24 09:40:19.290 DEBUG (MainThread) [pymodbus.logging] Processing: 0x0 0x2 0x0 0x0 0x0 0x15 0x2 0x3 0x12 0x0 0x1 0x0 0x41 0x57 0x61 0x74 0x74 0x4e 0x6f 0x64 0x65 0x0 0x0 0x0 0x0 0x0 0x0
2024-01-24 09:40:19.290 DEBUG (MainThread) [pymodbus.logging] Factory Response[ReadHoldingRegistersResponse': 3]
2024-01-24 09:40:19.290 DEBUG (MainThread) [pymodbus.logging] Getting transaction 2
2024-01-24 09:40:19.290 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] Registers received requested : 9 9
2024-01-24 09:40:19.290 DEBUG (MainThread) [pymodbus.logging] [1, 65, 22369, 29812, 20079, 25701, 0, 0, 0]
2024-01-24 09:40:19.290 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2 MMPPT: mmppt_DID 0x1 <class 'int'>
2024-01-24 09:40:19.290 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2 MMPPT: mmppt_Length 0x41 <class 'int'>
2024-01-24 09:40:19.291 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2 MMPPT: mmppt_Units 0x0 <class 'int'>
2024-01-24 09:40:19.291 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2 is NOT Multiple MPPT
2024-01-24 09:40:19.291 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x3 0x0 0x0 0x0 0x6 0x2 0x3 0x9c 0xb9 0x0 0x43
2024-01-24 09:40:19.291 DEBUG (MainThread) [pymodbus.logging] Adding transaction 3
2024-01-24 09:40:19.352 DEBUG (MainThread) [pymodbus.logging] recv: 0x0 0x3 0x0 0x0 0x0 0x89 0x2 0x3 0x86 0x0 0x1 0x0 0x41 0x57 0x61 0x74 0x74 0x4e 0x6f 0x64 0x65 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x57 0x4e 0x44 0x2d 0x33 0x59 0x2d 0x34 0x30 0x30 0x2d 0x4d 0x42 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x45 0x78 0x70 0x6f 0x72 0x74 0x2b 0x49 0x6d 0x70 0x6f 0x72 0x74 0x0 0x0 0x0 0x33 0x31 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x35 0x30 0x39 0x31 0x39 0x39 0x36 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x3 old_data:  addr=None
2024-01-24 09:40:19.352 DEBUG (MainThread) [pymodbus.logging] Processing: 0x0 0x3 0x0 0x0 0x0 0x89 0x2 0x3 0x86 0x0 0x1 0x0 0x41 0x57 0x61 0x74 0x74 0x4e 0x6f 0x64 0x65 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x57 0x4e 0x44 0x2d 0x33 0x59 0x2d 0x34 0x30 0x30 0x2d 0x4d 0x42 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x45 0x78 0x70 0x6f 0x72 0x74 0x2b 0x49 0x6d 0x70 0x6f 0x72 0x74 0x0 0x0 0x0 0x33 0x31 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x35 0x30 0x39 0x31 0x39 0x39 0x36 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x3
2024-01-24 09:40:19.352 DEBUG (MainThread) [pymodbus.logging] Factory Response[ReadHoldingRegistersResponse': 3]
2024-01-24 09:40:19.352 DEBUG (MainThread) [pymodbus.logging] Getting transaction 3
2024-01-24 09:40:19.352 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] Registers received requested : 67 67
2024-01-24 09:40:19.353 DEBUG (MainThread) [pymodbus.logging] [1, 65, 22369, 29812, 20079, 25701, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 22350, 17453, 13145, 11572, 12336, 11597, 16896, 0, 0, 0, 0, 0, 0, 0, 0, 0, 17784, 28783, 29300, 11081, 28016, 28530, 29696, 0, 13105, 0, 0, 0, 0, 0, 0, 0, 13616, 14641, 14649, 13824, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3]
2024-01-24 09:40:19.353 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M1: C_SunSpec_DID 0x1 <class 'int'>
2024-01-24 09:40:19.353 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M1: C_SunSpec_Length 0x41 <class 'int'>
2024-01-24 09:40:19.353 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M1: C_Manufacturer WattNode <class 'str'>
2024-01-24 09:40:19.353 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M1: C_Model WND-3Y-400-MB <class 'str'>
2024-01-24 09:40:19.353 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M1: C_Option Export+Import <class 'str'>
2024-01-24 09:40:19.353 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M1: C_Version 31 <class 'str'>
2024-01-24 09:40:19.353 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M1: C_SerialNumber 5091996 <class 'str'>
2024-01-24 09:40:19.353 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M1: C_Device_address 0x3 <class 'int'>
2024-01-24 09:40:19.353 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] Found I2M1
2024-01-24 09:40:19.354 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x4 0x0 0x0 0x0 0x6 0x2 0x3 0x9d 0x67 0x0 0x43
2024-01-24 09:40:19.355 DEBUG (MainThread) [pymodbus.logging] Adding transaction 4
2024-01-24 09:40:19.600 DEBUG (MainThread) [pymodbus.logging] recv: 0x0 0x4 0x0 0x0 0x0 0x89 0x2 0x3 0x86 0x2 0xbd 0x0 0x99 0x0 0x0 0x0 0x1 0xff 0xff 0x0 0x1 0x0 0x0 0x0 0x0 0xff 0xff 0xff 0xff 0x0 0x21 0x0 0x22 0x0 0x7 0x0 0x61 0x80 0x0 0x9 0x61 0x4 0xb0 0x0 0x0 0xc3 0x81 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0x80 0x0 0x80 0x0 0x80 0x0 0x80 0x0 0x80 0x0 0x80 0x0 0x80 0x0 0x80 0x0 0x80 0x0 0xff 0xff 0xff 0xff 0xff 0xff 0x9 0x61 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0x80 0x0 0x80 0x0 0x80 0x0 old_data:  addr=None
2024-01-24 09:40:19.600 DEBUG (MainThread) [pymodbus.logging] Processing: 0x0 0x4 0x0 0x0 0x0 0x89 0x2 0x3 0x86 0x2 0xbd 0x0 0x99 0x0 0x0 0x0 0x1 0xff 0xff 0x0 0x1 0x0 0x0 0x0 0x0 0xff 0xff 0xff 0xff 0x0 0x21 0x0 0x22 0x0 0x7 0x0 0x61 0x80 0x0 0x9 0x61 0x4 0xb0 0x0 0x0 0xc3 0x81 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0x80 0x0 0x80 0x0 0x80 0x0 0x80 0x0 0x80 0x0 0x80 0x0 0x80 0x0 0x80 0x0 0x80 0x0 0xff 0xff 0xff 0xff 0xff 0xff 0x9 0x61 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0x80 0x0 0x80 0x0 0x80 0x0
2024-01-24 09:40:19.600 DEBUG (MainThread) [pymodbus.logging] Factory Response[ReadHoldingRegistersResponse': 3]
2024-01-24 09:40:19.601 DEBUG (MainThread) [pymodbus.logging] Getting transaction 4
2024-01-24 09:40:19.601 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] Registers received requested : 67 67
2024-01-24 09:40:19.601 DEBUG (MainThread) [pymodbus.logging] [701, 153, 0, 1, 65535, 1, 0, 0, 65535, 65535, 33, 34, 7, 97, 32768, 2401, 1200, 0, 50049, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 32768, 32768, 32768, 32768, 32768, 32768, 32768, 32768, 32768, 65535, 65535, 65535, 2401, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 32768, 32768, 32768]
2024-01-24 09:40:19.601 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M2: C_SunSpec_DID 0x2bd <class 'int'>
2024-01-24 09:40:19.601 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M2: C_SunSpec_Length 0x99 <class 'int'>
2024-01-24 09:40:19.601 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M2: C_Manufacturer !"a   a <class 'str'>
2024-01-24 09:40:19.601 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M2: C_Model Á <class 'str'>
2024-01-24 09:40:19.602 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M2: C_Option  <class 'str'>
2024-01-24 09:40:19.602 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M2: C_Version   a <class 'str'>
2024-01-24 09:40:19.602 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M2: C_SerialNumber  <class 'str'>
2024-01-24 09:40:19.602 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M2: C_Device_address 0x8000 <class 'int'>
2024-01-24 09:40:19.602 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M2: Meter 2 ident incorrect or not installed.
2024-01-24 09:40:19.602 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x5 0x0 0x0 0x0 0x6 0x2 0x3 0x9e 0x15 0x0 0x43
2024-01-24 09:40:19.603 DEBUG (MainThread) [pymodbus.logging] Adding transaction 5
2024-01-24 09:40:19.639 DEBUG (MainThread) [pymodbus.logging] recv: 0x0 0x5 0x0 0x0 0x0 0x89 0x2 0x3 0x86 0xff 0xff 0x0 0x0 0x0 0x1 0x0 0x2 0xff 0xff 0xf7 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0x0 0x0 0xff 0xfe 0x0 0x0 0x0 0x0 0xff 0xff 0x0 0x0 0x0 0x0 0x2 0xbf 0x0 0x11 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xfe 0xff 0xfd 0x2 0xc0 0x0 0x41 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff old_data:  addr=None
2024-01-24 09:40:19.639 DEBUG (MainThread) [pymodbus.logging] Processing: 0x0 0x5 0x0 0x0 0x0 0x89 0x2 0x3 0x86 0xff 0xff 0x0 0x0 0x0 0x1 0x0 0x2 0xff 0xff 0xf7 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0x0 0x0 0xff 0xfe 0x0 0x0 0x0 0x0 0xff 0xff 0x0 0x0 0x0 0x0 0x2 0xbf 0x0 0x11 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xfe 0xff 0xfd 0x2 0xc0 0x0 0x41 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff
2024-01-24 09:40:19.639 DEBUG (MainThread) [pymodbus.logging] Factory Response[ReadHoldingRegistersResponse': 3]
2024-01-24 09:40:19.639 DEBUG (MainThread) [pymodbus.logging] Getting transaction 5
2024-01-24 09:40:19.640 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] Registers received requested : 67 67
2024-01-24 09:40:19.640 DEBUG (MainThread) [pymodbus.logging] [65535, 0, 1, 2, 65535, 63487, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 0, 65534, 0, 0, 65535, 0, 0, 703, 17, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65534, 65533, 704, 65, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535]
2024-01-24 09:40:19.640 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M3: C_SunSpec_DID 0xffff <class 'int'>
2024-01-24 09:40:19.640 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M3: C_SunSpec_Length 0x0 <class 'int'>
2024-01-24 09:40:19.640 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M3: C_Manufacturer  <class 'str'>
2024-01-24 09:40:19.640 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M3: C_Model  <class 'str'>
2024-01-24 09:40:19.640 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M3: C_Option  <class 'str'>
2024-01-24 09:40:19.640 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M3: C_Version  <class 'str'>
2024-01-24 09:40:19.641 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M3: C_SerialNumber A <class 'str'>
2024-01-24 09:40:19.641 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M3: C_Device_address 0xffff <class 'int'>
2024-01-24 09:40:19.641 DEBUG (MainThread) [custom_components.solaredge_modbus_multi.hub] I2M3: Meter 3 ident incorrect or not installed.
2024-01-24 09:40:19.641 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x6 0x0 0x0 0x0 0x6 0x3 0x3 0x9c 0x40 0x0 0x45
2024-01-24 09:40:19.641 DEBUG (MainThread) [pymodbus.logging] Adding transaction 6
2024-01-24 09:40:22.642 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x6 0x0 0x0 0x0 0x6 0x3 0x3 0x9c 0x40 0x0 0x45
2024-01-24 09:40:22.643 DEBUG (MainThread) [pymodbus.logging] Adding transaction 6
2024-01-24 09:40:25.644 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x6 0x0 0x0 0x0 0x6 0x3 0x3 0x9c 0x40 0x0 0x45
2024-01-24 09:40:25.645 DEBUG (MainThread) [pymodbus.logging] Adding transaction 6
2024-01-24 09:40:28.647 DEBUG (MainThread) [pymodbus.logging] send: 0x0 0x6 0x0 0x0 0x0 0x6 0x3 0x3 0x9c 0x40 0x0 0x45
2024-01-24 09:40:28.648 DEBUG (MainThread) [pymodbus.logging] Adding transaction 6
2024-01-24 09:40:31.649 DEBUG (MainThread) [pymodbus.logging] Connection lost comm due to Server not responding
2024-01-24 09:40:31.650 DEBUG (MainThread) [pymodbus.logging] Getting transaction 6
2024-01-24 09:40:31.650 DEBUG (MainThread) [custom_components.solaredge_modbus_multi] Finished fetching SolarEdge Coordinator data in 12.446 seconds (success: False)

I can see it interrogates the primary inverter and then tries to talk to Device ID 3 but gets no response. This reminds me of the previous issues relating to non-consecutive modbus device IDs...

Sadly, everytime SEDG muck around with my system they force these IDs back to defaults which may not be non-consecutive. (see #368).

It might be worth changing the hub failure error to suggest checking the DeviceID when it gets no response from an open connection?

Also, I'm wondering if it would be possible to auto scan the device IDs, given that sending the wrong ID results in an immediate empty response?

thargy commented 9 months ago

Great news! It turns out that they set the Device ID for both the master and slave inverters to '2'!!! I've queried why they did that - I hope it was just engineers messing around with things they didn't understand. Setting the master back to 1 has allowed me to re-add the hub.

I had previously renamed all the second solaredge_b1_...._2 entities, for the first battery on the second inverter to solaredge_b2_.... I wonder if it might be better to default to something like, solaredge_i1_b1_... and solaredge_i2_b1_..., that is specify the inverter and battery numbers in the entity name for clarity/disambiguation?

In the meantime, I've changed all my references to use the default naming so it isn't so difficult to remove and re-add a hub. This has been a very informative issue, as it's forced me to finally dig into the Modbus specs and understand them better. When I finish all my renovations and get my head above water, I'll try to contribute more time to helping with this fantastic project.

Thank you for your help.

WillCodeForCats commented 9 months ago

I did try naming like that, but reverted it for reasons I can't remember now. It might be in the history.

The repair isn't raised during setup, but its easy to change that.

WillCodeForCats commented 9 months ago

Scanning all possible modbus addresses isn't really practical. With minimum timeouts would take at least 12 minutes, and nobody is going to wait that long for HA to finish startup or add an entry and not think something is broken (if HA would even allow that before forcing a timeout).

Modbus protocol doesn't provide a way to detect duplicate ID. You can't assume that no response means duplicate ID, it can just mean no response too. I monitor my inverters and occasionally they will do the same non-response thing for a cycle.

WillCodeForCats commented 9 months ago

For what it's worth when my system was installed the installers left unit ID settings set to '1', so it's possible that in general it's not well understood why that setting would need to be unique per inverter. Duplicate unit ID does not appear to affect Solaredge's proprietary leader/follower protocol or their monitoring platform.

thargy commented 9 months ago

Scanning all possible modbus addresses isn't really practical. With minimum timeouts would take at least 12 minutes, and nobody is going to wait that long for HA to finish startup or add an entry and not think something is broken (if HA would even allow that before forcing a timeout).

It is true there are 247 possible device IDs (1->247) and at 3s timeout that would be just over 12 minutes in the worst case. However, SolarEdge runs on RS485 which I believe has a limit of 32 connected devices - though there's no guarantee that those IDs are consecutive, they are almost always clumped in the first 32 id spaces. Further, in my testing, when I pinged a missing device ID, I got an instant zero-length response - sensing this would allow immediate moving on to the next ID in a 'Fast Scan'. A timeout would usually indicate a comms failure and the scan could optionally abort immediately.

As it is unlikely that most devices would have an ID set above > 32, and, most of the time, and unassigned IP will immediately respond with an empty packet we can do an initial fast scan extremely quickly for the vast majority of scenarios. If the scan reports progress as it goes, the user can click stop when all the devices are found. Alternatively, they can enter the target device count before commencing a scan.

It's also not a requirement that people use the 'fast scan' if we provide a UI for manually entering Device IDs (which the fast scan populates automatically), they can have a Scan button to optionally perform a scan. We can also have a slower scan that rigorously checks from 1->247 with full timeouts if necessary, and that would take 12mins.

The point is that such a function would probably work very quickly for the vast majority of installs. SolarEdge has told me that the vast majority of sites are one inverter and one battery max, and the Device ID is likely to be < 4. Scanning automatically for device IDs and supporting non-consecutive IDs will make it much easier for less technical users.

Anyway, this is a suggestion, and I may be able to have a crack at implementing and testing it in the future if you're interested. If you think it's worth it I can write up a clean feature request for garnering interest?

Modbus protocol doesn't provide a way to detect duplicate ID. You can't assume that no response means duplicate ID, it can just mean no response too. I monitor my inverters and occasionally they will do the same non-response thing for a cycle.

Yes, in fact the primary inverter was responding fine on the ID, and the follower was hidden, there was no way to see there was actually a duplicate ID. A duplicate ID did not cause a non-response; the empty response was on ID '1', which was empty (as it had been moved to collide on '2'). As you can see from the logs, the primary inverter is the one responding on '2' whilst the duplication was in effect.

A scan whilst a duplicate ID is set would result in only the master inverter being found, and the follower being unavailable - in such a scenario the UI could suggest using the SetApp to check that there are different Device IDs and provide a link to the wiki.

It is my suspicion that distinguishing between an empty response and a lack of response will prove very fruitful in detecting an empty device ID. I suspect that when you fail to get a response on occasion on a valid ID, it is a non-response not an empty response, but that's harder to confirm as the current pymodbus doesn't log an empty response and waits for the timeout instead. A brief scan of pymodbus docs seems to indicate it can detect an empty response though, but I would need to investigate further.

WillCodeForCats commented 9 months ago

There are commercial size installations using this integration, so we can't make any assumptions about small sites.

WillCodeForCats commented 9 months ago

It is my suspicion that distinguishing between an empty response and a lack of response will prove very fruitful in detecting an empty device ID.

My inverters don't have that behavior, so that's probably dependent on the firmware version if yours do that and mine don't.

thargy commented 9 months ago

There are commercial-size installations using this integration, so we can't make any assumptions about small sites.

Agreed, but I'm not. As stated above, the 32-device limit is the same on large installations, so they must use separate mod busses to exceed that. Even if that limit isn't there, the Modbus spec uses one byte and only supports 247 values. My proposal has the following non-breaking mitigations for your scenario:

  1. Using the scan is optional, the user can add IDs manually (as now, except with the ability to add multiple IDs rather than just a 'starting' ID).
  2. I believe we can detect an empty ID very quickly (ms) not (s), however, even if you require a 3s timeout each time it only takes up to 12 minutes maximum, even on very large installations, it will usually find all devices before that, as it will only take 12 minutes if the last device is on Device ID 247 and we require 3s to detect an empty ID.

As such the proposal only adds a convenience feature, it doesn't create a breaking change. If users on large sites don't want to perform a scan, or wait up to 12 minutes, they are free to not do so. I suspect for the vast majority of users a scan will be quicker, more robust and more convenient. However, this is just a proposal for discussion.

My inverters don't have that behavior, so that's probably dependent on the firmware version if yours do that and mine don't.

Just to confirm, the pymodbus logs do not show any empty response. I only saw that by using Packet Sender and manually sending the binary data to the TCP port. I could not see a near instant empty response come back, whereas pymodbus reported nothing and waited for the full 3s timeout. I need to check the pymodbus code to see how it handles a truly zero length packet to see if we can detect it. Have you tried hitting the TCP Modbus directly rather than through pymodbus?

WillCodeForCats commented 9 months ago

I've already been working on stuff in regards to invalid IDs from #460 and the wish list has non-consecutive IDs on it, but I've not uploaded anything yet since it's not a high priority.

Modbus at its heart is broadcast-response and I have seen data from inverters where their "normal" is 1-2 seconds to respond, so 3 seconds is necessary (already on the edge of being too short). I've also seen inverters with an escalating response delay. I can't tell you the exact issue/pr but if you look through all of the closed ones it's in there somewhere. There's not much difference between Modbus TCP and RTU: you still broadcast and either you get a response or you don't. I also calculate timeouts based on number of requests and device types based on past observations, which you can see the total calculated timeout if you turn on debugging.

There are four possibilities when sending a modbus request:

So for the case of auto detecting IDs for setup, scanning the first 32 addresses with a 3 second delay and 3 retry attempts would be up to about 5 minutes. In reality it would probably be faster but it's not guaranteed.

I work with modbus outside of this (power meters, generators, transfer switches, pumps, air handlers, etc.), but I use libmodbus in C. I have modbus simulators that I've built up over time to work with this integration especially for devices I don't have like batteries. Other connection methods are academic at best, since HA is using Python and the Pymodbus library. About the only thing external tools are useful for in my experience are scanning registers looking for undocumented stuff. Even my simulators aren't ideal because they don't have the weird problems SE stuff can randomly have.

It's also on my wish list to support a serial option, so anything that's done ideally should consider the possibility for modbus RTU (the original form). It's at the bottom of the priority list though.

WillCodeForCats commented 9 months ago

I should add that I don't really care if some auto-detect takes 5 minutes to run, but I don't know how HA reacts to long running setups and end users generally want things to be instant or they complain.

WillCodeForCats commented 9 months ago

PR #518 will raise the repair issue if setup fails, previously it would just raise on connection issues post-setup.

thargy commented 9 months ago

Thanks for the great detail! Looking through your 4 responses you don’t list an empty response though - have you not seen an empty packet in your testing. Like I said, my system gave back a near instant empty packet for unassigned IDs, which is not the same as no response.

In terms of malformed responses it always a choice whether to retry (until a count or timeout) or log and move on. I can see the argument for the former - there being a bad response is a hint there is at least a device at that id so worth a retry.

I appreciate that an empty response is a TCP thing though, and wouldn’t work with RTU.

The 5-12 minute thing may cause a HA setup fail, that would need investigating. If they do timeout for setup there’s a strong case for a keep-alive feature request, but I acknowledge your concern. I have seen other setups that warn that it can take several minutes though, so I suspect it’s doable. It’s easy to find out with a nice long sleep anyway!

As for user experience that’s pretty resolvable with good UI. I think scan functionality should be an opt in, rather than requirement, e.g. a button you press when setting a list of device IDs that warns before starting. If you know your IDs you can just enter them manually. It’s highly likely most scans would complete fairly quickly after scanning 2-5 IDs, and if the user is warned that a scan may take up to 12 minutes they’ll accept that. They can always be passed to a wiki explaining why it takes so long.

I saw the PR and think that’s a great addition.

Ultimately this is a great tool, and has allowed me to do some really cool smart home stuff, so thanks for all the hard work and effort. I will obviously update you if I manage to get the register number out of SEDG. I currently have some really good access so it is possible. I’ve made a strong argument for getting it, though they’re understandably going to be reluctant given the beta nature of the mode.

WillCodeForCats commented 9 months ago

A modbus frame always has its headers, id, function code, and data bytes. TCP only differs from RTU in that modbus TCP doesn't include a checksum after the data bytes because it's relying on the lower levels of TCP/IP to handle that aspect.

If something send an "empty" response that would just be an I/O error as far as I understand the modbus protocol, maybe some attempts to resend the request, ultimately timing out due to no response.

thargy commented 9 months ago

A modbus frame always has its headers, id, function code, and data bytes. TCP only differs from RTU in that modbus TCP doesn't include a checksum after the data bytes because it's relying on the lower levels of TCP/IP to handle that aspect.

Yes.

If something send an "empty" response that would just be an I/O error as far as I understand the modbus protocol, maybe some attempts to resend the request, ultimately timing out due to no response.

What I saw was an empty TCP packet, (basically just a 64 byte header) which is a totally valid TCP packet - but as you say not a valid Modbus frame. It is quite distinct from no response at all, but I believe is treated like no response by pymodbus - as there is no data to decode.

I didn't see it treated as an error by pymodbus, which is not surprising as it's not really part of the modbus design itself as the idea of zero length only becomes possible when the modbus protocol is embedded in a different transport protocol as with Modbus TCP. The distinction between no response and empty occurs at the TCP layer not the Modbus protocol layer. Hence I acknowledged it wouldn't help when using Modbus RTU.

Technically, it is possible to distinguish such an empty TCP packet from no response, but I'm not sure if pymodbus supports that natively or just swallows the empty packet entirely (quite possible). In my testing that empty packet comes back instantly and may be a reliable indicator that the device id is unassigned, as opposed to a communication error or unresponsive slave. However, that would need careful testing and investigation. I am encouraged that the receipt of a tcp packet, albeit empty, requires the modbus server/slave to process the request and respond - unlike when it's just busy. It is not unlikely that the empty response is a result of SEDG's implementation recognising there is no device at that ID, and so having nothing to include in a response payload. An empty response would be equivalent to 'we know nothing came back' as opposed to no response being 'we're still waiting for a response'.

WillCodeForCats commented 9 months ago

I'm looking at mine in wireshark querying a unit ID that doesn't exist (ID 3) and I don't see what you're seeing.

The inverter sends a TCP ACK acknowledging it received the TCP/IP packet, but there's no "empty packet".

Screenshot 2024-01-24 at 4 40 06 PM
thargy commented 9 months ago

The ACK is an empty packet. It's just a flag in the header and you can include response data.

If you hit a known device ID I believe you get the response data in the ACK - especially if it's the first message of the connection.

I will do some wireshark tests, as the tool I used wasn't as detailed. My suspicion is that you only get an empty ACK if it's the first message, which would mean opening a new connection for each query.

I'll whip up some prototype scan code using the raw TCP connection to see if it works and report back.

I don't want to waste your time, I've just been really busy with renovations and builders, but it's on me to proof the idea.

thargy commented 9 months ago

I had a spare hour before the builders arrived this morning and re-learnt python as I only use it to help the nephews with their coursework normally!

I created a quick scan routine here and set a timeout of 0.5s, it ran reliably on my network and found the two inverters without any issues. It didn't get a response from my meter though so I probably need a different query, which I'll look into. Obviously, a timeout of 3s would be slower.

Sure enough, the ACK is only detectable in the kernel, not the sockets library. However, I found Scapy, which looks like it will give me what I want for a prototype. I'm out of time now, but I'll look at it again as soon as I can, and try a Scapy-based implementation.

WillCodeForCats commented 9 months ago

If you hit a known device ID I believe you get the response data in the ACK - especially if it's the first message of the connection.

No, you do not. ACK is ACK. It's not empty it's just ACK.

https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Reliable_transmission

This is a request and response sequence:

Screenshot 2024-01-25 at 3 42 05 AM

Pymodbus in HA generates a request. Request in sequence 5298:

Screenshot 2024-01-25 at 3 44 43 AM

We sent request to inverter, inverter's TCP/IP stack sends ACK saying hey I got your message. ACK to 5298 in 5299:

Screenshot 2024-01-25 at 3 45 01 AM

Modbus stack in inverter generates a response. Response in sequence 5300:

Screenshot 2024-01-25 at 3 45 31 AM

Inverter sent a response us, so our TCP/IP stack sends ACK saying hey we got your message. ACK to 5300 in 5301:

Screenshot 2024-01-25 at 3 48 52 AM

ACKs are the difference between TCP and UDP. ACKs are why there's no CRC segment vs. RTU frames. The TCP/IP stack is doing that work that so modbus doesn't need to.

If you're looking for ACKs you're detecting if a working TCP/IP stack exists, but that has nothing to do with modbus.

thargy commented 9 months ago

I apologise if I'm upsetting you. I'm autistic and struggle with misunderstanding when communicating via the written word. I really hope I am not frustrating you, and I genuinely apologise if I am. It is not my intent. Feel free to close the issue if you feel its gone too far off topic.

What I said was:

The ACK is an empty packet. It's just a flag in the header and you can include response data.

See TCP delayed acknowledgement

For example, Linux implementations have a 'Delayed ACK' setting. This is usually set around 200ms, when a request is received, the kernel will not immediately respond with an ACK (which is just an empty TCP packet with the ACK flag set in the header, as I mentioned), but wait to see if a response is provided from the user application or higher layers (e.g. HTTP). In the vast majority of real-world cases, that response comes back in less than 200ms, so by waiting for it, it can combine the ACK + response data into a single Layer 2 packet. If the timeout is hit (or you are in 'Quick ACK' mode), it will send the empty packet with an ACK. This approach can effectively reduce network traffic by up to 50%. Windows also uses delayed ack (see this StackOverflow as an example).

All that stuff is academic and will prove irrelevant (see below) except to explain my reasoning. It's been 20-30 years since I last worked on lower-layer protocols like TCP. These days, it's very much a solved problem left to kernel authors, so I fully acknowledge my memory is a little rusty on these things.

When responses I got from my inverter (when using a valid Device ID) seemed to take <200ms, it reminded me of delayed ACK. The software I used made it look like they were in the same packet as the ACK, which is what I'd expect. Further, I quickly saw an empty ACK when the Device ID was unused.

Based on that, I theorised that we might get away with moving on in a scan when we receive the ACK.

I had an SSD fail recently, so I still needed to reinstall Wireshark. With everything else going on, I didn't have time. So, this morning, I took the time to install it and confirmed similar results to yours[^1] which shows me scanning device ids 1-4 with a 3s timeout. You'll note that ID 1 request (114) took 57.4ms to receive a response (121), but as you showed there was still an empty ACK (115) before that (after 1ms):

114 1.688987    192.168.86.247  192.168.89.177  TCP 66  59403 → 1502 [PSH, ACK] Seq=1 Ack=1 Win=2097920 Len=12
115 1.690004    192.168.89.177  192.168.86.247  TCP 60  1502 → 59403 [ACK] Seq=1 Ack=13 Win=29200 Len=0
121 1.746356    192.168.89.177  192.168.86.247  TCP 201 1502 → 59403 [PSH, ACK] Seq=1 Ack=13 Win=29200 Len=147

In contrast, Device ID 2 took 413ms, and Device ID 3 took 1.63s!

Keeping a single connection, sped things up, but the Devlice ID 3 didn't respond in 3s. This echoes my previous testing where Device ID 3 rarely responds quickly - however, that is not a problem as it is the meter, and you can find its ID by querying the master inverter anyway.

What this shows is:

My biggest logic error was - if I was happy to accept an empty ACK at 200ms as an acknowledgement that no device ID was found [^2] then that is not practically different from us applying a 200ms[^3] timeout to the request!

So the real question is, what is the shortest acceptable timeout to spot a potential inverter? In my testing, 0.5s has been pretty reliable - especially if I didn't care about responses from meters/non-inverters, which usually took much longer. If it wasn't ready to respond, I usually got a connection close/fail attempt, from which I could attempt a retry. Reusing the same connection tended to help the algorithm run quicker (as it didn't have to tear down and reinitiate connections each time).

The following algorithm would generally find inverters very quickly:

This takes advantage of 2 observations:

In my own testing such an approach found my inverters almost all the time, and within a few seconds.

You can see my script at https://github.com/thargy/modbus-scanner, that implements this chunking approach. If you want to try it for yourself. It's only a prototype (and I've not written python in ages), but it's fully functional and allows you to muck around with settings (use -h command line option). I think it proves the concept that a fast scan option (rather than requirement) could really help less technical users.

Anyway, sorry for the wall of text.

[^1]: see scans.zip which contains two wireshark .pcapng files, one with a new TCP connection each time, and one with a reused connection. [^2]: because the server had not received data to pack into the ACK in time, before its ACK timeout. [^3]: to be directly equivalent you would also have to account for the latency by adding it to the 200ms, as this timeout is on the client rather than the server, but it's not a big difference. [^4]: We can also approximate the chunk size from just the inverter count by using 1.5*inverters if you don't want to ask for the number of meters (which a homeowner might not know readily). If the homeowner doesn't know the inverter count, they'd have to scan the full range, in which case, setting a chunk size of 4 would be reasonable.

WillCodeForCats commented 9 months ago

It looks like you're using Python 3.12 specific things and my dev systems are Python 3.11 since that's what the integration requires, so I wasn't able to try it.

thargy commented 9 months ago

Sorry, just had to learn about venvs on windows, I think I've now tested properly on 3.11.7. I replaced itertools.batch which was added in 3.12 with a function. Hopefully that works for you.

WillCodeForCats commented 9 months ago

Running against my inverters total time was 14:19.04 for the full range and 1:45.51 for the first 32 unit IDs.

thargy commented 9 months ago

Running against my inverters total time was 14:19.04 for the full range and 1:45.51 for the first 32 unit IDs.

If you force it to do a comprehensive scan, it will take at least 3.5s per device ID (by default, you can change every timeout with the Command Line Options) - which is exactly what you're getting. Every time it gets a valid response in less than 3.5s it will shorten the total duration from the 865s maximum. As your time was 24.5s faster than the maximum, it must have seen some things.

Of course, doing a full scan is only necessary if you don't know how many inverters you have, as the idea is to find a set number of inverters before exiting; or, if the last inverter has been perversely set as 247. So, the command line supports specifying the number of inverters (according to the algorithm I proposed). Most users will only have up to 3 inverters (e.g. one per phase), and the routine will terminate in seconds. It's reasonable to propose that anyone with >3 inverters would probably be a more advanced user anyway, and more than able to discover their IDs manually.

I would propose having a comma-separated list of inverter IDs as the setup option and an adjacent button to scan for them. That button would ask for the number of inverters to find (and potentially whether to look for batteries), and have a warning that a full scan could take up to 15 minutes. Clicking the scan would show progress with a cancel option allowing them to cancel it wherever they got up. On completion/cancellation, it would populate the inverter IDs into the setup box.

Most users would click the scan button, select 1-3 inverters, and it would complete in <20s. Advanced users may enter the comma-separated list of inverter IDs (or take advantage of the simple device ID parser I made, which allows syntaxes such as "1,3, 4-7") and ignore the scan functionality entirely. Even in the worst case, which I really don't think would happen in real use, the scan would take <15minutes.

I hope you found this a useful contribution. Ultimately, such a system would just be a quality of life improvement for novice users.

The more useful change would be the ability to support non-contiguous device IDs, which would be a precursor requirement to implementing a scan routine anyway.

WillCodeForCats commented 9 months ago

Changing the field to allow a list of addresses is relatively easy. Making it user-proof and HA config entry schema conversion for upgrades is the part that takes time.

It would be considered a breaking change since changing the config schema prevents downgrades to a previous version, so it needs planning for the right time to introduce it.

thargy commented 9 months ago

It would be considered a breaking change since changing the config schema prevents downgrades to a previous version, so it needs planning for the right time to introduce it.

I'm not au fait with the schema or the upgrade/downgrade. In other systems, we'd handle this by adding a new configuration option and hiding the old one but keeping it in sync, so in the case of a downgrade, the old setting would still be there and be 'valid'.

For example, we add a new device_ids (as str) and hide the start_id (int), or whatever you currently call it. We then ensure that whenever the device_ids are set, we update the start_id to the first device ID in our new list. This would preserve backward compatibility because the existing code does not support non-consecutive IDs anyway, so a device_ids value of 2-4 would store 2 in the start ID and continue to work on a downgrade. This presupposes that the HA setup schema supports ignoring unknown settings on a downgrade.

When upgrading, we take the existing inverter count and start ID and set the new device_ids field to a range based on those values. For example, for 3 inverters and a start ID of 4, we'd set device_ids to 4-6.

WillCodeForCats commented 9 months ago

The most common is 1,3 where installers don't realize you can overlap a meter on ID 2.

thargy commented 9 months ago

The most common is 1,3, where installers don't realize you can overlap a meter on ID 2.

That was my configuration until I persuaded my installer to stick the meter on 3. The above algorithm handles that scenario perfectly. Two inverters chunk the device ID in chunk sizes of 3 (1.5x2), so it will fast scan 1-3 and then slow scan 1-3. Most of the time, it will find Devices 1 and 3 in less than 1.5 seconds (the fast scan) or in less than 10.5 seconds (worst case, Device ID 3 needs a slow scan).

WillCodeForCats commented 9 months ago

How does that help making 1,3 a schema downgrade?

thargy commented 9 months ago

How does that help making 1,3 a schema downgrade?

Because you don't support non-consecutive IDs in the current version anyway.

Anyone successfully using the addon at this current version has to have set up consecutive IDs already - otherwise it won't work. A downgrade is only of benefit to such users. Those users can only set, for example '2 inverters start ID 2', which would mean IDs '2-3' if they upgrade their setup would become '2-3' and still work, and if they downgrade it will revert to start ID '2' and still work.

If someone upgrades and then changes their inverter IDs then they cannot downgrade without changing those IDs back. However, most users cannot change their IDs (unless they have SetApp and understand how to use it). If they can change their IDs, they can return them to the original settings when they downgrade, to restore their original system.

i.e. a downgrade only needs to restore the functionality available in that version.

Does that make sense?

WillCodeForCats commented 9 months ago

If someone upgrades and then changes their inverter IDs then they cannot downgrade without changing those IDs back.

That is my entire point saying this is a breaking change.

thargy commented 9 months ago

Oh, OK, this looks like a semantic thing. That's different from what we'd usually define as a breaking change. A breaking change could potentially break an existing working system when upgrading. This doesn't, as it can upgrade all existing working systems without them stopping working.

More accurately, if you change an existing module, it should be able to be placed in the wider system without breaking any existing functionality. Again, this doesn't meet that criteria.

Some companies would define a breaking change to also include changes that, once applied, cannot be reversed. That view is less common but is out there. Again this wouldn't meet that loose criteria either, as we have a fully viable downgrade path for systems that can work in the downgraded version. When downgrading to today's version, you can't expect a system that doesn't work today to work.

This is purely adding a new feature (IMHO), as you can upgrade from working to working and downgrade from working & not working back to working.

If you upgrade a working system to take advantage of non-consecutive IDs, you must change your setup to use that. However, non-consecutive IDs are only supported from the upgraded version, so downgrading would never work. That's not a breaking change. You can't use new functionality in an old version of software, that's never been possible.

I apologise my explanations are always overly verbose, but I'm trying hard to be clear and to understand your objection.

thargy commented 9 months ago

If you're referring to 1,3 auto-mapping to 1 on downgrade, I refer you to

For example, we add a new device_ids (as str) and hide the start_id (int), or whatever you currently call it. We then ensure that whenever the device_ids are set, we update the start_id to the first device ID in our new list. ... This presupposes that the HA setup schema supports ignoring unknown settings on a downgrade.

i.e. this relies on the start ID remaining in the new version and being set to the lowest ID (1) whenever, you change the new range field (1,3). When you downgrade the 1 field is still there and of the right type, but the new field is ignored by that version of code.

WillCodeForCats commented 9 months ago

Someone with a 1,3 setup downgrading because there's some unforeseen issue that takes time to figure out will complain about only 1 working. That's not the experience I want people to have with my integration. I don't want to see that issue report nor have to explain they need to go into setapp to change unit IDs to get something working again.

The ideas are fine but there are a other things I want to do first before creating a hard line like that. That's why #460 is still open. If I'm going to add validation for one ID then naturally that can extend to checks that would allow non-consecutive IDs. It will be in 2.5.0 but not the current 2.4.x series.

See this project board for the list of ideas:

https://github.com/WillCodeForCats/solaredge-modbus-multi/projects/2

thargy commented 9 months ago

Someone with a 1,3 setup downgrading because there's some unforeseen issue that takes time to figure out will complain about only 1 working. That's not the experience I want people to have with my integration. I don't want to see that issue report nor have to explain they need to go into setapp to change unit IDs to get something working again.

This is where I'm getting confused. Anyone with a 1,3 setup cannot be helped by downgrading, as their system cannot work on the downgraded version, and the downgraded version cannot work with a 1,3 setup (as it doesn't support non-consecutive IDs).

As the system works now, anyone with a 1,3 setup will raise the issues you rightly don't want to deal with (as I did last July, as I was one of them). If you add the upgraded functionality, they will stop raising such issues as it will now work.

You would not advise them to downgrade to an earlier version that doesn't support their system, and if they did it off their own back the 'make sure you're using the latest version' response is the correct one.

If you advise anyone to downgrade beyond the "non-consecutive ID support" version (e.g. because of an unrelated bug in the latest version), it would be important to emphasise that versions before that don't support "non-consecutive IDs". But that's the same for any new piece of functionality. The good news it will be obvious to users what their pre-downgrade IDs are - as they'll now be populated fully in the settings. You can even add a warning in the setup that non-consecutive IDs are not supported in version > 2.5.0 (or whatever), whenever non-consecutive IDs are detected...

The ideas are fine but there are a other things I want to do first before creating a hard line like that. That's why https://github.com/WillCodeForCats/solaredge-modbus-multi/issues/460 is still open. If I'm going to add validation for one ID then naturally that can extend to checks that would allow non-consecutive IDs. It will be in 2.5.0 but not the current 2.4.x series.

Prioritisation is entirely your call, and I'm not pushing for it to be added for me. SEDG has committed that they won't change my IDs without telling me again (which might prove difficult as they don't log that change anywhere accessible, even for them! At least they've put a note on my account), so I have consecutive IDs now anyway. I'm waiting to hear back, but they were meeting on Friday with their Tier 1 engineers to discuss how support handles IDs as some wacky advice was being given out and strange actions were taken, which demonstrated that their Tier 3's might not fully understand the Modbus Device ID stuff.

Sadly they haven't yet bitten on giving me the holding register number for the new mode - they keep ignoring that question. Not that surprised tbh.

As it's a relatively big new feature, I agree that leaving it for a minor version bump is eminently sensible.

Thank you for being patient with me, and again, I apologise for any misunderstandings.