fabian-lauer / dbus-shelly-3em-smartmeter

Integrate Shelly 3EM smart meter into Victron Energies Venus OS

Service stops if shelly is not reachable #44

Open jgracio opened 1 year ago

jgracio commented 1 year ago

Although there is no persistent problem with the network or with the Shelly 3EM, the service stops after some minutes with the following message in current.log: "CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0". After I restart the service manually, it works again for some more minutes/hours.

Is there a way to make this access more resilient so it isn't necessary to keep restarting it?

Thanks

fabian-lauer commented 1 year ago

Hi @jgracio ,

thanks for your feedback - I think there are several possibilities to gain more resilience, but can you give more details about the situation?

Is the device not reachable via the network (WiFi lost, etc.)? I am asking because I would like to reproduce the situation and work on solutions.

Based on the code, the error should only result in one missed measurement and a retry after ~500ms. Can you confirm this or report anything different?

jgracio commented 1 year ago

Thanks @fabian-lauer for your prompt reply

I do not have any further information about connection failures, but I can say I'm using a WiFi connection on the Shelly and also on the Cerbo GX. As I wrote before, my UniFi system is not detecting any network failure.

Where is the retry after ~500ms set? I notice that on line 134 the access to the Shelly is "meter_r = requests.get(url = URL, timeout=5)". Shall I increase the timeout here?

Thanks again

fabian-lauer commented 1 year ago

The 500ms is the interval at which the main "loop" for getting data runs. The "timeout=5" in the command only applies if the HTTP response is not received within that time. You can extend it, but if no response arrives within the timeframe, you simply wait for the next "loop" run.

Hmmm... difficult
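For context, here is a minimal sketch of that pattern - the names, the URL and the 500 ms interval are assumptions for illustration, not the verbatim repository code - showing how the per-request timeout and the polling loop interact: a failed or slow request only costs that one cycle.

# Hedged sketch, not the actual dbus-shelly-3em-smartmeter code
import logging
import requests
from gi.repository import GLib

URL = "http://192.168.2.13/status"  # hypothetical Shelly 3EM address

def _poll_shelly():
    try:
        # timeout=5 only bounds this single HTTP call; it does not slow the loop
        meter_r = requests.get(url=URL, timeout=5)
        meter_r.raise_for_status()
        logging.debug("total power: %s", meter_r.json().get("total_power"))
    except (requests.exceptions.RequestException, ValueError) as e:
        # one missed measurement - the next attempt happens on the next cycle
        logging.critical("Error getting data from Shelly: %s", e)
    return True  # returning True keeps the GLib timer firing

GLib.timeout_add(500, _poll_shelly)  # the "~500 ms" main loop
GLib.MainLoop().run()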

jgracio commented 1 year ago

Thanks, @fabian-lauer

I did increase the timeout of the HTTP response from 5 to 10. It seems the failures are decreasing, but they still happen, and I notice a pattern of daily failures at 00:01:

"2023-04-17 00:01:03,946 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0 2023-04-19 00:01:09,829 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0 2023-04-20 00:01:10,558 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0"

_

Do you have any idea why the failure happens with this regularity?

fabian-lauer commented 1 year ago

Hi @jgracio - not sure what the reason really is, but it looks like a network problem - maybe a regularly scheduled router restart or something else?

But I will check what happens when the Shelly connection goes away, e.g. while rebooting.

fabian-lauer commented 1 year ago

Hmmm... just realized that my log file looks the same:

2023-04-16 00:01:15,830 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-16 00:01:20,865 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-16 00:01:25,905 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 00:01:05,303 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 00:01:10,357 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 00:01:15,419 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 00:01:20,455 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 00:01:25,494 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:00:45,609 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:00:50,670 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:00:55,706 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:01:00,740 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:01:05,775 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:01:10,818 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:01:15,855 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:01:15,885 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:01:16,386 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:01:16,883 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:01:20,515 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:01:23,639 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:01:26,758 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:01:29,878 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:01:32,997 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-17 01:01:36,116 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-18 00:01:05,392 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-18 00:01:10,437 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-18 00:01:15,472 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-18 00:01:20,507 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-18 00:01:25,552 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-18 11:36:41,599 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-19 00:01:06,318 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-19 00:01:11,386 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-19 00:01:16,424 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-19 00:01:21,463 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-19 00:01:26,499 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-20 00:01:04,982 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-20 00:01:10,40 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-20 00:01:15,72 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-20 00:01:20,106 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0
2023-04-20 00:01:25,164 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0

But I have no problem with the script - I never have to restart either the Cerbo or the script.

jgracio commented 1 year ago

Hello @fabian-lauer

In the exception handler, I included the following line to get more information about it:

logging.critical("ValueError: %s, ConnectionError: %s , Timeout: %s, ConnectionError2: %s" % (ValueError, requests.exceptions.ConnectionError, requests.exceptions.Timeout, ConnectionError))

My knowledge of Python is nil, so the result was not worth much, as I only got:

2023-04-21 00:01:09,810 root CRITICAL ValueError: <class 'ValueError'>, ConnectionError: <class 'requests.exceptions.ConnectionError'> , Timeout: <class 'requests.exceptions.Timeout'>, ConnectionError2: <class 'ConnectionError'>
2023-04-21 00:01:09,811 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0

Can you please help me better understand why this error (or errors) happens?

fabian-lauer commented 1 year ago

Hi @jgracio ,

that style of logging is not correct (it only prints the exception classes, not the actual exception that occurred) - please change it like this:

except (ValueError, requests.exceptions.ConnectionError, requests.exceptions.Timeout, ConnectionError) as e:
    logging.critical('Error getting data from Shelly - check network or Shelly status. Setting power values to 0. Details: %s', e, exc_info=e)
    self._dbusservice['/Ac/L1/Power'] = 0
    self._dbusservice['/Ac/L2/Power'] = 0
    self._dbusservice['/Ac/L3/Power'] = 0
    self._dbusservice['/Ac/Power'] = 0
    self._dbusservice['/UpdateIndex'] = (self._dbusservice['/UpdateIndex'] + 1) % 256

After that, and after making the Shelly unavailable, the error message looks like this:

2023-04-21 07:17:25,239 urllib3.connectionpool DEBUG Starting new HTTP connection (1): 192.168.2.13:80
2023-04-21 07:17:25,314 urllib3.connectionpool DEBUG http://192.168.2.13:80 "GET /status HTTP/1.1" 401 None
2023-04-21 07:17:25,341 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0. Details: No response from Shelly 3EM - http://flauer:XXXXX@192.168.2.13/status
Traceback (most recent call last):
  File "/data/dbus-shelly-3em-smartmeter/dbus-shelly-3em-smartmeter.py", line 160, in _update
    meter_data = self._getShellyData()
  File "/data/dbus-shelly-3em-smartmeter/dbus-shelly-3em-smartmeter.py", line 138, in _getShellyData
    raise ConnectionError("No response from Shelly 3EM - %s" % (URL))
ConnectionError: No response from Shelly 3EM - http://flauer:XXXXX@192.168.2.13/status

But when checking the details, you will see that the error is thrown by our own code, based on "if not meter_r:". So the script is unable to get a usable response result - for whatever reason.
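For reference, this is roughly what that part of _getShellyData() does, reconstructed from the traceback above (not the verbatim source). Note that a requests Response is falsy for any HTTP status >= 400, so the 401 in the DEBUG line above is already enough to trigger the raise:

# Rough reconstruction, based on the "if not meter_r:" check and the traceback above
import requests

URL = "http://192.168.2.13/status"  # built from config in the real script

def _getShellyData():
    meter_r = requests.get(url=URL, timeout=5)
    # bool(meter_r) is False for any HTTP status >= 400 (e.g. the 401 above),
    # not only when no response arrives at all
    if not meter_r:
        raise ConnectionError("No response from Shelly 3EM - %s" % (URL))
    return meter_r.json()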

jgracio commented 1 year ago

@fabian-lauer, I used your change, and the error log ended with the following:

2023-04-21 22:07:26,974 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0. Details: HTTPConnectionPool(host='192.168.1.128', port=80): Max retries exceeded with url: /status (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0xb5a517d8>, 'Connection to 192.168.1.128 timed out. (connect timeout=10)'))
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "/usr/lib/python3.8/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/lib/python3.8/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/urllib3/connectionpool.py", line 665, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3.8/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/usr/lib/python3.8/site-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/usr/lib/python3.8/site-packages/urllib3/connection.py", line 164, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0xb5a517d8>, 'Connection to 192.168.1.128 timed out. (connect timeout=10)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3.8/site-packages/urllib3/connectionpool.py", line 719, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3.8/site-packages/urllib3/util/retry.py", line 436, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='192.168.1.128', port=80): Max retries exceeded with url: /status (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0xb5a517d8>, 'Connection to 192.168.1.128 timed out. (connect timeout=10)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/dbus-shelly-3em-smartmeter/dbus-shelly-3em-smartmeter.py", line 150, in _update
    meter_data = self._getShellyData()
  File "/data/dbus-shelly-3em-smartmeter/dbus-shelly-3em-smartmeter.py", line 124, in _getShellyData
    meter_r = requests.get(url = URL, timeout=10)
  File "/usr/lib/python3.8/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3.8/site-packages/requests/adapters.py", line 504, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='192.168.1.128', port=80): Max retries exceeded with url: /status (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0xb5a517d8>, 'Connection to 192.168.1.128 timed out. (connect timeout=10)'))

That's a long message and very different from the one you got.

Any clue what's wrong?

fabian-lauer commented 1 year ago

Puhh... still hard for me to judge - it looks like a "network" issue - but it could also be a problem with the endpoint on the Shelly device.

Let's accept for now that the device is not reachable for a few seconds (why - no idea at the moment) and focus on the fact that the script should recover automatically - this is still not happening, right?

jgracio commented 1 year ago

Agreed. The service keeps failing; after I restart it manually, it runs for some more hours before failing again. It would be great if the service would not stop.

fabian-lauer commented 1 year ago

Hi @jgracio,

no idea so far why the script is crashing - I will try to reproduce it in a few minutes by changing the IP address of my Shelly during a normal script run...

jgracio commented 1 year ago

Thanks @fabian-lauer

fabian-lauer commented 1 year ago

Hi @jgracio,

this task fell into oblivion, but I finally did the test. BUT without any new outcome - for me it works fine - when changing the IP, the script (in logging mode "DEBUG") gives the following error message:

2023-05-26 11:43:10,423 root CRITICAL Error getting data from Shelly - check network or Shelly status. Setting power values to 0. Details: HTTPConnectionPool(host='192.168.2.13', port=80): Max retries exceeded with url: /status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xb5abcfd0>: Failed to establish a new connection: [Errno 113] No route to host'))
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "/usr/lib/python3.8/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/lib/python3.8/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
OSError: [Errno 113] No route to host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/urllib3/connectionpool.py", line 665, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3.8/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/usr/lib/python3.8/site-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/usr/lib/python3.8/site-packages/urllib3/connection.py", line 171, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0xb5abcfd0>: Failed to establish a new connection: [Errno 113] No route to host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3.8/site-packages/urllib3/connectionpool.py", line 719, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3.8/site-packages/urllib3/util/retry.py", line 436, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='192.168.2.13', port=80): Max retries exceeded with url: /status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xb5abcfd0>: Failed to establish a new connection: [Errno 113] No route to host'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/dbus-shelly-3em-smartmeter/dbus-shelly-3em-smartmeter.py", line 160, in _update
    meter_data = self._getShellyData()
  File "/data/dbus-shelly-3em-smartmeter/dbus-shelly-3em-smartmeter.py", line 134, in _getShellyData
    meter_r = requests.get(url = URL, timeout=5)
  File "/usr/lib/python3.8/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='192.168.2.13', port=80): Max retries exceeded with url: /status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xb5abcfd0>: Failed to establish a new connection: [Errno 113] No route to host'))
2023-05-26 11:43:10,458 urllib3.connectionpool DEBUG Starting new HTTP connection (1): 192.168.2.13:80

And when changing the IP back to the normal value it starts working again:

2023-05-26 11:43:13,582 urllib3.connectionpool DEBUG Starting new HTTP connection (1): 192.168.2.13:80
2023-05-26 11:43:16,17 urllib3.connectionpool DEBUG http://192.168.2.13:80 "GET /status HTTP/1.1" 200 1183
2023-05-26 11:43:16,85 root DEBUG House Consumption (/Ac/Power): -6464.66
2023-05-26 11:43:16,100 root DEBUG House Forward (/Ac/Energy/Forward): 0
2023-05-26 11:43:16,105 root DEBUG House Reverse (/Ac/Energy/Revers): 0.1522681833333333
2023-05-26 11:43:16,111 root DEBUG ---

So I have no good idea at the moment what may cause your problem. I can only imagine that the Python library handles an unreachable host differently compared to a timeout. But I don't know how to reproduce a timeout :-(

svenbluege commented 11 months ago

I encounter the same issue: after a "socket.timeout: timed out", the service can't fix itself.

To reproduce this, one could add a server with a simple "I sleep and answer after 60s" page to get the timeout (see the sketch below). Maybe this is caused by a hiccup in the 3EM or the WiFi - only an idea... I did not do that. My test only produces a "Read timed out", which looks recoverable (I switched between two hosts in config.ini for this test).
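A minimal sketch of such a "sleep and answer" test endpoint, using only Python's standard library (port, delay, and payload are arbitrary choices, not anything from this repository):

import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class SlowStatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(60)  # longer than any client-side timeout in the script
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"total_power": 0}')

# Point the script's config at this host/port instead of the real Shelly.
HTTPServer(("0.0.0.0", 8080), SlowStatusHandler).serve_forever()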

Since nobody so far has a clue how to reproduce this, what do you think about simply exiting the program and letting the service supervisor do a restart? We at least know that a manual restart of the service helps. That could be automated. I know, turning it off and on again is not a desirable fix ;)
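One possible way to automate that - a sketch only, assuming the script runs under the daemontools-style supervision that Venus OS services normally use (so exiting makes the supervisor start it again); the threshold and function name are made up for illustration:

import sys
import logging

MAX_CONSECUTIVE_FAILURES = 20  # hypothetical threshold before giving up

_failures = 0

def record_poll_result(success):
    """Call this from the update loop after every poll attempt."""
    global _failures
    _failures = 0 if success else _failures + 1
    if _failures >= MAX_CONSECUTIVE_FAILURES:
        logging.critical("Shelly unreachable for %d consecutive polls - exiting so the supervisor restarts the service", _failures)
        sys.exit(1)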