amitfin / retry

Home Assistant Integration with Retry Service
MIT License
104 stars 4 forks source link

Reliable Telegram Bot with Retry #24

Closed StefanoGiu closed 1 year ago

StefanoGiu commented 1 year ago

System Health details

I'm trying to make a reliable telegram bot using retry to send the message again in case of error.

I tried with: service: retry.call target: entity_id: telegram_bot.send_message data: service: telegram_bot.send_message title: >- --{{title}}-- message: >- {{message}}

But I got the error homeassistant.exceptions.InvalidStateError: telegram_bot.send_message is not available and the message is sent successfully 7 times each time, cause it doesn't recognize the success for it.

I also tried with:

But in this case, it doesn't retry it in case of error.

Any suggestion?

Checklist

Describe the issue

Described above

Reproduction steps

1. 2. 3. ...

Debug logs

homeassistant.exceptions.InvalidStateError: telegram_bot.send_message is not available

Diagnostics dump

No response

amitfin commented 1 year ago

Can you try removing?

target:
  entity_id: telegram_bot.send_message

telegram_bot.send_message is not an entity. It's a service.

StefanoGiu commented 1 year ago

Removing it solves the exeption in the log, but still the retry is not able to re-submit telegram messages that are in error.

I have bunch of those errors if I try to issue 100 message on telegram in 1 second (a test to verify if retry works), but actually retry is not triggering any new tentative for telegram bot...

2023-09-05 13:53:55.545 WARNING (SyncWorker_21) [telegram.vendor.ptb_urllib3.urllib3.connectionpool] Connection pool is full, discarding connection: api.telegram.org 2023-09-05 13:53:55.547 ERROR (SyncWorker_22) [homeassistant.components.telegram_bot] Error sending message: Timed out. Args: (XXXX, '+++ Test Notifier\nTest numero 96'), kwargs: {'parse_mode': 'Markdown', 'disable_web_page_preview': None, 'disable_notification': False, 'reply_to_message_id': None, 'reply_markup': None, 'timeout': None} 2023-09-05 13:53:55.553 ERROR (SyncWorker_21) [homeassistant.components.telegram_bot] Error sending message: Timed out. Args: (XXXX, '+++ Test Notifier\nTest numero 98'), kwargs: {'parse_mode': 'Markdown', 'disable_web_page_preview': None, 'disable_notification': False, 'reply_to_message_id': None, 'reply_markup': None, 'timeout': None} 2023-09-05 13:53:55.646 WARNING (SyncWorker_49) [telegram.vendor.ptb_urllib3.urllib3.connectionpool] Connection pool is full, discarding connection: api.telegram.org 2023-09-05 13:53:55.649 ERROR (SyncWorker_49) [homeassistant.components.telegram_bot] Error sending message: Timed out. Args: (XXXX, '+++ Test Notifier\nTest numero 99'), kwargs: {'parse_mode': 'Markdown', 'disable_web_page_preview': None, 'disable_notification': False, 'reply_to_message_id': None, 'reply_markup': None, 'timeout': None} 2023-09-05 13:53:55.677 WARNING (SyncWorker_57) [telegram.vendor.ptb_urllib3.urllib3.connectionpool] Connection pool is full, discarding connection: api.telegram.org 2023-09-05 13:53:55.680 ERROR (SyncWorker_57) [homeassistant.components.telegram_bot] Error sending message: Timed out. Args: (XXXX, '+++ Test Notifier\nTest numero 100'), kwargs: {'parse_mode': 'Markdown', 'disable_web_page_preview': None, 'disable_notification': False, 'reply_to_message_id': None, 'reply_markup': None, 'timeout': None}

StefanoGiu commented 1 year ago

How can I make telegram_bot reliable with retry?

amitfin commented 1 year ago

The problem is here (inside the telegram_bot integration code). The integration doesn't propagate the error (the exception), so from the caller perspective (retry.call) it was successful. A straightforward change is to add raise inside the except block:

        except TelegramError as exc:
            _LOGGER.error(
                "%s: %s. Args: %s, kwargs: %s", msg_error, exc, args_msg, kwargs_msg
            )
            raise

But it requires coordination with the component's owner...

StefanoGiu commented 1 year ago

I added the raise in the telegram_bot component, thank you :)

I got now the below exception and still retry doesn't seem to make his job. btw, adding a raise in the code fills the log with the entire stack trace... is there a way to avoid it?

2023-09-05 14:44:05.638 ERROR (MainThread) [homeassistant.core] Error executing service: <ServiceCall telegram_bot.send_message (c:01H9JNY0ED36KVWN5ZSXCQS48K): target=[808687121], title=Template<template=( Test Notifier) renders=1>, message=Template<template=(Test numero 100) renders=1>> Traceback (most recent call last): File "zz.homeassistant\deps\Python311\site-packages\telegram\vendor\ptb_urllib3\urllib3\connectionpool.py", line 402, in _make_request six.raise_from(e, None) File "", line 2, in raise_from File "zz.homeassistant\deps\Python311\site-packages\telegram\vendor\ptb_urllib3\urllib3\connectionpool.py", line 398, in _make_request httplib_response = conn.getresponse() ^^^^^^^^^^^^^^^^^^ File "xx\Python\Python311\Lib\http\client.py", line 1374, in getresponse response.begin() File "xx\Python\Python311\Lib\http\client.py", line 318, in begin version, status, reason = self._read_status() ^^^^^^^^^^^^^^^^^^^ File "xx\Python\Python311\Lib\http\client.py", line 279, in _read_status line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "xx\Python\Python311\Lib\socket.py", line 705, in readinto return self._sock.recv_into(b) ^^^^^^^^^^^^^^^^^^^^^^^ File "xx\Python\Python311\Lib\ssl.py", line 1278, in recv_into return self.read(nbytes, buffer) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "xx\Python\Python311\Lib\ssl.py", line 1134, in read return self._sslobj.read(len, buffer) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "zz.homeassistant\deps\Python311\site-packages\telegram\utils\request.py", line 252, in _request_wrapper resp = self._con_pool.request(*args, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\telegram\vendor\ptb_urllib3\urllib3\request.py", line 68, in request return self.request_encode_body(method, url, fields=fields, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\telegram\vendor\ptb_urllib3\urllib3\request.py", line 148, in request_encode_body return self.urlopen(method, url, extra_kw) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\telegram\vendor\ptb_urllib3\urllib3\poolmanager.py", line 244, in urlopen response = conn.urlopen(method, u.request_uri, *kw) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\telegram\vendor\ptb_urllib3\urllib3\connectionpool.py", line 665, in urlopen retries = retries.increment(method, url, error=e, _pool=self, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\telegram\vendor\ptb_urllib3\urllib3\util\retry.py", line 347, in increment raise six.reraise(type(error), error, _stacktrace) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\telegram\vendor\ptb_urllib3\urllib3\packages\six.py", line 686, in reraise raise value File "zz.homeassistant\deps\Python311\site-packages\telegram\vendor\ptb_urllib3\urllib3\connectionpool.py", line 614, in urlopen httplib_response = self._make_request(conn, method, url, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\telegram\vendor\ptb_urllib3\urllib3\connectionpool.py", line 404, in _make_request self._raise_timeout(err=e, url=url, timeout_value=read_timeout, File "zz.homeassistant\deps\Python311\site-packages\telegram\vendor\ptb_urllib3\urllib3\connectionpool.py", line 321, in _raise_timeout raise exc_cls(args) telegram.vendor.ptb_urllib3.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.telegram.org', port=443): Read timed out. (read timeout=5.0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "xx\Python\Python311\Lib\site-packages\homeassistant\core.py", line 1990, in _run_service_call_catch_exceptions await coro_or_task File "xx\Python\Python311\Lib\site-packages\homeassistant\core.py", line 2011, in _execute_service return await target(service_call) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "xx\Python\Python311\Lib\site-packages\homeassistant\components\telegram_bot__init__.py", line 407, in async_send_telegram_message await hass.async_add_executor_job( File "xx\Python\Python311\Lib\concurrent\futures\thread.py", line 58, in run result = self.fn(*self.args, self.kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "xx\Python\Python311\Lib\site-packages\homeassistant\components\telegram_bot__init.py", line 657, in send_message self._send_msg( File "xx\Python\Python311\Lib\site-packages\homeassistant\components\telegram_bot\init__.py", line 620, in _send_msg out = func_send(*args_msg, *kwargs_msg) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\decorator.py", line 232, in fun return caller(func, (extras + args), kw) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\telegram\bot.py", line 135, in decorator result = func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\telegram\bot.py", line 458, in send_message return self._message( # type: ignore[return-value] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\telegram\bot.py", line 281, in _message result = self._post(endpoint, data, timeout=timeout, api_kwargs=api_kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\telegram\bot.py", line 245, in _post return self.request.post(f'{self.base_url}/{endpoint}', data=data, timeout=timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\telegram\utils\request.py", line 349, in post result = self._request_wrapper( ^^^^^^^^^^^^^^^^^^^^^^ File "zz.homeassistant\deps\Python311\site-packages\telegram\utils\request.py", line 254, in _request_wrapper raise TimedOut() from error telegram.error.TimedOut: Timed out

StefanoGiu commented 1 year ago

I think because the raise is inside telegram\utils\request.py

amitfin commented 1 year ago

Something is wrong in the test environment:

This means that the above service call was not done by the service retry.call.

StefanoGiu commented 1 year ago

I was calling telegram_bot.send_message from a script and then calling with call.retry that script.... but I do understand that retry needs to receive the error from the caller itself and not from any sub task/sub processes, right?

I changed from:

              - service: retry.call
                data:
                   service: notify.{{notifier}}
                   title: >-
                         {{title}}
                   message: >-
                         {{message}}

to:

             - service: retry.call
               data:
                 service: telegram_bot.send_message
                 title: >-
                           {{title}}
                 message: >-
                           {{message}}

and then I modified the telegram_bot file in this way:

    senterror = False
    senterrormsg = ""
    try:

    except TelegramError as exc:
        _LOGGER.error(
            "%s: %s. Args: %s, kwargs: %s", msg_error, exc, args_msg, kwargs_msg
        )
        senterror = True
        senterrormsg = str(msg_error)
    if senterror:
        raise TypeError(senterrormsg)

This to avoid the double exception from the caller.

Now everything seems to work properly, but the only thing is that I need the raise in the Telegram Bot and therefore I see the stack trace in the home assistant log.

Is there another way to trigger retry rather than raise ?

I'd like to keep the log clean without the stack trace. Thanks

amitfin commented 1 year ago

Happy to hear that things are working as expected now.

Regarding the log entries: There is no way to propagate errors from a service call other than throwing an exception. See also the last paragraph in this comment. You can filter out log entries as described in the logger documentation. It won't have any impact on retry.call.

StefanoGiu commented 1 year ago

Got it, I think now that services are responding back, this should improve.

I tested telegram bot with messages/inline keyboard/photo and video stressing the number of messages sent and now retry seems to handle them properly... :) very good package you did!!!

Is there a way to keep the sequenciality of the events? e.g. if I sent 100 messages and number 12 fails, to wait until 12 is sent properly or it's the last attempt for 12 before calling the message 13...

consider that I'm using a script with queued mechanism to queue all the messages I send via telegram bot, so I could wait for example the retry service to have done before proceeding to the next execution (e.g. retry OK or retry made the last attempt).

I hope I was clear...

amitfin commented 1 year ago

Thanks!

The retries are done in the background and therefore there is no indication of completion (either on success or on failure). Note that a failure happens after ~1 minute (7 retries with exponential backoff), so it's possible to wait for that amount of time (or less if a lower number of retries is provided as an input parameter).

StefanoGiu commented 1 year ago

Thank you... I would not want to wait if not needed (e.g. all messages are ok).... so I'll keep it as it is... order might not be respected.

Thank you for your time.