finos / symphony-bdk-python

Symphony Python Bot Development Kit (BDK)
https://symphony-bdk-python.finos.org/
Apache License 2.0

BDK Crashes occasionally, stack trace attached. #256

Closed dky closed 2 years ago

dky commented 2 years ago

Hey guys, we regularly receive exceptions when running the bot. We are using the bot framework 2.0.1. I've just bumped us up to 2.1.0, but I'm curious whether anyone has insight into what this stack trace indicates.

Traceback (most recent call last):
  File "/home/dky/.pyenv/versions/3.9.6/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dky/.pyenv/versions/3.9.6/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/dky/git/bot/src/__main__.py", line 131, in <module>
    asyncio.run(run())
  File "/home/dky/.pyenv/versions/3.9.6/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/dky/.pyenv/versions/3.9.6/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/dky/git/bot/src/__main__.py", line 119, in run
    await datafeed_loop.start()
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/service/datafeed/abstract_datafeed_loop.py", line 99, in start
    await self._run_loop()
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/service/datafeed/abstract_datafeed_loop.py", line 149, in _run_loop
    await self._run_loop_iteration()
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/service/datafeed/datafeed_loop_v1.py", line 54, in _run_loop_iteration
    events = await self._read_datafeed()
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/retry/_asyncio.py", line 118, in async_wrapped
    return await fn(*args, **kwargs)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/retry/_asyncio.py", line 80, in __call__
    do = await self.iter(retry_state=retry_state)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/retry/_asyncio.py", line 41, in iter
    should_retry = await self.retry(retry_state=retry_state)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/retry/strategy.py", line 119, in read_datafeed_retry
    raise exception
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/retry/_asyncio.py", line 83, in __call__
    result = await fn(*args, **kwargs)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/service/datafeed/datafeed_loop_v1.py", line 63, in _read_datafeed
    events = await self._datafeed_api.v4_datafeed_id_read_get(id=self._datafeed_id,
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/client/trace_id.py", line 46, in add_x_trace_id_header
    return await func(*args, **kwargs)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/gen/api_client.py", line 195, in __call_api
    response_data = await self.request(
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/gen/rest.py", line 190, in GET
    return await self.request("GET", url,
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/gen/rest.py", line 165, in request
    r = await self.pool_manager.request(**args)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/aiohttp/client.py", line 559, in _request
    await resp.start(conn)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 913, in start
    self._continue = None
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/aiohttp/helpers.py", line 718, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError

dky commented 2 years ago

This looks like an issue with our Symphony pod going offline or having connection issues (we observed the same blip today). Is there any way we can add error handling to make sure the bot reconnects rather than sitting on the TimeoutError?
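As a rough illustration of the kind of error handling being asked about, here is a minimal sketch of an application-level wrapper that restarts a coroutine whenever it raises `asyncio.TimeoutError`, with exponential backoff between attempts. The helper name `run_with_restart` and its parameters are hypothetical and not part of the BDK; whether restarting `datafeed_loop.start()` this way is safe for a given bot is an assumption that would need checking.

```python
import asyncio
import logging


async def run_with_restart(start_loop, base_delay=1.0, max_delay=60.0, max_attempts=None):
    """Await start_loop() and call it again whenever it times out.

    Hypothetical helper, not a BDK API. `start_loop` is any zero-argument
    coroutine function, for example `datafeed_loop.start`.
    """
    delay = base_delay
    attempts = 0
    while True:
        try:
            return await start_loop()
        except asyncio.TimeoutError:
            attempts += 1
            # Give up once the optional attempt limit is reached.
            if max_attempts is not None and attempts >= max_attempts:
                raise
            logging.warning("Datafeed timed out, restarting in %.1fs", delay)
            await asyncio.sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

In the bot's `run()` coroutine this would replace the direct call, for example `await run_with_restart(datafeed_loop.start)`. Only timeouts are caught here; any other exception still propagates.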

symphony-youri commented 2 years ago

Hi @dky

Could you confirm that the bot is indeed stopping and not recovering from the error? Normally we should have a retry policy in place and the datafeed loop should not stop.

dky commented 2 years ago

@symphony-youri Yes, it hangs indefinitely and our users get super frustrated. The only way to recover is to press Ctrl-C to break out of the loop and re-run the bot. I'm open to providing anything you need from my end to see why the retry is failing. This happened to us just this Friday when our pod got rebooted or something.

symphony-youri commented 2 years ago

It looks like the retry logic is just wrong and not catching the proper errors. I opened #260 to address that.
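To illustrate what "not catching the proper errors" can mean in general, here is a minimal sketch of a retry predicate that classifies exceptions as transient or not. This is not the actual change in #260; the names `RETRYABLE_ERRORS` and `is_retryable` and the choice of exception types are assumptions made for the example.

```python
import asyncio

# Assumption: these types are treated as transient for illustration only.
# A predicate that omits asyncio.TimeoutError would let the timeout from
# the stack trace above escape instead of triggering a retry.
RETRYABLE_ERRORS = (asyncio.TimeoutError, ConnectionError)


def is_retryable(exc):
    """Return True if the exception looks like a transient network failure."""
    return isinstance(exc, RETRYABLE_ERRORS)
```

A retry decorator would consult such a predicate after each failed call and re-raise only when it returns `False`.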

dky commented 2 years ago

Thanks! Hope to see the fix released soon, and thanks for the help!

symphony-youri commented 2 years ago

We will have to release a 2.2.1 with this change. It should be available shortly, hopefully by the end of the week.