chdsbd / kodiak

🔮 A bot to automatically update and merge GitHub PRs
https://kodiakhq.com
GNU Affero General Public License v3.0
1.03k stars 65 forks source link

ref(bot): robust redis connection pool #836

Closed sbdchd closed 1 year ago

sbdchd commented 1 year ago

Sometimes the connection to the redis host can be severed for a little bit resulting in a ConnectionLostError error.

Ideally the redis client would handle these more gracefully, but as a work around we can send a "canary" request to redis to see if the connection is dead and have it create a new one if that's the case.

This will increase our RPS to redis, but they're only ping requests so it should be pretty fast.

There are some other redis connections that aren't wrapped, but when those connections fail the entire task restarts, so it isn't a big deal. This RobustPool is intended for the nitty gritty APIs that mutate state and can't safely be retried -- maybe they can but more involved.

sbdchd commented 1 year ago

The test is hanging, something to do with redis:

+ ./.venv/bin/pytest kodiak/test_queue.py
=================================== test session starts ===================================
platform darwin -- Python 3.7.13, pytest-6.0.1, py-1.10.0, pluggy-0.13.1
rootdir: /Users/steve/projects/kodiak/bot, configfile: tox.ini
plugins: cov-2.12.1, anyio-3.6.1, asyncio-0.12.0, kodiak-0.1.0, mock-3.3.1
collected 6 items                                                                         

kodiak/test_queue.py .....

happens even if I remove the mock

sbdchd commented 1 year ago

We could also switch to redis-py which has proper health checks built into the client

sbdchd commented 1 year ago

ConnectionResetError -> https://docs.python.org/3/library/errno.html#errno.ECONNRESET

And ECONNRESET is caused by the server, redis in this case.

So we should investigate on the redis host:

While probably not related, we might as well fix the warning about huge pages:

https://github.com/redis/redis/issues/3176