chdsbd / kodiak

🔮 A bot to automatically update and merge GitHub PRs
https://kodiakhq.com
GNU Affero General Public License v3.0

ref(bot): split ingest and workers into two processes #744

Closed · sbdchd closed this 2 years ago

sbdchd commented 2 years ago

The HTTP server accepts GitHub webhook events and places them on a queue for the worker process to handle.
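Roughly, the ingest side looks like this. A minimal sketch assuming FastAPI and redis-py's asyncio client; the route, queue name, and payload shape are illustrative, not Kodiak's actual ones:

```python
import json

import redis.asyncio as redis
from fastapi import FastAPI, Header, Request

app = FastAPI()
redis_client = redis.Redis()


@app.post("/api/github/hook")  # hypothetical route
async def github_webhook(
    request: Request,
    x_github_event: str = Header(...),
    x_github_delivery: str = Header(...),
) -> dict:
    # Store the raw event; the worker process parses and routes it later,
    # so this handler never waits on the GitHub API.
    body = (await request.body()).decode()
    event = json.dumps(
        {
            "event_name": x_github_event,
            "delivery_id": x_github_delivery,
            "payload": body,
        }
    )
    await redis_client.rpush("kodiak:ingest", event)  # assumed queue key
    return {"ok": True}
```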

For self-hosting users we use supervisord to run both processes within the Docker container.
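Something like this supervisord.conf sketch, where the entrypoint module paths are assumptions rather than Kodiak's real ones:

```ini
; Hypothetical supervisord config running both processes in one container.
[supervisord]
nodaemon=true

[program:ingest]
command=python -m kodiak.entrypoints.ingest
; Forward child output to the container's stdout so `docker logs` sees it.
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
redirect_stderr=true

[program:worker]
command=python -m kodiak.entrypoints.worker
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
redirect_stderr=true
```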

We now have one ingest queue, one webhook queue, and one repo queue per installation. This limits the number of API calls we attempt per installation at any one time, so we can obey the GitHub rate limit without hitting our internal timeouts.
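For illustration, the per-installation key scheme could look like this (names are hypothetical, not Kodiak's actual Redis keys):

```python
# Queue keys are derived from the GitHub installation id, so each
# installation gets its own ingest, webhook, and repo queue.
def ingest_queue(installation_id: str) -> str:
    return f"ingest:{installation_id}"


def webhook_queue(installation_id: str) -> str:
    return f"webhook:{installation_id}"


def repo_queue(installation_id: str) -> str:
    return f"repo:{installation_id}"
```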

There's a new config var, INGEST_QUEUE_LENGTH (default 1,000), to control the maximum per-installation ingest queue size.
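One way to enforce the cap, assuming Redis lists and redis-py's asyncio client; the trim-oldest strategy shown here is an assumption, only the INGEST_QUEUE_LENGTH var comes from the PR:

```python
import os

import redis.asyncio as redis

INGEST_QUEUE_LENGTH = int(os.environ.get("INGEST_QUEUE_LENGTH", "1000"))
redis_client = redis.Redis()


async def enqueue_event(installation_id: str, payload: bytes) -> None:
    queue = f"ingest:{installation_id}"
    await redis_client.rpush(queue, payload)
    # Keep only the newest INGEST_QUEUE_LENGTH entries; events past the cap
    # are dropped from the head (oldest end) of the list.
    await redis_client.ltrim(queue, -INGEST_QUEUE_LENGTH, -1)
```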

In the worker process we now periodically check the status of our running tasks and restart any that have failed. Previously we would only notice a failed task when we enqueued a new pull request on the related queue.
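A minimal sketch of that supervision loop, assuming asyncio tasks keyed by queue; all names here are illustrative:

```python
import asyncio
from typing import Callable, Dict

worker_tasks: Dict[str, asyncio.Task] = {}
worker_factories: Dict[str, Callable[[], asyncio.Task]] = {}


async def supervise_workers(interval_seconds: float = 10.0) -> None:
    while True:
        for queue_name, task in list(worker_tasks.items()):
            if task.done():
                if not task.cancelled() and task.exception() is not None:
                    print(f"worker for {queue_name} failed: {task.exception()!r}")
                # Restart immediately instead of waiting for the next
                # enqueue on the related queue to notice the dead worker.
                worker_tasks[queue_name] = worker_factories[queue_name]()
        await asyncio.sleep(interval_seconds)
```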

Previously every incoming webhook could trigger a wait on the async throttle for the GitHub API. This could starve out the webhook queue and repo queue workers, causing internal timeouts.
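The idea of the fix, sketched with plain asyncio primitives (the semaphore and all names are illustrative, not Kodiak's actual throttle):

```python
import asyncio

github_api_throttle = asyncio.Semaphore(5)  # assumed concurrency limit


async def call_github_api(event: bytes) -> None:
    ...  # placeholder for a real GitHub API request


async def ingest(event: bytes, queue: "asyncio.Queue[bytes]") -> None:
    # No throttle here: accepting a webhook only enqueues it, so a
    # rate-limited installation can't block the ingest path.
    await queue.put(event)


async def repo_worker(queue: "asyncio.Queue[bytes]") -> None:
    while True:
        event = await queue.get()
        # The throttle is awaited only where API calls are actually made.
        async with github_api_throttle:
            await call_github_api(event)
```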

chdsbd commented 2 years ago

I've deployed this to production using two Docker instances instead of the supervisord setup. I think self-hosting users still need the supervisord setup, but we need to get the child workers logging to stdout.

chdsbd commented 2 years ago

We need the ingest queue to be per-installation.

chdsbd commented 2 years ago

We really need one ingest worker, one webhook worker, and one merge queue worker per installation to distribute work fairly. Right now we have a large shared pool of ingest workers, which means that if a single installation sends a lot of requests, its webhooks will be prioritized over the webhook and merge queue workers, starving everyone else.
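A hedged sketch of that layout, with illustrative names:

```python
import asyncio
from typing import Dict, Tuple


async def ingest_worker(installation_id: str) -> None: ...
async def webhook_worker(installation_id: str) -> None: ...
async def merge_queue_worker(installation_id: str) -> None: ...


installation_workers: Dict[str, Tuple[asyncio.Task, ...]] = {}


def ensure_workers(installation_id: str) -> None:
    # Spawn the trio lazily the first time an installation sends an event;
    # each installation then only competes with itself for its workers.
    if installation_id in installation_workers:
        return
    installation_workers[installation_id] = (
        asyncio.create_task(ingest_worker(installation_id)),
        asyncio.create_task(webhook_worker(installation_id)),
        asyncio.create_task(merge_queue_worker(installation_id)),
    )
```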