chdsbd / kodiak

🔮 A bot to automatically update and merge GitHub PRs
https://kodiakhq.com
GNU Affero General Public License v3.0

Kodiak appears to get stuck if first PR in the queue fails checks #586

Open gregplaysguitar opened 3 years ago

gregplaysguitar commented 3 years ago

We have a recurring situation where Kodiak seems to get stuck, and we get a backlog of PRs queued for merge, but no PR is first in the queue. My guess is that it goes like this

Naively, it seems like it's only looking for the PR marked as first, rather than the one with the lowest number.

Sorry I can't give more concrete detail on this; it's an intermittent issue, so it's hard to pin down exactly.

We have branch protection on, with one review required. Here's our config:

version = 1

[merge]
method = "squash"
delete_branch_on_merge = true
prioritize_ready_to_merge = true
notify_on_conflict = false

[merge.message]
title = "pull_request_title"
body = "pull_request_body"
include_pr_number = true
body_type = "markdown"

gregplaysguitar commented 3 years ago

Update: I removed the automerge label from all open PRs, including some that were failing tests and so not showing as enqueued. I then added it back to a passing PR, and it was updated as expected. So it does look like Kodiak considered one of my failing PRs to be first in line, and that was blocking the queue.

chdsbd commented 3 years ago

Is this problem a recent issue or something that's been happening for a while?

Also, is this possibly related to #585?

As a quick check, if Kodiak seems stuck, editing the PR (description, title, labels, commits, etc.) should trigger Kodiak to evaluate the PR again.

For some context on Kodiak's behavior, here's the basic flow:

  1. The automerge label is added to a PR.
  2. Kodiak checks if the PR should be queued for merge.
  3. Kodiak adds the PR to the merge queue and starts a task to pull from that queue (if there isn't a task already running).
  4. That task removes a PR from the queue and evaluates it for merge.
  5. If the PR is good to merge, Kodiak will update the PR if necessary, wait until the PR is mergeable, and then merge the PR.
  6. If the PR isn't able to be merged, Kodiak removes the PR from the queue and pulls the next item off the queue (back to step 4).

The queue position information in GitHub is updated when a PR is evaluated. When Kodiak merges a PR, we recheck all the PRs against that branch and the position information updates in GitHub.
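The flow described above can be sketched roughly as follows. This is only an illustrative model, not Kodiak's actual code: `PullRequest`, `should_queue`, and `process_queue` are hypothetical names, the "update and wait until mergeable" step is reduced to flipping a flag, and the recheck of other PRs after a merge is not modeled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PullRequest:
    # Hypothetical stand-in for the PR state the bot would read from GitHub.
    number: int
    has_automerge_label: bool = True
    checks_passing: bool = True
    up_to_date: bool = True


def should_queue(pr: PullRequest) -> bool:
    # Step 2: decide whether the PR belongs in the merge queue.
    # Assumption: only the label matters here; the real check is richer.
    return pr.has_automerge_label


def process_queue(queue: deque) -> tuple:
    """Drain the queue, returning (merged PR numbers, removed PR numbers)."""
    merged, removed = [], []
    while queue:
        pr = queue.popleft()  # Step 4: take the next PR off the queue.
        if not pr.checks_passing:
            # Step 6: not mergeable, so drop it and move on to the next PR.
            removed.append(pr.number)
            continue
        if not pr.up_to_date:
            # Step 5: stand-in for "update the branch and wait until mergeable".
            pr.up_to_date = True
        merged.append(pr.number)
    return merged, removed


prs = [
    PullRequest(101, checks_passing=False),
    PullRequest(102, up_to_date=False),
    PullRequest(103, has_automerge_label=False),
    PullRequest(104),
]
queue = deque(pr for pr in prs if should_queue(pr))  # Steps 1-3
merged, removed = process_queue(queue)
print("merged:", merged, "removed:", removed)
```

In this model a failing PR at the head of the queue is removed in step 6 and does not block the PRs behind it, which is the behavior the flow above is meant to produce.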

I found some recent (6:23PM EST) timeout errors in Sentry for your installation, but Kodiak should retry after the timeout.

https://github.com/chdsbd/kodiak/blob/e663f1086ebb305a0a5ddce1d21c6d476e42f41e/bot/kodiak/pull_request.py#L141-L144

gregplaysguitar commented 3 years ago

Thanks for the info. It is something that's happened on and off for a while, but I've just manually merged things until it kicks in again; today I played around for a bit trying to work out what was going on. Also, I think it started prior to that outage, but I can't be certain. Kodiak was definitely working when I observed this behaviour, though, because if I updated the PR manually, it would get merged (due to prioritize_ready_to_merge).

Anyway, we are back up and running now, and I'll try editing the description if I notice this again. It doesn't happen enough to really impact us, but it does seem to be a regular thing. I'll let you know if I notice any more patterns.

chdsbd commented 3 years ago

Not sure this helps, but Kodiak does a daily restart at 11:00PM EST. Is that correlated with any behavior you've seen?

Kodiak should restart gracefully and continue working the merge queue.

gregplaysguitar commented 3 years ago

Not sure, but I'll bear that in mind if I see it happen again - thanks!