golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

x/build/maintner: reports inconsistent world state (e.g., issue state vs issue events) during short windows of time #28226

Open dmitshur opened 5 years ago

dmitshur commented 5 years ago

Problem

A program that fetches a maintner corpus and uses its data to make decisions may make mistakes, because the world view is inconsistent during short windows of time. Even though the windows are short, any daemon that loops over corpus updates and makes decisions immediately after each one is guaranteed to hit them eventually.

The most visible high-level example of this is #21312.

Cause

This happens because there are effectively two GitHub data sources that are not synchronized:

  1. changes to GitHub state (e.g., issue N now has labels X, Y, Z)
  2. GitHub-generated events (e.g., issue N has had an "unlabeled" event)

To give a concrete example of an inconsistent state that maintner can report, consider an issue that has just been unlabeled. The first mutation that a corpus.Update call receives and processes will record that the issue no longer has that label.

The mutation reporting that there has been an unlabeled event on the same issue may arrive a few seconds later. Until it does, it will appear that the issue does not have said label and has never been unlabeled (e.g., !gi.HasLabel("Documentation") && !gi.HasEvent("unlabeled") will be true). That does not match reality, if one considers reality to be a world where the unlabeled event and its effect happen simultaneously.
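The race can be reproduced in miniature. The sketch below is illustrative only: `issue`, `hasLabel`, `hasEvent`, and `looksNeverUnlabeled` are hypothetical stand-ins, not maintner's `GitHubIssue` API. It keeps label state and event history as two separate sources, applies the two mutations in the order the corpus receives them, and evaluates the predicate above after each step.

```go
package main

import "fmt"

// issue is a hypothetical, simplified stand-in for maintner's GitHubIssue.
// Like the corpus, it stores the two data sources separately.
type issue struct {
	labels map[string]bool // current label state (source 1)
	events []string        // event types received so far (source 2)
}

func (i *issue) hasLabel(name string) bool { return i.labels[name] }

func (i *issue) hasEvent(typ string) bool {
	for _, e := range i.events {
		if e == typ {
			return true
		}
	}
	return false
}

// looksNeverUnlabeled is the predicate from the text: the label is absent
// and no "unlabeled" event has been recorded.
func looksNeverUnlabeled(i *issue, label string) bool {
	return !i.hasLabel(label) && !i.hasEvent("unlabeled")
}

func main() {
	gi := &issue{labels: map[string]bool{"Documentation": true}}
	fmt.Println("before:", looksNeverUnlabeled(gi, "Documentation")) // false: label present

	// t0: the state mutation (remove_label) arrives first.
	delete(gi.labels, "Documentation")
	fmt.Println("after t0:", looksNeverUnlabeled(gi, "Documentation")) // true: inconsistent window

	// t1: the matching "unlabeled" event arrives a few seconds later.
	gi.events = append(gi.events, "unlabeled")
	fmt.Println("after t1:", looksNeverUnlabeled(gi, "Documentation")) // false: consistent again
}
```

Any decision made between t0 and t1 sees the middle result, which is the situation a gardening bot runs into.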

Details

These are two distinct mutations received and processed by the corpus.Update method:

received mutation at time t0:
github_issue: <
  owner: "golang"
  repo: "go"
  number: 28103
  updated: <
    seconds: 1539629204
  >
  remove_label: 223401461
>

... (short window during which the issue doesn't have a label,
     but the accompanying "unlabeled" event hasn't been received yet;
     aka an inconsistent world state)

received mutation at time t1:
github_issue: <
  owner: "golang"
  repo: "go"
  number: 28103
  event: <
    id: 1904921842
    event_type: "unlabeled"
    actor_id: 1924134
    created: <
      seconds: 1539629204
    >
    label: <
      name: "Builders"
    >
  >
  event: <
    id: 1904921913
    event_type: "labeled"
    actor_id: 8566911
    created: <
      seconds: 1539629206
    >
    label: <
      name: "Builders"
    >
  >
  event_status: <
    server_date: <
      seconds: 1539629209
    >
  >
>

There is more relevant information in https://github.com/golang/go/issues/21312#issuecomment-430051456.

/cc @bradfitz

gopherbot commented 5 years ago

Change https://golang.org/cl/142362 mentions this issue: cmd/gopherbot: reduce gardening reaction time

orthros commented 5 years ago

When working on other issues, I saw that GitHub introduced a "unified" timeline for events on an issue, the Timeline API. I understand that it is still in beta (since 2016) and adopting it would be a major change, but it might help fix this issue by providing a single source of truth for a GitHubIssue.

dmitshur commented 5 years ago

@orthros Thanks for pointing that out. The Timeline API can indeed be helpful for eliminating races between issue comments, events, and PR reviews (for #21086).

Something to be mindful of is that it may not, on its own, be enough to solve the most important race: between the issue state (whether it's open or closed, which labels it has applied) and events. That is, unless we use the events to deduce the state, rather than querying the state separately. (But that can be done independently of using the Timeline API.)

Also, for reference, the Timeline API is indeed in preview, and in my experience using it, it had some data-gap edge cases where I had to fall back to querying reviews separately (e.g., see here). They may have been resolved by now, but it's worth being aware of them. It seems there are two Timeline APIs in GitHub API v4 (PullRequestTimelineConnection and PullRequestTimelineItemsConnection, the latter being part of a preview API), in addition to the Timeline API in GitHub API v3 (https://developer.github.com/v3/issues/timeline/).

andybons commented 5 years ago

It’s not just short windows of time. Some issues in the maintner corpus are missing events entirely. This makes it impossible to create an accurate milestone burndown chart, which requires querying the state of an issue at a particular point in time. (/cc @griesemer).

A few examples of issues in maintner that have incomplete event lists:

=== Issue events for golang.org/issues/28559
             labeled    milestone:          label:Testing
             labeled    milestone:          label:help wanted
             labeled    milestone:          label:OS-OpenBSD
             labeled    milestone:          label:Builders
             labeled    milestone:          label:NeedsInvestigation
          milestoned    milestone:   Go1.12 label:

It does not record the final “closed” event: https://api.github.com/repos/golang/go/issues/28559/events

=== Issue events for golang.org/issues/28306
           mentioned    milestone:          label:
          subscribed    milestone:          label:
           mentioned    milestone:          label:
          subscribed    milestone:          label:
            assigned    milestone:          label:
             labeled    milestone:          label:Documentation
             labeled    milestone:          label:NeedsInvestigation
          milestoned    milestone:   Go1.12 label:
             renamed    milestone:          label:

The above event log is missing a few milestone-related events: https://api.github.com/repos/golang/go/issues/28306/events

dmitshur commented 5 years ago

@andybons That sounds like a valid issue that is related to, but not the same as, this one. I see these two issues:

  1. short windows of time where world state is incorrect due to separate sources of data not being synchronized (the issue described in the original report)
  2. some issue events are permanently missing (issue you described)

Mind opening a separate issue for it? I suggest that because I expect the fix for one will not resolve the other, and vice versa. Thanks!