golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.37k stars 17.71k forks source link

x/build/cmd/gopherbot: don't fight humans #21312

Open josharian opened 7 years ago

josharian commented 7 years ago

In #20892, I re-opened an issue, only to have @gopherbot re-close it. This is an issue to follow-up on what happened there. cc @andybons @kevinburke

ALTree commented 7 years ago

In #21280 it tagged the issue as documentation and when I removed it (it wasn't clear at that point if it was a doc issue or a problem requiring code changes) it tagged again as doc. I thought it was supposed to not add a tag that was previously removed (and supposed to not close an issue that was just re-opened).

josharian commented 7 years ago

Reproduced here--I just removed the Builders label on this issue and gopherbot put it right back.

kevinburke commented 7 years ago

(deleted)

bradfitz commented 7 years ago

That's try to minimize memes and GIFs in comments on bugs. There's a slack channel for that.

dmitshur commented 7 years ago

In #21280 it tagged the issue as documentation and when I removed it (it wasn't clear at that point if it was a doc issue or a problem requiring code changes) it tagged again as doc.

This also happened in #22302:

image

When I removed it once more about 45 minutes later, the issue didn't happen again:

image

I thought it was supposed to not add a tag that was previously removed (and supposed to not close an issue that was just re-opened).

That's right. I've checked the code, and it certainly has the code to not fight with humans. For example, for the labelDocumentationIssues task, it's:

func (b *gopherbot) labelDocumentationIssues(ctx context.Context) error {
    return b.gorepo.ForeachIssue(func(gi *maintner.GitHubIssue) error {
        if gi.Closed || !isDocumentationTitle(gi.Title) || gi.HasLabel("Documentation") || gi.HasEvent("unlabeled") {
            return nil
        }
        printIssue("label-documentation", gi)
        if *dryRun {
            return nil
        }
        return b.addLabel(ctx, gi, "Documentation")
    })
}

The gi.HasEvent("unlabeled") part is what makes it not apply the documentation label again if a human has unlabeled it already.

It does work the second time; the bot seems to fight at most once per issue/task.

It feels like a timing/race condition bug (not a Go race condition, but a general one). It might happen that the labelDocumentationIssues task runs after the label was removed, just before it's aware of the "unlabeled" event has happened, so gi.HasEvent("unlabeled") returns false... but the issue doesn't already have "documentation" label, so it thinks it should be added.

I'm not completely sure how it's possible that the documentation label is removed yet gi.HasEvent("unlabeled") returns false; I would expect those 2 things to happen atomically.

Perhaps it's worth looking more into that direction; I just wanted to share what I know so far.

dmitshur commented 7 years ago

Looks like it happened to @griesemer in https://github.com/golang/go/issues/22695#issuecomment-347962624:

image

griesemer commented 7 years ago

@shurcooL Not sure. I think I fat-fingered this one.

dmitshur commented 6 years ago

Yet again in #24875:

image

dmitshur commented 6 years ago

Yet another thing I noticed recently that I suspect is related.

In issue #24850, if you look at the issue events API, you can see gopherbot actually applied the "Proposal" and "Documentation" labels three times each. It wasn't fighting humans, but I suspect the underlying reason is similar: it might not receive the "label has already been applied" information quickly enough and ends up iterating over "this label should be applied on this issue because it's not there yet" code 3 times before finally seeing that the label has been applied.

It's not visible in GitHub's UI because it collapses nearby "apply label" events into one, but effectively what happened was this (in the span of 2 seconds):

image

I reported this to GitHub separately (i.e., it should probably be disallowed at API level to apply the same label more than once in a row). But gopherbot shouldn't be trying to do that either.

(One hacky idea for a fix is to add some delays between gopherbot taking actions, so it's slower (like a human) and has less chances of making repeat actions it shouldn't. But of course that's not as nice as a more proper that doesn't doesn't depend on timing.)

mark-rushakoff commented 6 years ago

I filed #27058 but I'm closing it as a dupe of this. In the issue referenced there, I did a "gopherbot please remove label documentation" and then saw

gopherbot added Documentation and removed Documentation labels

(indicating that the label was removed and then immediately re-added) presumably because the issue title contained the word "documentation" at that time.

josharian commented 6 years ago

Another: #26623

dmitshur commented 6 years ago

Another: #28103. I was expecting it to happen, so I fired up a local copy of gopherbot running in dry-mode and watched it fight me (one time), as well as looked at gopherbot and maintnerd logs at the same time.

I have a good grasp of the underlying cause now. The problem is that maintner provides an inconsistent world-view during very short periods of time, and gopherbot takes (incorrect) action during those very short periods of time. Edit: Filed #28226 for this. /cc @bradfitz

It's as I guessed in https://github.com/golang/go/issues/21312#issuecomment-341338223, except now I have data to back up that hypothesis. gopherbot runs in a loop, most of the time blocked in a corpus.Update call. As soon as that call returns after receiving new data, gopherbot runs all of its tasks.

Here's a log of gopherbot activity with annotations (the mutations that are received by the maintner client):

2018/10/15 14:46:12 Updating data from log *maintner.netMutSource ...
2018/10/15 14:46:46 Downloading 47 bytes of https://maintner.golang.org/logs/41 ...
2018/10/15 14:46:46 wrote /Users/dmitshur/Library/Caches/golang-maintner/0041.growing.mutlog
2018/10/15 14:46:46 Reloaded data from log *maintner.netMutSource.
2018/10/15 14:46:46 got corpus update after 34.209702362s
    mutation: github_issue:<owner:"golang" repo:"go" number:28103 updated:<seconds:1539629204 > remove_label:223401461 > 

              aka "the issue 28103 no longer has "Builders" label
              but the "a human has unlabeled this issue" mutation isn't here yet, so it applies the Builders label...
label-Builders [dry-run]
    https://golang.org/issue/28103  x/build/version: add $GOROOT/bin to $PATH when running go run/generate
2018/10/15 14:46:46 gopherbot ran in 8.874186ms
2018/10/15 14:46:46 Updating data from log *maintner.netMutSource ...
2018/10/15 14:46:49 Downloading 42 bytes of https://maintner.golang.org/logs/41 ...
2018/10/15 14:46:49 wrote /Users/dmitshur/Library/Caches/golang-maintner/0041.growing.mutlog
2018/10/15 14:46:49 Reloaded data from log *maintner.netMutSource.
2018/10/15 14:46:49 got corpus update after 2.476171757s
    mutation: github_issue:<owner:"golang" repo:"go" number:28103 comment_status:<server_date:<seconds:1539629209 > > > 

              some internal info about server time of latest comments
              ... still haven't received the "a human has unlabeled this issue" mutation yet, so...
label-Builders [dry-run]
    https://golang.org/issue/28103  x/build/version: add $GOROOT/bin to $PATH when running go run/generate
2018/10/15 14:46:49 gopherbot ran in 9.168746ms
2018/10/15 14:46:49 Updating data from log *maintner.netMutSource ...
2018/10/15 14:46:49 Downloading 130 bytes of https://maintner.golang.org/logs/41 ...
2018/10/15 14:46:49 wrote /Users/dmitshur/Library/Caches/golang-maintner/0041.growing.mutlog
2018/10/15 14:46:49 Reloaded data from log *maintner.netMutSource.
2018/10/15 14:46:49 got corpus update after 154.471501ms
    mutation: github_issue:<owner:"golang" repo:"go" number:28103 event:<id:1904921842 event_type:"unlabeled" actor_id:1924134 created:<seconds:1539629204 > label:<name:"Builders" > > event:<id:1904921913 event_type:"labeled" actor_id:8566911 created:<seconds:1539629206 > label:<name:"Builders" > > event_status:<server_date:<seconds:1539629209 > > > 

              finally it received the "there was an unlabel event" mutation, so gopherbot knows it shouldn't
              fight humans, hence it no longer tries to apply the "Builders" label to the issue
2018/10/15 14:46:49 gopherbot ran in 8.387027ms
2018/10/15 14:46:49 Updating data from log *maintner.netMutSource ...

What happens is the mutation for "this issue no longer has a label [that it should have according to the rules]" comes in a few seconds before the "there was an unlabel event for this issue [because a human has unlabeled it]" mutation. gopherbot acts immediately after receiving the first mutation and re-applies the label. (It tries to apply the Builders label after second mutation too, but it does nothing since the label is already applied.)

Suggested Workaround

I'm going to file a maintner issue for the underlying problem (edit: opened #28226), but I propose the following workaround for gopherbot specifically until then: give it a slower (gopher-like) reaction time, rather than having it garden issues instantaneously. A 5 second buffer should make it avoid acting during a short inconsistent maintner view of the world.

I'm going to test this approach and confirm it works as expected.

dmitshur commented 6 years ago

I tested the approach (locally) and it worked in solving the problem. Going to send a CL. Edit: Sent CL 142362.

gopherbot commented 6 years ago

Change https://golang.org/cl/142362 mentions this issue: cmd/gopherbot: reduce gardening reaction time

dmitshur commented 6 years ago

Starting to have second thoughts about the proposed workaround. In the CL 142362 commit message, I wrote:

The goal of this change is to reduce occurrences of gopherbot fighting people (issue golang/go#21312, which causes confusion and is distracting), without introducing serious drawbacks. A slightly slower reaction time for gardening is more human-like (or gopher-like) and should not cause problems.

It's true that a slower reaction time for gardening new issues is unlikely to be problematic. However, I've realized gopherbot is often used in other ways, where it being very responsive is important.

For example, I notice people sometimes apply the NeedsFix label and rely on gopherbot to remove the NeedsDecision label. Or when asking gopherbot to backport issues.

image

If gopherbot ends up taking 15-30 seconds to respond (conceivable if there's a lot of continuous corpus activity), it may seem that it's broken or lagging, and as a result quite detrimental to the user experience and the ability to rely on it.

josharian commented 6 years ago

@dmitshur what if gopherbot immediately added itself as an assignee of the issue, but didn’t take action on it for 60sec, and then removed itself as an assignee. Then there’d be immediate feedback “gopherbot is listening!”.

mark-rushakoff commented 5 years ago

There's another documentation label fight at https://github.com/golang/go/issues/29127.

gopherbot commented 5 years ago

Change https://golang.org/cl/183624 mentions this issue: cmd/gopherbot: don't fight over NeedsDecision removal