Open josharian opened 7 years ago
In #21280 it tagged the issue as documentation
and when I removed it (it wasn't clear at that point if it was a doc issue or a problem requiring code changes) it tagged again as doc. I thought it was supposed to not add a tag that was previously removed (and supposed to not close an issue that was just re-opened).
Reproduced here--I just removed the Builders label on this issue and gopherbot put it right back.
(deleted)
That's try to minimize memes and GIFs in comments on bugs. There's a slack channel for that.
In #21280 it tagged the issue as
documentation
and when I removed it (it wasn't clear at that point if it was a doc issue or a problem requiring code changes) it tagged again as doc.
This also happened in #22302:
When I removed it once more about 45 minutes later, the issue didn't happen again:
I thought it was supposed to not add a tag that was previously removed (and supposed to not close an issue that was just re-opened).
That's right. I've checked the code, and it certainly has the code to not fight with humans. For example, for the labelDocumentationIssues
task, it's:
func (b *gopherbot) labelDocumentationIssues(ctx context.Context) error {
return b.gorepo.ForeachIssue(func(gi *maintner.GitHubIssue) error {
if gi.Closed || !isDocumentationTitle(gi.Title) || gi.HasLabel("Documentation") || gi.HasEvent("unlabeled") {
return nil
}
printIssue("label-documentation", gi)
if *dryRun {
return nil
}
return b.addLabel(ctx, gi, "Documentation")
})
}
The gi.HasEvent("unlabeled")
part is what makes it not apply the documentation label again if a human has unlabeled it already.
It does work the second time; the bot seems to fight at most once per issue/task.
It feels like a timing/race condition bug (not a Go race condition, but a general one). It might happen that the labelDocumentationIssues
task runs after the label was removed, just before it's aware of the "unlabeled" event has happened, so gi.HasEvent("unlabeled")
returns false... but the issue doesn't already have "documentation" label, so it thinks it should be added.
I'm not completely sure how it's possible that the documentation label is removed yet gi.HasEvent("unlabeled")
returns false; I would expect those 2 things to happen atomically.
Perhaps it's worth looking more into that direction; I just wanted to share what I know so far.
Looks like it happened to @griesemer in https://github.com/golang/go/issues/22695#issuecomment-347962624:
@shurcooL Not sure. I think I fat-fingered this one.
Yet again in #24875:
Yet another thing I noticed recently that I suspect is related.
In issue #24850, if you look at the issue events API, you can see gopherbot actually applied the "Proposal" and "Documentation" labels three times each. It wasn't fighting humans, but I suspect the underlying reason is similar: it might not receive the "label has already been applied" information quickly enough and ends up iterating over "this label should be applied on this issue because it's not there yet" code 3 times before finally seeing that the label has been applied.
It's not visible in GitHub's UI because it collapses nearby "apply label" events into one, but effectively what happened was this (in the span of 2 seconds):
I reported this to GitHub separately (i.e., it should probably be disallowed at API level to apply the same label more than once in a row). But gopherbot shouldn't be trying to do that either.
(One hacky idea for a fix is to add some delays between gopherbot taking actions, so it's slower (like a human) and has less chances of making repeat actions it shouldn't. But of course that's not as nice as a more proper that doesn't doesn't depend on timing.)
I filed #27058 but I'm closing it as a dupe of this. In the issue referenced there, I did a "gopherbot please remove label documentation" and then saw
gopherbot added Documentation and removed Documentation labels
(indicating that the label was removed and then immediately re-added) presumably because the issue title contained the word "documentation" at that time.
Another: #26623
Another: #28103. I was expecting it to happen, so I fired up a local copy of gopherbot
running in dry-mode and watched it fight me (one time), as well as looked at gopherbot
and maintnerd
logs at the same time.
I have a good grasp of the underlying cause now. The problem is that maintner
provides an inconsistent world-view during very short periods of time, and gopherbot
takes (incorrect) action during those very short periods of time. Edit: Filed #28226 for this. /cc @bradfitz
It's as I guessed in https://github.com/golang/go/issues/21312#issuecomment-341338223, except now I have data to back up that hypothesis. gopherbot
runs in a loop, most of the time blocked in a corpus.Update
call. As soon as that call returns after receiving new data, gopherbot
runs all of its tasks.
Here's a log of gopherbot
activity with annotations (the mutations that are received by the maintner
client):
2018/10/15 14:46:12 Updating data from log *maintner.netMutSource ...
2018/10/15 14:46:46 Downloading 47 bytes of https://maintner.golang.org/logs/41 ...
2018/10/15 14:46:46 wrote /Users/dmitshur/Library/Caches/golang-maintner/0041.growing.mutlog
2018/10/15 14:46:46 Reloaded data from log *maintner.netMutSource.
2018/10/15 14:46:46 got corpus update after 34.209702362s
mutation: github_issue:<owner:"golang" repo:"go" number:28103 updated:<seconds:1539629204 > remove_label:223401461 >
aka "the issue 28103 no longer has "Builders" label
but the "a human has unlabeled this issue" mutation isn't here yet, so it applies the Builders label...
label-Builders [dry-run]
https://golang.org/issue/28103 x/build/version: add $GOROOT/bin to $PATH when running go run/generate
2018/10/15 14:46:46 gopherbot ran in 8.874186ms
2018/10/15 14:46:46 Updating data from log *maintner.netMutSource ...
2018/10/15 14:46:49 Downloading 42 bytes of https://maintner.golang.org/logs/41 ...
2018/10/15 14:46:49 wrote /Users/dmitshur/Library/Caches/golang-maintner/0041.growing.mutlog
2018/10/15 14:46:49 Reloaded data from log *maintner.netMutSource.
2018/10/15 14:46:49 got corpus update after 2.476171757s
mutation: github_issue:<owner:"golang" repo:"go" number:28103 comment_status:<server_date:<seconds:1539629209 > > >
some internal info about server time of latest comments
... still haven't received the "a human has unlabeled this issue" mutation yet, so...
label-Builders [dry-run]
https://golang.org/issue/28103 x/build/version: add $GOROOT/bin to $PATH when running go run/generate
2018/10/15 14:46:49 gopherbot ran in 9.168746ms
2018/10/15 14:46:49 Updating data from log *maintner.netMutSource ...
2018/10/15 14:46:49 Downloading 130 bytes of https://maintner.golang.org/logs/41 ...
2018/10/15 14:46:49 wrote /Users/dmitshur/Library/Caches/golang-maintner/0041.growing.mutlog
2018/10/15 14:46:49 Reloaded data from log *maintner.netMutSource.
2018/10/15 14:46:49 got corpus update after 154.471501ms
mutation: github_issue:<owner:"golang" repo:"go" number:28103 event:<id:1904921842 event_type:"unlabeled" actor_id:1924134 created:<seconds:1539629204 > label:<name:"Builders" > > event:<id:1904921913 event_type:"labeled" actor_id:8566911 created:<seconds:1539629206 > label:<name:"Builders" > > event_status:<server_date:<seconds:1539629209 > > >
finally it received the "there was an unlabel event" mutation, so gopherbot knows it shouldn't
fight humans, hence it no longer tries to apply the "Builders" label to the issue
2018/10/15 14:46:49 gopherbot ran in 8.387027ms
2018/10/15 14:46:49 Updating data from log *maintner.netMutSource ...
What happens is the mutation for "this issue no longer has a label [that it should have according to the rules]" comes in a few seconds before the "there was an unlabel event for this issue [because a human has unlabeled it]" mutation. gopherbot
acts immediately after receiving the first mutation and re-applies the label. (It tries to apply the Builders label after second mutation too, but it does nothing since the label is already applied.)
I'm going to file a maintner
issue for the underlying problem (edit: opened #28226), but I propose the following workaround for gopherbot
specifically until then: give it a slower (gopher-like) reaction time, rather than having it garden issues instantaneously. A 5 second buffer should make it avoid acting during a short inconsistent maintner
view of the world.
I'm going to test this approach and confirm it works as expected.
I tested the approach (locally) and it worked in solving the problem. Going to send a CL. Edit: Sent CL 142362.
Change https://golang.org/cl/142362 mentions this issue: cmd/gopherbot: reduce gardening reaction time
Starting to have second thoughts about the proposed workaround. In the CL 142362 commit message, I wrote:
The goal of this change is to reduce occurrences of gopherbot fighting people (issue golang/go#21312, which causes confusion and is distracting), without introducing serious drawbacks. A slightly slower reaction time for gardening is more human-like (or gopher-like) and should not cause problems.
It's true that a slower reaction time for gardening new issues is unlikely to be problematic. However, I've realized gopherbot
is often used in other ways, where it being very responsive is important.
For example, I notice people sometimes apply the NeedsFix
label and rely on gopherbot to remove the NeedsDecision
label. Or when asking gopherbot to backport issues.
If gopherbot ends up taking 15-30 seconds to respond (conceivable if there's a lot of continuous corpus activity), it may seem that it's broken or lagging, and as a result quite detrimental to the user experience and the ability to rely on it.
@dmitshur what if gopherbot immediately added itself as an assignee of the issue, but didn’t take action on it for 60sec, and then removed itself as an assignee. Then there’d be immediate feedback “gopherbot is listening!”.
There's another documentation label fight at https://github.com/golang/go/issues/29127.
Change https://golang.org/cl/183624 mentions this issue: cmd/gopherbot: don't fight over NeedsDecision removal
In #20892, I re-opened an issue, only to have @gopherbot re-close it. This is an issue to follow-up on what happened there. cc @andybons @kevinburke