GoogleCloudPlatform / testgrid

Apache License 2.0

Fix race on timer in util.queue. #1285

Closed. michelle192837 closed this 1 month ago

michelle192837 commented 1 month ago

We occasionally hit an issue where Tabulator stalls, which seems to be caused by the queue getting stuck in sleep(). The likely cause: the second select case has already received the value from the timer channel, so timer.Stop() returns false, and the follow-up receive from the timer channel (intended to drain it) blocks indefinitely because the channel is already empty.

Instead, ensure that timer.Stop() is always called, but don't bother to drain or close the channel. (Note that timer.Stop() doesn't close the channel, but the timer and its channel should be garbage collected once sleep() returns and they go out of scope.)

google-oss-prow[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cjwagner, michelle192837

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/GoogleCloudPlatform/testgrid/blob/main/OWNERS)~~ [michelle192837]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.