This closes an internal Shopify ticket that suspects that CreateReleaseStatusesJob was failing in creating release statuses due to a potential Gtihub API error response. In actuality, bugsnag errors are related to ConcurrentJobError. Some other open errors are linked to a similar issue in CreateOnGithubJob, primarily involving temporary spikes and some 504 response errors.
In this PR, I decided to drop duplicate background jobs for CreateReleaseStatusesJob, and retry on server errors from Github if that is indeed the case.
Further thought process
These were some recommendations I made initially:
Option 1: Implement a retry mechanism for both CreateOnGithubJob and CreateReleaseStatusesJob upon receiving a 504 response.
Option 2: Schedule a periodic job to address ReleaseStatuses and CommitDeploymentStatuses with a nil github_id, ensuring that race conditions do not occur with other concurrently initiated jobs.
Option 1 seems like a good start pointing b/c it targets server errors on Github's end (including the 504 response, Octokit::GatewayTimeout) and is lower effort than Option 2; I initially attempted to target 504 only along with the existing errors but that isn't possible b/c such subclass doesn't exist. If this solution fails, Option 2 or another alternate solution can be tried in a follow up PR.
Request for reviewers
Are you against this code change? Would you recommend another solution?
TLDR
This closes an internal Shopify ticket that suspects that CreateReleaseStatusesJob was failing in creating release statuses due to a potential Gtihub API error response. In actuality, bugsnag errors are related to ConcurrentJobError. Some other open errors are linked to a similar issue in CreateOnGithubJob, primarily involving temporary spikes and some 504 response errors.
In this PR, I decided to drop duplicate background jobs for CreateReleaseStatusesJob, and retry on server errors from Github if that is indeed the case.
Further thought process
These were some recommendations I made initially: Option 1: Implement a retry mechanism for both CreateOnGithubJob and CreateReleaseStatusesJob upon receiving a 504 response.
Option 2: Schedule a periodic job to address ReleaseStatuses and CommitDeploymentStatuses with a nil github_id, ensuring that race conditions do not occur with other concurrently initiated jobs.
Option 1 seems like a good start pointing b/c it targets server errors on Github's end (including the 504 response, Octokit::GatewayTimeout) and is lower effort than Option 2; I initially attempted to target 504 only along with the existing errors but that isn't possible b/c such subclass doesn't exist. If this solution fails, Option 2 or another alternate solution can be tried in a follow up PR.
Request for reviewers
Are you against this code change? Would you recommend another solution?