Shopify / shipit-engine

Deployment coordination
https://shopify.engineering/introducing-shipit
MIT License
1.41k stars 145 forks source link

Random 401 errors in workers #924

Closed josacar closed 4 years ago

josacar commented 5 years ago

Hi, we are experiencing random 401 Unauthorized errors from Github API randomly.

This is happening since we moved from Github API token to a GitHub Application.

Maybe this is related to this issue: https://github.com/octokit/octokit.rb/issues/1096

GET https://api.github.com/repos/xxx/yyy/commits/f2af22a9cb70c472263bafdd0d4a690f2613f800/check-runs: 401 - Bad credentials // See: https://developer.github.com/v3

and payload:

{
  "context" => "Job raised exception",
  "job" => {
    "args" => [{"job_class"=>"Shipit::RefreshCheckRunsJob", "job_id"=>"18756953-e915-41c7-bcee-41f8383bbd1f", "provider_job_id"=>nil, "queue_name"=>"default", "priority"=>nil, "arguments"=>[{"stack_id"=>1, "_aj_symbol_keys"=>["stack_id"]}], "executions"=>0, "locale"=>"en"}],
    "class" => "ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper",
    "created_at" => 1563901633.7204895,
    "enqueued_at" => 1563901633.7205267,
    "jid" => "e56e82b9cb9145b2e30fbb0c",
    "queue" => "default",
    "retry" => true,
    "wrapped" => "Shipit::RefreshCheckRunsJob"
  },
  "jobstr" => "{\"class\":\"ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper\",\"wrapped\":\"Shipit::RefreshCheckRunsJob\",\"queue\":\"default\",\"args\":[{\"job_class\":\"Shipit::RefreshCheckRunsJob\",\"job_id\":\"18756953-e915-41c7-bcee-41f8383bbd1f\",\"provider_job_id\":null,\"queue_name\":\"default\",\"priority\":null,\"arguments\":[{\"stack_id\":1,\"_aj_symbol_keys\":[\"stack_id\"]}],\"executions\":0,\"locale\":\"en\"}],\"retry\":true,\"jid\":\"e56e82b9cb9145b2e30fbb0c\",\"created_at\":1563901633.7204895,\"enqueued_at\":1563901633.7205267}"
}

This is another example:

GET https://api.github.com/repos/xxx/yyy/commits/a3fd480a4c4342d1093b9f1388227d73c26ba274/check-runs: 401 - Bad credentials // See: https://developer.github.com/v3

With this job payload:

{
  "context" => "Job raised exception",
  "job" => {
    "args" => [{"job_class"=>"Shipit::RefreshCheckRunsJob", "job_id"=>"9998eeab-45ca-45bf-bd75-698e12bfb095", "provider_job_id"=>nil, "queue_name"=>"default", "priority"=>nil, "arguments"=>[{"stack_id"=>1, "_aj_symbol_keys"=>["stack_id"]}], "executions"=>0, "locale"=>"en"}],
    "class" => "ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper",
    "created_at" => 1563898033.501681,
    "enqueued_at" => 1563898033.501717,
    "jid" => "f57cf4ef848b57dc8e050620",
    "queue" => "default",
    "retry" => true,
    "wrapped" => "Shipit::RefreshCheckRunsJob"
  },
  "jobstr" => "{\"class\":\"ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper\",\"wrapped\":\"Shipit::RefreshCheckRunsJob\",\"queue\":\"default\",\"args\":[{\"job_class\":\"Shipit::RefreshCheckRunsJob\",\"job_id\":\"9998eeab-45ca-45bf-bd75-698e12bfb095\",\"provider_job_id\":null,\"queue_name\":\"default\",\"priority\":null,\"arguments\":[{\"stack_id\":1,\"_aj_symbol_keys\":[\"stack_id\"]}],\"executions\":0,\"locale\":\"en\"}],\"retry\":true,\"jid\":\"f57cf4ef848b57dc8e050620\",\"created_at\":1563898033.501681,\"enqueued_at\":1563898033.501717}"
}

This is another example:

GET https://api.github.com/repos/xxx/yyy/statuses/69a439c93b7ef08ef2664d0a2f39aad6c4ee8354: 401 - Bad credentials // See: https://developer.github.com/v3

With this job payload:

{
  "context" => "Job raised exception",
  "job" => {
    "args" => [{"job_class"=>"Shipit::RefreshStatusesJob", "job_id"=>"ef7a6dbe-f470-4ac7-afe9-f69f21d73a99", "provider_job_id"=>nil, "queue_name"=>"default", "priority"=>nil, "arguments"=>[{"stack_id"=>26, "_aj_symbol_keys"=>["stack_id"]}], "executions"=>0, "locale"=>"en"}],
    "class" => "ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper",
    "created_at" => 1563902892.8128026,
    "enqueued_at" => 1563902892.8128335,
    "jid" => "6f13238841fd863eb277e9ee",
    "queue" => "default",
    "retry" => true,
    "wrapped" => "Shipit::RefreshStatusesJob"
  },
  "jobstr" => "{\"class\":\"ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper\",\"wrapped\":\"Shipit::RefreshStatusesJob\",\"queue\":\"default\",\"args\":[{\"job_class\":\"Shipit::RefreshStatusesJob\",\"job_id\":\"ef7a6dbe-f470-4ac7-afe9-f69f21d73a99\",\"provider_job_id\":null,\"queue_name\":\"default\",\"priority\":null,\"arguments\":[{\"stack_id\":26,\"_aj_symbol_keys\":[\"stack_id\"]}],\"executions\":0,\"locale\":\"en\"}],\"retry\":true,\"jid\":\"6f13238841fd863eb277e9ee\",\"created_at\":1563902892.8128026,\"enqueued_at\":1563902892.8128335}"
}
casperisfine commented 5 years ago

Hum. how random is it? Is it all calls past a certain time until you restart? or X% of calls ? etc.

Maybe this is related to this issue: octokit/octokit.rb#1096

I doubt so as Shipit include code to handle token expiration.

https://github.com/Shopify/shipit-engine/blob/a16258f15372e215272189929849eb4129c0ea16/lib/shipit/github_app.rb#L22

josacar commented 5 years ago

We have 7 failed requests from 406 requests last 24 hours.

Screenshot 2019-07-25 at 14 02 56

This is the histogram of the failed requests, maybe it can be related to the token expired in the meanwhile of the job, I'm pretty lost.

casperisfine commented 5 years ago

Yeah, we could probably consider the token as expired a couple minutes before the claimed expiry. This would protect against shifting system clocks & such.

Are you will to deploy a branch of Shipit-engine to test a fix, or maybe just to monkey patch the fix in?

casperisfine commented 5 years ago
module ShipitExpiryPatch
  def initialize(token, expires_at)
    super(token, expires_at - 5.minutes)
  end
end

Shipit::GitHubApp::Token.prepend(ShipitExpiryPatch)

The above patch dropped into config/initializers should allow to confirm wether it's an expiry race or not.

josacar commented 5 years ago

Ok, applied, let's see tomorrow.

josacar commented 5 years ago

Only had one error during the night, I'm increasing it to 15 minutes to see if it works.

ayatsynych commented 4 years ago

Fixed in https://github.com/Shopify/shipit-engine/pull/951