danger / peril

☢️ Serious and immediate danger.
https://danger.systems
MIT License

Missing Lambda Invocation #441

Closed (ashfurrow closed 5 years ago)

ashfurrow commented 5 years ago

A set of tasks was scheduled to run on Artsy's Peril install this morning but didn't. A search through CloudWatch for "monday-informationals" yielded only our manual invocation (and not the 9am EST one). There's no exception info in our Slack invoicing webhooks channel. So it seems like maybe the Lambda invocation failed, but wasn't reported on? Very happy to look into this if you point me in the right general direction.

ashfurrow commented 5 years ago

@zephraph looked and found some Zeit logs that seem to indicate that the Peril server itself was restarted:

info: ## pull_request.synchronize on CocoaPods on CocoaPods/Specs

info:    1 run needed: cocoapods/peril-settings@org/pr.ts

Request failed [404]: https://api.github.com/repos/CocoaPods/Specs/pulls/14419
Response: {
  "message": "Not Found",
  "documentation_url": "https://developer.github.com/v3/pulls/#get-a-single-pull-request"
}

error: UnhandledRejection Error: 
          Could not find pull request information,
          if you are using a private repo then perhaps
          Danger does not have permission to access that repo.
info: ☢️  Starting up Peril

(That PR does not exist.) So could it be that Peril, having encountered an exception, requires a reboot? And that maybe tasks scheduled during that time are getting dropped? Justin recommended adding a handler for promise rejections. If that sounds like a good idea, let me know and I'll open a PR 👍
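
For reference, a handler for promise rejections could look roughly like the sketch below. This is only an illustration of the idea, not Peril's actual code: `RejectionSource`, `describeRejection`, and `installRejectionHandler` are hypothetical names, and the only real API assumed is Node's `unhandledRejection` process event. Note that registering a listener replaces Node's default handling of unhandled rejections, so whether the server should keep running afterwards is a separate decision.

```typescript
// Hypothetical sketch: none of these names come from Peril or Danger.

// Minimal shape of something that emits "unhandledRejection".
// Node's global `process` is assumed to satisfy this.
interface RejectionSource {
  on(event: "unhandledRejection", listener: (reason: unknown) => void): unknown
}

type RejectionLogger = (message: string) => void

// Turns whatever was rejected into a single log line.
function describeRejection(reason: unknown): string {
  if (reason instanceof Error) {
    return `UnhandledRejection ${reason.name}: ${reason.message}`
  }
  return `UnhandledRejection (non-Error): ${String(reason)}`
}

// Registers a listener that logs the rejection instead of re-throwing it.
function installRejectionHandler(
  source: RejectionSource,
  log: RejectionLogger = (message) => console.error(message)
): void {
  source.on("unhandledRejection", (reason: unknown) => {
    log(describeRejection(reason))
  })
}

// Assumed usage at server start-up:
//   installRejectionHandler(process)
```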

orta commented 5 years ago

Yeah, it's currently set to re-throw if an exception ever gets all the way back to the server, since that's a very unexpected error.

I feel like this should probably get caught further up the stack, closer to wherever it was grabbing the PR from, because it's probably safe to recover at that point.
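
A rough sketch of what catching it near the PR lookup might look like is below. Everything here is an assumption for illustration: `PullRequestInfo`, `FetchPR`, `PRLookup`, and `lookupPRSafely` are made-up names, not functions from Peril or Danger. The idea is just that a failed lookup becomes a value the caller can skip over, rather than an exception that travels back to the server.

```typescript
// Hypothetical shapes; not taken from Peril or Danger.
type PullRequestInfo = { number: number; title: string }
type FetchPR = (repoSlug: string, prNumber: number) => Promise<PullRequestInfo>

// Either the PR was fetched, or the run is skipped with a reason.
type PRLookup =
  | { kind: "found"; pr: PullRequestInfo }
  | { kind: "skipped"; reason: string }

// Wraps the fetch so that a failure (for example a 404) is reported
// as a "skipped" result instead of an unhandled rejection.
async function lookupPRSafely(
  fetchPR: FetchPR,
  repoSlug: string,
  prNumber: number
): Promise<PRLookup> {
  try {
    const pr = await fetchPR(repoSlug, prNumber)
    return { kind: "found", pr }
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error)
    return { kind: "skipped", reason }
  }
}
```

A caller would then check `kind` and log the `reason` for skipped runs before moving on to the next event.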

ashfurrow commented 5 years ago

Cool, thanks! I'll look into adding some more safety there.

ashfurrow commented 5 years ago

Looks like we're getting a different crash now (but on that same CocoaPods PR! huh!):

Request failed [404]: https://api.github.com/repos/CocoaPods/Specs/pulls/14419

Response: {
  "message": "Not Found",
  "documentation_url": "https://developer.github.com/v3/pulls/#get-a-single-pull-request"
}
error: UnhandledRejection Error: 
Could not get PR Metadata for repos/CocoaPods/Specs/pulls/14419

So I think this PR of mine helped Peril get further in runEverything but still failed on this line:

https://github.com/danger/danger-js/blob/33c3674c8a0f4d0fffc7a58b958ad5a3f7fcf239/source/platforms/github/GitHubAPI.ts#L204-L208

I'm not sure where this is happening, though. The function is getting called from fileContents but that's about as far as I got.

@orta in the meantime, it looks like MongoDB still has a job for that 404'ing PR. Could we remove it from the db manually?