ashfurrow closed 5 years ago
@zephraph looked and found some Zeit logs that seem to indicate that the Peril server itself was restarted:
info: ## pull_request.synchronize on CocoaPods on CocoaPods/Specs
info: 1 run needed: cocoapods/peril-settings@org/pr.ts
Request failed [404]: https://api.github.com/repos/CocoaPods/Specs/pulls/14419
Response: {
"message": "Not Found",
"documentation_url": "https://developer.github.com/v3/pulls/#get-a-single-pull-request"
}
error: UnhandledRejection Error:
Could not find pull request information,
if you are using a private repo then perhaps
Danger does not have permission to access that repo.
info: ☢️ Starting up Peril
(That PR does not exist.) So could it be that Peril, having encountered an exception, requires a reboot? And that maybe tasks scheduled during that time are getting dropped? Justin recommended adding a handler for promise rejections – if that sounds like a good idea, let me know and I'll open a PR 👍
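For illustration, a handler for promise rejections could look roughly like the sketch below. This is not Peril's actual code: `installRejectionHandler`, `RejectionSource`, and `Logger` are hypothetical names, and the narrow interface only stands in for the part of Node's `process` object that the handler needs.

```typescript
// Hypothetical sketch; none of these names come from Peril itself.

// The small slice of Node's `process` that this handler relies on.
interface RejectionSource {
  on(event: "unhandledRejection", listener: (reason: unknown) => void): unknown
}

type Logger = (message: string) => void

// Registers a listener that logs an unhandled rejection instead of
// re-throwing it, so one bad run does not take the whole server down.
function installRejectionHandler(source: RejectionSource, log: Logger): void {
  source.on("unhandledRejection", (reason: unknown) => {
    const detail =
      reason instanceof Error ? reason.stack || reason.message : String(reason)
    log("UnhandledRejection (server kept alive): " + detail)
  })
}

// In a Node server this would be wired up once at startup, for example:
//   installRejectionHandler(process, console.error)
```

Logging and carrying on only hides the symptom, though. Whatever run triggered the rejection is still lost, so this works best as a safety net rather than the main fix.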
Yeah, it's currently set to re-throw if an exception ever gets all the way back to the server, because that's a very unexpected error.
I feel like this should probably get caught further up the stack, closer to wherever it was grabbing the PR from, because it's probably safe to recover at that point.
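As a rough sketch of what recovering at that point could mean: treat a 404 on the PR lookup as "nothing to do" and let every other error keep propagating. Everything here is assumed for illustration (`FetchPR`, `RunResult`, `isNotFound`, `runForPullRequest`, and the `status` field on the error); it is not how Peril or Danger actually structure this.

```typescript
// Hypothetical types and names, for illustration only.
interface PullRequest {
  number: number
  title: string
}

// Assumed signature for whatever fetches PR metadata from GitHub.
type FetchPR = (repo: string, id: number) => Promise<PullRequest>

type RunResult =
  | { kind: "ran"; pr: PullRequest }
  | { kind: "skipped"; reason: string }

// Assumes the fetcher rejects with an object carrying an HTTP `status`.
function isNotFound(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { status?: unknown }).status === 404
  )
}

// Catches the 404 next to the fetch, where skipping the run is safe,
// and re-throws anything else so real failures still surface.
async function runForPullRequest(
  fetchPR: FetchPR,
  repo: string,
  id: number
): Promise<RunResult> {
  try {
    const pr = await fetchPR(repo, id)
    return { kind: "ran", pr }
  } catch (error) {
    if (isNotFound(error)) {
      return { kind: "skipped", reason: repo + "#" + id + " returned 404" }
    }
    throw error
  }
}
```

Returning an explicit `skipped` result rather than swallowing the error lets the caller log why a run was dropped, which would have made the missing PR easier to spot.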
Cool, thanks! I'll look into adding some more safety there.
Looks like we're getting a different crash now (but on that same CocoaPods PR! huh!):
Request failed [404]: https://api.github.com/repos/CocoaPods/Specs/pulls/14419
Response: {
"message": "Not Found",
"documentation_url": "https://developer.github.com/v3/pulls/#get-a-single-pull-request"
}
error: UnhandledRejection Error:
Could not get PR Metadata for repos/CocoaPods/Specs/pulls/14419
So I think this PR of mine helped Peril get further in runEverything, but it still failed on this line:
I'm not sure where this is happening, though. The function is getting called from fileContents, but that's about as far as I got.
@orta in the meantime, it looks like MongoDB still has a job for that 404'ing PR. Could we remove it from the db manually?
A set of tasks was scheduled to run on Artsy's Peril install this morning but didn't. A search through CloudWatch for "monday-informationals" yielded only our manual invocation (and not the 9am EST one). There was no exception info in our Slack invoicing webhooks channel. So it seems like maybe the Lambda invocation failed, but wasn't reported on? Very happy to look into this if you point me in the right general direction.