chdsbd / kodiak

🔮 A bot to automatically update and merge GitHub PRs
https://kodiakhq.com
GNU Affero General Public License v3.0

unable to fetch event info on newly onboarded repo #615

Open rdmulford opened 3 years ago

rdmulford commented 3 years ago

We've been using Kodiak (self-hosted) for a couple of months now with no issues, but today, while onboarding one of our larger repos, Kodiak started throwing the following errors in the logs, only for PRs in that repo.

ERROR kodiak.queries:queries.py:913 event='could not fetch event info' install='2487' owner='REDACTED' pr=3285 repo='REDACTED' res={'errors': [{'message': 'Something went wrong while executing your query. Please include `REDACTED` when reporting this issue.'}]} sentry_id=None
INFO kodiak.pull_request:pull_request.py:45 event='failed to find event' install='2487' number=3285 owner='REDACTED' repo='REDACTED'

(I've left out that ID string on purpose since I'm not sure what it's used for and don't want it to be a security issue. I'm not sure it would help anyway on a self-hosted instance, but let me know if it's required to debug this issue.)

It looks like this is where the error is coming from: https://github.com/chdsbd/kodiak/blob/60ba7b44c34231ecf439c34ec81f25397dc201aa/bot/kodiak/pull_request.py#L47

Is there anything known that could cause this error? We have branch protections set up the same as on all our other repos, where Kodiak is still working just fine. The errors are only coming from this newly onboarded repo, so I'm guessing something is wrong with how that repo is set up, but I'm not sure what. I already tried redeploying our Kodiak instance just in case, with no luck.

chdsbd commented 3 years ago

Looks like an internal GitHub API error.

I've encountered this before where a certain combination of GitHub branch protection settings would cause GitHub to error. GitHub fixed that bug, but I'm guessing this is similar.

I'd compare branch protection settings across repos and look for differences.

It's hard for me to investigate without access, but this is the query to use: https://github.com/chdsbd/kodiak/blob/60ba7b44c34231ecf439c34ec81f25397dc201aa/bot/kodiak/queries/__init__.py#L62

You can find your installation by running `.venv/bin/kodiak list-installs` in the Docker container. With your installation ID you can generate an access token (valid for 1 hour) via `.venv/bin/kodiak token-for-install my-install-id`, which will allow you to query the GitHub API as your Kodiak bot.

I'd try to reproduce the error by querying using the GraphQL query above and then contact GitHub support.
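When reproducing this, it helps to recognize the internal-error response programmatically and pull out the request ID that GitHub support asks for. Below is a minimal sketch, assuming the response body has the same shape as the `res=` value in the log at the top of this issue. The function name and the regular expression are my own and not part of Kodiak; the message wording is taken from this thread and GitHub may change it.

```python
import re
from typing import Optional

# Pattern based on the error text quoted in this issue. This is an assumption:
# GitHub could change the wording at any time.
_REQUEST_ID = re.compile(r"Please include `([^`]+)` when reporting this issue")


def internal_error_request_id(response: dict) -> Optional[str]:
    """Return the request ID from a GitHub internal-error response, else None.

    `response` is the parsed JSON body of a GraphQL call. Hypothetical helper,
    not something Kodiak ships.
    """
    for error in response.get("errors") or []:
        match = _REQUEST_ID.search(error.get("message", ""))
        if match:
            return match.group(1)
    return None
```

Pass it the parsed JSON body of the GraphQL response. A non-`None` result means you hit the internal error, and the returned string is the ID to include in the support ticket.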

You could also try the hosted version of Kodiak and see if it's a problem there too. That's easier for me to debug.

chdsbd commented 3 years ago

This is the previous issue I was talking about: https://github.com/chdsbd/kodiak/pull/509#issuecomment-699231094

rdmulford commented 3 years ago

@chdsbd Thanks for the quick response! I will try the debugging steps you've mentioned. Since we're on GitHub Enterprise, I'm wondering whether the fix for code owners may not be part of our GitHub instance yet, and whether that could be the issue. Still investigating, but if that were the case, is there any known workaround?

I compared branch protection settings with our other repos and didn't notice anything different.

chdsbd commented 3 years ago

Are you using CODEOWNERS with the "Require review from Code Owners" setting?

Looking at the GitHub Support ticket from September, the issue only occurred when "Require review from Code Owners" was enabled and the pull request had a file matching CODEOWNERS. GitHub fixed the issue on October 8th for github.com.

Here's the API request I included in that GitHub support ticket which reproduced the issue.

```
curl --request POST \
  -v \
  --url https://api.github.com/graphql \
  --header 'accept: application/vnd.github.antiope-preview+json,application/vnd.github.merge-info-preview+json' \
  --header 'authorization: Bearer ' \
  --header 'content-type: application/json' \
  --data '{"query":"query ($owner: String!, $repo: String!, $PRNumber: Int!) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $PRNumber) {\n author {\n login\n }\n reviewDecision\n commits(first: 1) {\n nodes {\n commit {\n id\n }\n }\n }\n }\n \n }\n}","variables":{"owner":"chdsbd","repo":"test_repo_local","PRNumber":584}}'
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 140.82.113.5...
* TCP_NODELAY set
* Connected to api.github.com (140.82.113.5) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=*.github.com
* start date: Jun 22 00:00:00 2020 GMT
* expire date: Aug 17 12:00:00 2022 GMT
* subjectAltName: host "api.github.com" matched cert's "*.github.com"
* issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 High Assurance Server CA
* SSL certificate verify ok.
> POST /graphql HTTP/1.1
> Host: api.github.com
> User-Agent: curl/7.64.1
> accept: application/vnd.github.antiope-preview+json,application/vnd.github.merge-info-preview+json
> authorization: Bearer
> content-type: application/json
> Content-Length: 419
>
* upload completely sent off: 419 out of 419 bytes
< HTTP/1.1 200 OK
< Server: GitHub.com
< Date: Mon, 28 Sep 2020 22:26:10 GMT
< Content-Type: application/json; charset=utf-8
< Content-Length: 155
< Status: 200 OK
< Cache-Control: no-cache
< X-GitHub-Media-Type: github.v4; param=antiope-preview; format=json, github.merge-info-preview; format=json
< X-RateLimit-Limit: 5000
< X-RateLimit-Remaining: 4998
< X-RateLimit-Reset: 1601335530
< X-RateLimit-Used: 2
< Access-Control-Expose-Headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset
< Access-Control-Allow-Origin: *
< Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
< X-Frame-Options: deny
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
< Content-Security-Policy: default-src 'none'
< Vary: Accept-Encoding, Accept, X-Requested-With
< X-GitHub-Request-Id: CEF8:6525:114BE3C:25AA12F:5F726302
<
{"errors":[{"message":"Something went wrong while executing your query. Please include `CEF8:6525:114BE3C:25AA12F:5F726302` when reporting this issue."}]}
* Connection #0 to host api.github.com left intact
* Closing connection 0
```

rdmulford commented 3 years ago

@chdsbd Yes, we're using CODEOWNERS with the "Require review from Code Owners" setting. That lines up with the errors: I've discovered that the repo having errors is the only one using the Code Owners setting, and it seems the errors only occur on PRs where Code Owner reviews are required.

I'm going to try to track down the release where GitHub fixed this issue and verify whether it's part of our version of GitHub Enterprise. I'll also see if I can modify your curl command to hit our instance and check whether we get the same result. If it turns out to be a GitHub issue, is our only option to wait until our Enterprise version gets updated with that fix?
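For anyone else pointing the same reproduction at a GitHub Enterprise Server instance, here is a rough sketch of the request in Python using only the standard library. It assumes the usual GHES GraphQL endpoint of `https://HOSTNAME/api/graphql`; the host name, token, and the `build_ghes_request` helper are placeholders of mine, and the query and `accept` header are copied from the curl command earlier in this thread. Older GHES versions may not accept the same preview headers, so treat this as a starting point.

```python
import json
import urllib.request

# Query copied from the curl reproduction earlier in this issue.
QUERY = """
query ($owner: String!, $repo: String!, $PRNumber: Int!) {
  repository(owner: $owner, name: $repo) {
    pullRequest(number: $PRNumber) {
      author { login }
      reviewDecision
      commits(first: 1) { nodes { commit { id } } }
    }
  }
}
"""

# Preview media types from the same curl command.
ACCEPT = (
    "application/vnd.github.antiope-preview+json,"
    "application/vnd.github.merge-info-preview+json"
)


def build_ghes_request(host, token, owner, repo, pr_number):
    """Build (but do not send) the GraphQL request for a GHES host.

    Hypothetical helper: `host` is your GHES host name and `token` is an
    installation token, e.g. from `kodiak token-for-install`.
    """
    payload = {
        "query": QUERY,
        "variables": {"owner": owner, "repo": repo, "PRNumber": pr_number},
    }
    return urllib.request.Request(
        url=f"https://{host}/api/graphql",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": ACCEPT,
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To actually send it (requires network access and a valid token):
#   req = build_ghes_request("ghe.example.com", "<token>", "my-org", "my-repo", 3285)
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and payload before it goes anywhere near the instance.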

chdsbd commented 3 years ago

Before #509 added full support for CodeOwners, Kodiak acted like update.always = true was set for pull requests that needed a Code Owner approval. So Kodiak would immediately update those pull requests.

If that behavior is acceptable, you could revert #509 and build a new Docker image to use until GitHub Enterprise is updated.

rdmulford commented 3 years ago

@chdsbd Gotcha, thanks. Good to know that's an option. I'm going to do some testing to verify the issue, as well as test the workaround you mentioned. Is there any chance you know the release where GitHub shipped the fix? If it's not on hand, don't worry about it, I can keep looking for it. Thanks so much for the help, by the way!

chdsbd commented 3 years ago

I opened a follow up ticket with GitHub asking which GitHub Enterprise version includes the fix. I’ll let you know their response.

chdsbd commented 3 years ago

I'm a little confused, but this is the response I got from GitHub support:

The fix is not dependent on any GHES version. It bothers around the level of permission granted to "GitHub Apps" in our codebase - if you recall, I was also able to reproduce the problem then using my GitHub App.

If your GHES customer is encountering a similar problem interacting with GitHub Apps, Could you ask them to raise a support ticket, so our Enterprise Support team can have a closer look?

chdsbd commented 3 years ago

Not sure if it helps, but my GitHub Support ticket numbers were #1043503 and #847108.

rdmulford commented 3 years ago

Huh, that is weird. It's a GitHub App permissions issue that only affects PRs requiring Code Owner reviews? I'm still gathering information, but when I get the chance I'll raise a support ticket with them. Thanks for looking, I appreciate it!

chdsbd commented 3 years ago

I got another reply from GitHub:

Thanks for the follow-up! I made some enquiry with folks in the enterprise support team and you're absolutely correct -sorry for my incorrect understanding.

We rolled out a fix for the enterprise version as well. The GHES version is 3.0.

rdmulford commented 3 years ago

Got it, that makes more sense. I believe work to get us onto 3.0 is in progress, so we'll decide whether to wait for that to roll out or to revert that change and deploy a separate version.
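For reference, a small sketch of the version check implied above: given a GHES version string, decide whether it is at or past 3.0, the version GitHub support named for the fix. As far as I know, GHES reports its version in the `installed_version` field of `GET /api/v3/meta`, but confirm that on your own instance. The function name is hypothetical.

```python
def has_codeowners_fix(installed_version: str) -> bool:
    """Return True if a GHES version string is 3.0 or newer.

    The 3.0 threshold comes from the GitHub support reply in this thread.
    Hypothetical helper, not part of Kodiak.
    """
    parts = installed_version.strip().split(".")
    major = int(parts[0])
    # Treat a bare major version such as "3" as "3.0".
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= (3, 0)
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where `"2.22"` would sort after `"2.3"`.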