adafruit / adabot

Adabot is our robot friend who helps Adafruit online
MIT License

update_cp_org_libraries.py runs veryyyy slow #263

Open evaherrada opened 2 years ago

evaherrada commented 2 years ago

Results of my initial tests of the functions that get run.

Total time per repo: 3.5s

validate_actions_state: 0.459673s
validate_contents: 0.433544s
validate_core_driver_page: 0.003178s
validate_default_branch: 0.000001s
validate_in_pypi: 0.040591s
validate_labels: 0.117371s
validate_passes_linting: 0.639910s
validate_readthedocs: 1.361857s
validate_release_state: 0.333909s
validate_repo_state: 0.134701s

So the real big one is validate_readthedocs. That one is definitely one of the more complicated ones but I think we could for sure simplify it.

validate_passes_linting also takes a while, but it actually clones the repository, so that makes sense.
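
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of per-validator timing. It is not the script used for the numbers above; `time_validators`, `slowest_first`, and the two stand-in validators are hypothetical names, and real validators would take whatever arguments adabot actually passes them.

```python
import time


def time_validators(validators, repo):
    """Call each validator once on `repo` and return {function name: seconds}."""
    timings = {}
    for validator in validators:
        start = time.perf_counter()
        validator(repo)
        timings[validator.__name__] = time.perf_counter() - start
    return timings


def slowest_first(timings):
    """Return (name, seconds) pairs sorted from slowest to fastest."""
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical stand-ins for the real validate_* functions.
def validate_fast(repo):
    return []


def validate_slow(repo):
    time.sleep(0.01)  # simulate a network round trip
    return []


if __name__ == "__main__":
    report = time_validators([validate_fast, validate_slow], {"name": "example"})
    for name, seconds in slowest_first(report):
        print(f"{name}: {seconds:.6f}s")
```

Averaging over several repos would give steadier numbers than a single call per validator.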

dhalbert commented 1 year ago

We are now often seeing the job exceed the 6 hour job runtime limit. For example: https://github.com/adafruit/circuitpython-org/actions/runs/6809744073/job/18516615845.

There is some GitHub API rate limiting going on too (see in that job), but it would be really nice to make this run faster.

dhalbert commented 11 months ago

Thinking about your local test in #361, are we doing the queries without credentials? If we did them with credentials, would we avoid the rate limit?

jepler commented 11 months ago

I'm not sure about authorization. I filed #348 thinking that this process was using Classic PATs (https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-personal-access-token-classic) which are deprecated or discouraged; I forget the exact context that led to me filing that issue but it seems to be related to this note in the docs:

Note: Organization owners can restrict the access of personal access token (classic) to their organization. If you try to use a personal access token (classic) to access resources in an organization that has disabled personal access token (classic) access, your request will fail with a 403 response. Instead, you must use a GitHub App, OAuth app, or fine-grained personal access token.

Using PATs of some kind seems to be the Proper Way (TM) to do this, but "fine-grained" PATs are the preferred way now, if we don't want to deal with having a GitHub App: https://docs.github.com/en/actions/security-guides/automatic-token-authentication#granting-additional-permissions

I am pretty sure SOME kind of authentication/token is at play here; otherwise we would be held to the unauthenticated limit, which is 60 API calls an hour, far too few.

jepler commented 11 months ago

My local test was without authorization, but that was kinda-nice since it let me easily hit the timeout case. For the run in circuitpython-org that's troubling us, we appear to be using a token via ADABOT_GITHUB_USER and _ACCESS_TOKEN:

Thu, 09 Nov 2023 09:20:13 GMT     ADABOT_GITHUB_USER: ***
Thu, 09 Nov 2023 09:20:13 GMT     ADABOT_GITHUB_ACCESS_TOKEN: ***

Since the values of repository secrets can't be inspected, I don't know which user and token are in play.

dhalbert commented 11 months ago

I think these tokens are OK, because I changed them recently, carefully, and other Actions jobs depend on them.

jepler commented 11 months ago

A successful run looks like this:

Sat, 25 Nov 2023 09:19:00 GMT Run Date: 25 November 2023, 09:19AM
Sat, 25 Nov 2023 09:19:00 GMT  - Report output will be saved to: /home/runner/work/circuitpython-org/circuitpython-org/bin/adabot/libraries.v2.json
Sat, 25 Nov 2023 10:02:45 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Sat, 25 Nov 2023 10:02:45 GMT Rate Limit will reset at: 2023-11-25 10:19:13
Sat, 25 Nov 2023 10:19:13 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Sat, 25 Nov 2023 10:20:59 GMT {
Sat, 25 Nov 2023 10:20:59 GMT   "updated_at": "2023-11-25T09:19:00Z",
[rest of json snipped]

so we have:

A failed run:

Fri, 24 Nov 2023 09:20:27 GMT  - Report output will be saved to: /home/runner/work/circuitpython-org/circuitpython-org/bin/adabot/libraries.v2.json
Fri, 24 Nov 2023 10:06:30 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Fri, 24 Nov 2023 10:20:39 GMT Rate Limit will reset at: 2023-11-24 10:20:39
Fri, 24 Nov 2023 10:20:39 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Fri, 24 Nov 2023 10:20:39 GMT Rate Limit will reset at: 2023-11-24 10:20:39
Fri, 24 Nov 2023 15:20:09 GMT Error: The operation was canceled.

here,

Besides adding debugging, another idea is to prepend the command with timeout so that we can hopefully get a Python traceback from where the process is stuck, something like timeout -s INT 18000 [adabot command]. This would be a change in the circuitpython-org repo, not here, I think.

jepler commented 11 months ago

Also, we could sleep until 60 seconds after the time we think the rate limit will reset, rather than 1 second after. Since we hit the rate limit after 40-45 minutes and it resets after 1 hour, this doesn't really lower our throughput much.
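
A minimal sketch of that wait calculation, assuming the reset time is available as UTC epoch seconds (which is how GitHub reports it in the `X-RateLimit-Reset` response header). The function name `seconds_to_sleep` and its parameters are hypothetical and not taken from adabot.

```python
import time


def seconds_to_sleep(reset_epoch, now=None, margin=60):
    """Seconds to wait so that we resume `margin` seconds after the reset.

    reset_epoch: reset time in UTC epoch seconds (assumed input format).
    now: current time in epoch seconds; defaults to time.time().
    margin: safety padding in seconds, absorbing clock skew between us
            and the server.
    """
    if now is None:
        now = time.time()
    # Never return a negative duration if the padded reset time has passed.
    return max(0.0, reset_epoch + margin - now)


if __name__ == "__main__":
    # Reset is 10 minutes away, so we wait 10 minutes plus the margin.
    print(seconds_to_sleep(reset_epoch=1_000_600, now=1_000_000))
```

Compared with a 1 second margin, the extra 59 seconds is under 2% of a one-hour rate limit window.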

dhalbert commented 11 months ago

Can we insert delays between our requests? Would that throttle it enough to avoid the rate limit?
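
One way to picture this is a fixed minimum spacing between requests. The sketch below assumes an hourly limit of 5000 requests, the usual primary limit for token-authenticated REST calls; the limit that applies to this job is not stated in the thread. `min_interval` and `Throttle` are hypothetical names, not adabot code.

```python
import time


def min_interval(limit_per_hour):
    """Smallest spacing (seconds) that keeps us at or under the hourly limit."""
    return 3600.0 / limit_per_hour


class Throttle:
    """Block in wait() so consecutive calls are at least `interval` apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock  # injectable for testing
        self._sleep = sleep  # injectable for testing
        self._next_allowed = None

    def wait(self):
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


if __name__ == "__main__":
    throttle = Throttle(min_interval(5000))  # 0.72 s between requests
    for _ in range(3):
        throttle.wait()
        # ... issue one API request here ...
```

Pacing like this avoids hitting the limit, but the number of requests per hour stays the same, so it replaces the long pause with steady spacing rather than making the whole run shorter.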

jepler commented 11 months ago

I think I found the real problem :tada:

dhalbert commented 11 months ago

Fixed by #362.

dhalbert commented 11 months ago

I saw another failed 6-hour run, despite circuitpython-org being updated to include #362. :confused: https://github.com/adafruit/circuitpython-org/actions/runs/7030445965 Reopening @jepler

jepler commented 11 months ago

oh drat

tekktrik commented 10 months ago

I have thought about moving part of the CI into libraries themselves somehow. The basic idea is to have repos trigger when updated (or scheduled) to push a record of the information gathered either to a central repo (like a folder in this repository, for example) or possibly an S3 bucket. This has the advantage of spreading out or eliminating API calls, as well as distributing the burden over multiple CI runs across the libraries. This would allow this CI to just collect them all.
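
The collecting side of this idea could be sketched as below, assuming each library's CI writes one JSON record named after the repo into a shared folder. The folder layout, the record contents, and the `collect_records` name are all hypothetical; nothing like this exists in adabot as far as this thread shows.

```python
import json
from pathlib import Path


def collect_records(records_dir):
    """Merge every *.json record in `records_dir` into one dict keyed by repo name.

    Assumes one file per library, e.g. records/Some_Library.json, so the file
    stem is used as the key. Files are read in sorted order for stable output.
    """
    combined = {}
    for path in sorted(Path(records_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            combined[path.stem] = json.load(handle)
    return combined


if __name__ == "__main__":
    report = collect_records("records")  # hypothetical folder name
    print(json.dumps(report, indent=2))
```

With this shape, the central job only reads files and makes no API calls of its own per library.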