KristjanESPERANTO / MagicMirror-3rd-Party-Modules

This project provides an overview of all MagicMirror² modules and puts the modules through a few tests.
https://kristjanesperanto.github.io/MagicMirror-3rd-Party-Modules/
MIT License

Using the GitHub APIs #4

Open klaernie opened 5 months ago

klaernie commented 5 months ago

I have a pretty hacky idea for working with the GitHub API: store that data in a branch of this repo and have a GitHub Actions job run over it. Since GitHub provides a GITHUB_TOKEN during the job run, you can make 1000 calls per hour, from what I first found. The job would do the periodic work of refreshing the information, and whenever it updates the branch because it found new information, the other jobs can take care of publishing the update, all without requiring any infrastructure on your end.

Just a stupid and very rough idea, of course.
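
To sketch what I mean (in TypeScript, not tied to this repo's actual code): an Actions job could simply use the injected GITHUB_TOKEN for authenticated API calls. The repository lookup and the `pushed_at` field below are just one example of the kind of metadata such a job could refresh:

```typescript
// Minimal sketch of an authenticated GitHub API call inside an Actions job.
// GITHUB_TOKEN is injected by Actions; the repo below is only an example.
const token = process.env.GITHUB_TOKEN;

async function fetchRepoInfo(owner: string, repo: string) {
  const response = await fetch(`https://api.github.com/repos/${owner}/${repo}`, {
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: "application/vnd.github+json",
    },
  });
  if (!response.ok) {
    throw new Error(`GitHub API returned ${response.status} for ${owner}/${repo}`);
  }
  return response.json();
}

// Example: read the last-push timestamp of one module repository.
fetchRepoInfo("KristjanESPERANTO", "MagicMirror-3rd-Party-Modules").then((info) =>
  console.log(info.pushed_at),
);
```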

KristjanESPERANTO commented 5 months ago

Yes, that would be a really nice step!

At the moment the process is very inefficient, and you are addressing the two fundamental reasons for that. I'll express them here in my own words:

  1. I'm still doing it manually. => There's already a PR to change that: #2 => It would be nice to have some rudimentary protection against vandalism. If someone intentionally or unintentionally empties the module list in the wiki, I would notice it in the current manual process.
  2. All repositories are always cloned and tested, although it only makes sense for updated and new ones. => Your suggestion would solve most of this; only the repositories with new commits would have to be covered. => In addition, we need a job that monitors the official module list to trigger removing or adding modules (a rough sketch of that comparison follows below).

I would actually like to tackle the second point first, but it would be nice to have the automation now... I'll take a closer look at this in the next few days. Of course, I'm always happy to receive help :slightly_smiling_face:
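
To make the module-list monitoring idea a bit more concrete, here is a minimal sketch of the comparison step (TypeScript; how the wiki list is fetched and parsed is left out, and the module names are invented):

```typescript
// Sketch of the "watch the official module list" job: compare the names from
// the wiki against the ones we already track and report additions/removals.
function diffModuleLists(official: string[], tracked: string[]) {
  const officialSet = new Set(official);
  const trackedSet = new Set(tracked);
  return {
    added: official.filter((name) => !trackedSet.has(name)),
    removed: tracked.filter((name) => !officialSet.has(name)),
  };
}

// The job would feed in both lists and trigger follow-up work based on the diff.
const { added, removed } = diffModuleLists(
  ["MMM-Example-A", "MMM-Example-B"],
  ["MMM-Example-A", "MMM-Example-C"],
);
console.log(added);   // ["MMM-Example-B"]  -> new on the official list, add it
console.log(removed); // ["MMM-Example-C"]  -> dropped from the list, remove it
```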

klaernie commented 5 months ago
  1. Technically speaking, if you write a .json file for each and every plugin you find into a separate branch, no vandalism on the wiki could affect that data anymore. That would also allow generating the other JSON files during a build of the page, and it would make a 10k JSON file readable (by splitting it into a directory of smaller files).

  2. I'd probably also store the last commit ID in the JSON, since that allows checking for changes with a single API call, even without cloning the repo, and it can short-circuit regenerating the remaining data (sketched below this list).
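
Roughly, those two points could fit together like this. The file layout (modules/<name>.json) and the field names are invented for illustration; only the commits endpoint is the real GitHub API:

```typescript
// Sketch of per-module JSON records plus the single-call "did anything change?" check.
import { readFile, writeFile } from "node:fs/promises";

interface ModuleRecord {
  name: string;
  repository: string;    // "owner/repo"
  lastCommitSha: string; // used to short-circuit re-testing
  lastChecked: string;
}

async function latestCommitSha(repository: string, token: string): Promise<string> {
  // One API call per module: fetch only the newest commit on the default branch.
  const response = await fetch(
    `https://api.github.com/repos/${repository}/commits?per_page=1`,
    { headers: { Authorization: `Bearer ${token}`, Accept: "application/vnd.github+json" } },
  );
  if (!response.ok) {
    throw new Error(`GitHub API returned ${response.status} for ${repository}`);
  }
  const [latest] = await response.json();
  return latest.sha;
}

async function refreshModule(recordPath: string, token: string): Promise<void> {
  const record: ModuleRecord = JSON.parse(await readFile(recordPath, "utf8"));
  const sha = await latestCommitSha(record.repository, token);
  if (sha === record.lastCommitSha) {
    return; // no new commits, skip cloning and re-testing this module
  }
  // ... clone the repository and re-run the checks here ...
  record.lastCommitSha = sha;
  record.lastChecked = new Date().toISOString();
  await writeFile(recordPath, JSON.stringify(record, null, 2) + "\n");
}
```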

If you're exhausting the rate limits of the GitHub API, one option would be to segment the modules into buckets, where each bucket is checked during a specific hour of the day. The refresh process could also pay attention to the actual rate-limit usage, as described in the docs, and simply pause until calls are available again.
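
Both of those could look something like this. The bucket hash is arbitrary; the /rate_limit endpoint and its resources.core fields are the real GitHub API:

```typescript
// Spread the modules over 24 hourly buckets and pause when the core rate limit
// is exhausted.
function hourlyBucket(moduleName: string): number {
  let hash = 0;
  for (const char of moduleName) {
    hash = (hash * 31 + char.charCodeAt(0)) % 24;
  }
  return hash; // check this module only when new Date().getUTCHours() === hash
}

async function waitForRateLimit(token: string): Promise<void> {
  const response = await fetch("https://api.github.com/rate_limit", {
    headers: { Authorization: `Bearer ${token}`, Accept: "application/vnd.github+json" },
  });
  const { resources } = await response.json();
  if (resources.core.remaining > 0) {
    return; // calls are still available
  }
  // "reset" is a Unix timestamp in seconds; sleep until then, plus a small buffer.
  const waitMs = Math.max(resources.core.reset * 1000 - Date.now(), 0) + 5_000;
  await new Promise((resolve) => setTimeout(resolve, waitMs));
}
```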

I'd be happy to help, but atm I'm refreshing my entire homelab (newer hardware, migrating multiple terabytes to ZFS, getting rid of stupid dependencies like running NFS on a physical server...), so I'm pretty much underwater for another few weeks.

klaernie commented 5 months ago

But of course I'm always here to be a rubber ducky with experience in scaling systems (I do that at work for a Nagios instance).