klaernie opened 11 months ago
Yes, that would be a really nice step!
At the moment the process is very inefficient, and you are addressing the two fundamental reasons for that. I'll express them here in my own words:
I would actually like to tackle the second point first, but it would be nice to have the automation now... I'll take a closer look at this in the next few days. Of course, I'm always happy to receive help :slightly_smiling_face:
Technically speaking, if you write a .json for each and every plugin you find into a separate branch, no vandalism on the wiki could affect that data anymore. That would also make it possible to generate the other JSON files during a build of the page, and it would make a 10k JSON file readable (by splitting it into a directory).
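Purely as an illustration of that split, a minimal sketch (the file names `plugins.json`, the `plugins/` directory, and the `"name"` key are all assumptions, not the project's actual schema):

```python
# Hypothetical sketch: split one monolithic plugins.json into plugins/<name>.json,
# one file per plugin, so the data lives in a dedicated branch and stays diff-able.
import json
from pathlib import Path

src = Path("plugins.json")      # assumed monolithic file
out_dir = Path("plugins")       # assumed target directory in the data branch
out_dir.mkdir(exist_ok=True)

for plugin in json.loads(src.read_text()):
    target = out_dir / f"{plugin['name']}.json"
    target.write_text(json.dumps(plugin, indent=2, sort_keys=True) + "\n")
```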
I'd probably also store the last commit ID in the json, since that allows checking for changes with a single API call, even without cloning the repo, and can short-circuit regenerating the remaining data.
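Something like this, as a rough sketch; the `last_commit` field name and the per-plugin file layout are just assumptions on my side:

```python
# Hypothetical sketch: fetch only the newest commit SHA of a plugin's repo and
# compare it to the "last_commit" field stored in its .json; skip the expensive
# refresh when nothing changed.
import json
import os
from pathlib import Path

import requests

def latest_sha(owner: str, repo: str, token: str | None = None) -> str:
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    r = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/commits",
        params={"per_page": 1},   # only the most recent commit is needed
        headers=headers,
        timeout=30,
    )
    r.raise_for_status()
    return r.json()[0]["sha"]

def needs_refresh(plugin_file: Path, owner: str, repo: str) -> bool:
    data = json.loads(plugin_file.read_text())
    current = latest_sha(owner, repo, token=os.environ.get("GITHUB_TOKEN"))
    return data.get("last_commit") != current
```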
If you're exhausting the rate limits of the GitHub API, one way would be to segment the modules into buckets, where each bucket is checked in a specific hour of the day. Also, one could make the refreshing process pay attention to the actual rate-limit usage, as documented by GitHub, and simply pause until calls are available again.
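A rough sketch of both ideas, hourly buckets plus pausing on a low budget. Bucketing by a hash of the plugin name is just one possible stable assignment, and querying `/rate_limit` itself doesn't count against the limit:

```python
# Hypothetical sketch: spread plugins over 24 hourly buckets and pause when the
# API budget runs low.
import hashlib
import time
from datetime import datetime, timezone

import requests

def bucket_of(plugin_name: str) -> int:
    # Stable 0..23 bucket derived from the plugin name (an assumption, any
    # deterministic assignment would do).
    return int(hashlib.sha1(plugin_name.encode()).hexdigest(), 16) % 24

def due_this_hour(plugin_name: str) -> bool:
    return bucket_of(plugin_name) == datetime.now(timezone.utc).hour

def wait_for_budget(headers: dict, minimum: int = 50) -> None:
    """Query /rate_limit and sleep until enough core requests are available."""
    while True:
        r = requests.get("https://api.github.com/rate_limit", headers=headers, timeout=30)
        r.raise_for_status()
        core = r.json()["resources"]["core"]
        if core["remaining"] >= minimum:
            return
        time.sleep(max(core["reset"] - time.time(), 1))  # reset is a UTC epoch timestamp
```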
I'd be happy to help, but at the moment I'm refreshing my entire homelab (newer hardware, migrating multiple terabytes to ZFS, getting rid of stupid dependencies like running NFS on a physical server...), so I'm pretty much underwater for another few weeks.
But of course I'm always here to be a rubber ducky with experience in scaling systems (I do that at work for a Nagios instance).
I have a pretty hacky idea for working against the GitHub API: store that data in a branch of this repo, then have a GitHub Actions job run over the data. Since GitHub provides a GITHUB_TOKEN during the job run, you can make 1000 calls per hour, from what I found at first glance. Then just let the job do the periodic work of refreshing the information, and whenever the branch gets updated because the job found new information, the other jobs can take care of publishing the update, all without requiring infrastructure on your end.
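To make the "no infrastructure on your end" part concrete, a purely hypothetical sketch of the script such a scheduled job could invoke. It assumes the per-plugin `plugins/*.json` layout from above and a placeholder `refresh_plugin()`; committing and pushing the refreshed data branch would be left to the workflow steps around it:

```python
# Hypothetical sketch of the entry point a scheduled Actions job could run.
# Actions injects GITHUB_TOKEN into the job, which the script uses for
# authenticated API calls; refresh_plugin() and the plugins/ layout are
# illustrative assumptions, not existing code.
import json
import os
from pathlib import Path

def refresh_plugin(data: dict, headers: dict) -> dict:
    """Hypothetical placeholder: re-query the plugin's repo metadata here."""
    return data

def main() -> None:
    token = os.environ["GITHUB_TOKEN"]  # provided automatically during the job run
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
    for plugin_file in sorted(Path("plugins").glob("*.json")):
        data = json.loads(plugin_file.read_text())
        data = refresh_plugin(data, headers)
        plugin_file.write_text(json.dumps(data, indent=2, sort_keys=True) + "\n")

if __name__ == "__main__":
    main()
```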
Just a stupid and very rough idea, of course.