alsuren opened this issue 2 months ago
WARNING: this is very likely to result in the same crate being built many times in parallel if someone runs the cronjob manually in CI or locally.
We have a mechanism to cancel duplicate builds, though it does waste machine time since the previous in-progress job is simply cancelled.
A better way would be to check whether that job has already been dispatched and is in progress in CI.
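A hedged sketch of what that check could look like, assuming a workflow file named `build-package.yml` and matching on the run title (the repo path and file name are assumptions, and the list-runs endpoint doesn't expose `workflow_dispatch` inputs, so a real check would need the crate/version/arch encoded somewhere visible):

```python
# Sketch only: ask the GitHub API whether a "Build package" run is already
# queued or in progress before dispatching another one.
import json
import os
import urllib.request

def build_already_in_flight(crate: str) -> bool:
    base = "https://api.github.com/repos/cargo-bins/cargo-quickinstall"
    runs = []
    for status in ("queued", "in_progress"):
        url = f"{base}/actions/workflows/build-package.yml/runs?status={status}"
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
        )
        with urllib.request.urlopen(req) as resp:
            runs.extend(json.load(resp)["workflow_runs"])
    # The list-runs endpoint does not return workflow_dispatch inputs, so this
    # relies on the crate name appearing in the run's display title.
    return any(crate in (run.get("display_title") or "") for run in runs)
```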
Ah, actually I didn't think of that. It seems we have our concurrency set correctly:
```yaml
concurrency:
  group: build-package-${{ github.event.inputs.crate }}-${{ github.event.inputs.version }}-${{ github.event.inputs.target_arch }}
```
We don't set `cancel-in-progress`, so enqueueing a new run shouldn't cancel any jobs that have already started, only ones that are still pending (not started), according to https://docs.github.com/en/enterprise-cloud@latest/actions/writing-workflows/choosing-what-your-workflow-does/control-the-concurrency-of-workflows-and-jobs#using-concurrency-in-different-scenarios
Seems like we can just keep pushing build requests for the same package and it will attempt to build it at most twice (one run in progress plus at most one pending in the concurrency group). Seems fine.
In theory we could put a "do we already have a package for this?" check at the top of the builder, but it wouldn't save us from repeat build attempts of packages that fail, so it's probably not worth the bother.
[edit: eventually the exclude logic will kick in in the cronjob, so worst case, we do n+1 or maybe n+2 builds, where n is the limit of builds we allow in the exclude logic]
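For concreteness, that check could be as simple as a HEAD request against the would-be release asset. The URL pattern below is an assumption, not the real naming scheme:

```python
# Sketch of a "do we already have a package for this?" check: HEAD the release
# asset URL and treat any HTTP error (e.g. 404) as "not published yet".
import urllib.error
import urllib.request

def package_already_published(crate: str, version: str, target: str) -> bool:
    url = (
        "https://github.com/cargo-bins/cargo-quickinstall/releases/download/"
        f"{crate}-{version}/{crate}-{version}-{target}.tar.gz"
    )
    try:
        with urllib.request.urlopen(urllib.request.Request(url, method="HEAD")):
            return True
    except urllib.error.HTTPError:
        return False
```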
> would there be any value in including cargo-binstall's tree of fallback architectures here?
Honestly, I think we should just build the crate for all supported targets anyway, but I can understand that you'd want to avoid hitting the rate limit.
In that case, it makes sense to skip the fallback architectures.
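For context, the fallback tree referred to above is roughly this shape. This is an illustrative, assumed subset; the real mapping lives in cargo-binstall:

```python
# Illustrative only: an assumed subset of cargo-binstall's fallback targets.
# If there is no build for the requested target, these would be tried in order.
FALLBACK_TARGETS = {
    "x86_64-unknown-linux-gnu": ["x86_64-unknown-linux-musl"],
    "aarch64-unknown-linux-gnu": ["aarch64-unknown-linux-musl"],
    "aarch64-apple-darwin": ["x86_64-apple-darwin"],  # runs under Rosetta 2
}
```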
> can we make this simpler while keeping it fair and resilient to rate limit problems?
We could utilise branching to save our progress.
We start by creating a branch and saving all the crates to build there, with their versions (cached).
When the "Build package" job is done, we go to that branch and edit it to mark that crate off the queue (sketched below).
We already do this for failed crates, so I don't think we would add too much complexity to the code.
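Something like this rough sketch, assuming the queue is a newline-separated `queue.txt` of `crate@version` entries committed to the `trigger/<arch>` branch (the file name and branch layout are assumptions):

```python
# Rough sketch: pop a finished crate off a queue file committed to the
# trigger/<arch> branch, so progress survives across cronjob runs.
import subprocess

def mark_done(arch: str, crate: str, version: str) -> None:
    branch = f"trigger/{arch}"
    subprocess.run(["git", "fetch", "origin", branch], check=True)
    subprocess.run(["git", "checkout", branch], check=True)
    with open("queue.txt") as f:
        remaining = [line for line in f if line.strip() != f"{crate}@{version}"]
    with open("queue.txt", "w") as f:
        f.writelines(remaining)
    subprocess.run(
        ["git", "commit", "-am", f"mark {crate}@{version} as built"], check=True
    )
    subprocess.run(["git", "push", "origin", branch], check=True)
```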
> can we gain anything by doing things on a crate-by-crate basis (so we make the most of any crates.io index fetches we make)?
Yes, definitely. If we can't do that, maybe we could cache the crates.io index fetch?
> We start by creating a branch and saving all the crates to build there, with their versions (cached).
This is kind of similar to the architecture I had originally: push the next crate name onto the list and commit it to the trigger/$arch branch (which is why it is named trigger/*). Pushing the whole queue on does sound like an interesting idea though. It would allow us to get rid of the randomness, I guess.
> Yes, definitely. If we can't do that, maybe we could cache the crates.io index fetch?
I put an lru_cache around the crates.io fetch, but that will only save us work within a single cronjob run. I suspect that caching for longer than that (GitHub Actions?) would end up more expensive than the crates.io fetch, with the added problem that it would be stale after an hour.
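Roughly like this simplified stand-in (this is not the actual cronjob code; it fetches the sparse index directly just to show where the cache sits):

```python
# With functools.lru_cache, each crate's sparse-index file is fetched at most
# once per cronjob run; the cache dies with the process, so nothing goes stale.
import json
import urllib.request
from functools import lru_cache

def _index_path(crate: str) -> str:
    # Path layout used by the sparse index at https://index.crates.io/
    crate = crate.lower()
    if len(crate) <= 2:
        return f"{len(crate)}/{crate}"
    if len(crate) == 3:
        return f"3/{crate[0]}/{crate}"
    return f"{crate[:2]}/{crate[2:4]}/{crate}"

@lru_cache(maxsize=None)
def latest_version(crate: str) -> str:
    url = f"https://index.crates.io/{_index_path(crate)}"
    with urllib.request.urlopen(url) as resp:
        lines = resp.read().decode().splitlines()
    versions = [json.loads(line) for line in lines]  # one JSON object per version
    return next(v["vers"] for v in reversed(versions) if not v["yanked"])
```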
> Pushing the whole queue on does sound like an interesting idea though. It would allow us to get rid of the randomness, I guess.
Yeah, and we can "save" the progress without having to make another GitHub REST API call, by recording the built crates in the branch.
> I suspect that caching for longer than that (GitHub Actions?) would end up more expensive than the crates.io fetch, with the added problem that it would be stale after an hour.
Yeah, just caching it within the cronjob run will be fine.
What are we looking for?

1. The amount of compute time that we waste trying to build any single broken package has an upper bound.
2. We should be able to respond within a small number of hours to a new version of a popular crate (that people are requesting from us).
3. The amount of compute time that we spend on checking and building packages should otherwise be shared equally among the top N popular crates on crates.io.
(Honestly, I would love it if 2) was actually on-demand while you wait, like how https://gobinaries.com works, but rust compile times won't make that super-easy)
I reckon the shuffled list of shuffled lists thing will be reasonably easy to implement, and doesn't require any re-architecting, so I will probably have a stab at that once we have clients deployed and reporting to the new stats server (sorry: I've not been working on the binstall part of this because something came up at home).
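One possible reading of the "shuffled list of shuffled lists" idea, with a made-up tranche size, would be:

```python
# Split the top-N crates into tranches, shuffle inside each tranche and shuffle
# the tranches themselves, then walk the flattened result. The tranche size is
# an arbitrary choice here.
import random

def build_order(top_crates: list[str], tranche_size: int = 100) -> list[str]:
    tranches = [
        top_crates[i : i + tranche_size]
        for i in range(0, len(top_crates), tranche_size)
    ]
    for tranche in tranches:
        random.shuffle(tranche)  # fairness within a tranche
    random.shuffle(tranches)     # fairness between tranches
    return [crate for tranche in tranches for crate in tranche]
```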
Currently we do:
[edit: This is problematic because
Here is a slightly convoluted scheme that might allow us to fairly make progress while also prioritising checking of things that are likely to be useful to people
Questions:
1. would there be any value in including cargo-binstall's tree of fallback architectures here?
2. can we make this simpler while keeping it fair and resilient to rate limit problems?
3. can we gain anything by doing things on a crate-by-crate basis (so we make the most of any crates.io index fetches we make)?