CodSpeedHQ / action

Github Actions for running CodSpeed in your CI
https://codspeed.io
MIT License

Feature request: Shard benchmarks across multiple CI jobs #92

Open overlookmotel opened 7 months ago

overlookmotel commented 7 months ago

The OXC JavaScript compiler is using CodSpeed to measure performance. For a performance-focused project like OXC, CodSpeed is absolutely amazing.

However, running the benchmarks is currently by far the project's slowest CI job.

Would it be feasible to enable sharding the benchmarks across multiple CI jobs, shortening each job's build and execution time, with a final job to pull the results together and upload them to CodSpeed?
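For illustration, sharding in GitHub Actions would probably look like a matrix job plus an aggregation job. The `--shard` flag and the artifact-based hand-off below are assumptions for the sketch, not existing CodSpeed options:

```yaml
jobs:
  bench:
    strategy:
      matrix:
        shard: [1, 2, 3]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical: run only this shard's benchmarks and keep the raw
      # profile data as an artifact instead of uploading it directly.
      - run: cargo codspeed run --shard ${{ matrix.shard }}
      - uses: actions/upload-artifact@v4
        with:
          name: codspeed-shard-${{ matrix.shard }}
          path: target/codspeed/

  upload:
    needs: bench
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
      # Hypothetical final step: merge the shards' profiles and make a
      # single upload to CodSpeed.
```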

adriencaccia commented 7 months ago

Hey @overlookmotel, thanks for the kind words!

To reduce the build time for everyone, we will definitely look at what @Boshen did in https://github.com/oxc-project/oxc/pull/2343. This seems like something that could easily be implemented directly in this repo, so that everyone gets lower build times.

As far as sharding the benchmarks across multiple CI jobs, this is something that we will have to look into. At the moment CodSpeed only supports a single upload per run, so we would need to add some kind of sharded uploads per run or find a way to generate reports in multiple CI workers, then in a final job aggregate the reports and make a single upload.

The former solution would require a lot of rework on our app and take a lot of time. The latter might be more tractable, since the changes would only need to be made in our open-source https://github.com/CodSpeedHQ/runner and https://github.com/CodSpeedHQ/action repositories.

overlookmotel commented 5 months ago

We've implemented sharding via a rather hacky method on https://github.com/oxc-project/oxc/pull/2751.

Now that I can see it works, I'd like to "do it properly" and submit PRs against https://github.com/CodSpeedHQ/runner and https://github.com/CodSpeedHQ/action so everyone can benefit.

A few questions:

.out files

The .out files are named <pid>.out. Do the file names matter at your end?

Because each shard runs on a different machine, it's possible for 2 jobs to have the same PID, and therefore to end up with two 1234.out files. I had to add some logic for spotting duplicates and renaming them:

https://github.com/oxc-project/oxc/blob/95ac2655047ab0692337cc1059b01f56b4ba76d8/tasks/benchmark/codspeed/upload.mjs#L69-L84

But maybe the filenames have no semantic meaning to your back end, in which case we could more easily get unique filenames by prepending a shard ID to each, e.g. shard1-1234.out, shard2-1234.out?
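A minimal sketch of that prefixing scheme (the `prefixWithShard` helper is hypothetical; whether the back end tolerates the renamed files is exactly the open question above):

```javascript
// Give each shard's profile files globally unique names by prepending
// the shard index, e.g. 1234.out -> shard1-1234.out. This sidesteps
// PID collisions between shards running on different machines.
function prefixWithShard(filenames, shardIndex) {
  return filenames.map((name) => `shard${shardIndex}-${name}`);
}

// Two shards can both produce a 1234.out; prefixing disambiguates them:
console.log(prefixWithShard(["1234.out"], 1)); // [ 'shard1-1234.out' ]
console.log(prefixWithShard(["1234.out", "5678.out"], 2));
```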

Log files

Are the .log files which are included in the profile archive required / useful to you? If they are, they'd need to be merged somehow, or included as runner_shard1.log, runner_shard2.log etc. Would your back end accept that?

Metadata

Presumably the metadata JSON which is sent to CodSpeed's API can be generated on any job. For GitHub Actions at least, it looks like it only contains info relating to the PR/branch, not the GH Actions job. Is that correct?

API

Implementation I had in mind:

Runner:

Action:

Does this sound OK to you?

adriencaccia commented 5 months ago

Hey @overlookmotel, sorry for the delay in my response.

We talked about it internally and we want to directly support multiple uploads in our backend. This will mask a lot of the sharding logic in the action and the runner, and allow for easier integration with other providers.

Thanks a lot for the work you already did on this 🙏 and I am sorry that we will not continue down the single-upload path. We are using your work as inspiration for the new specs.

I will update you when we have them, and if you are still willing, it will be a pleasure to have you contribute to the runner/action!

overlookmotel commented 5 months ago

OK cool. That does sound like an ideal solution.

If I can add a couple of suggestions:

1. Results on completion

Some benchmarks (shards) may take much less time to run than others (that's certainly the case with Oxc). It'd be ideal if results from completed shards were visible on CodSpeed as soon as they're uploaded, with the rest showing as "Running..." - as opposed to having to wait for all the shards to finish before you can see results for any of them.

2. Handle missing benchmarks

Oxc has multiple crates. Currently the benchmarks for crate X run even if the only changes in the PR are to crate Y (and X doesn't depend on Y). This is wasteful of CI resources, and I'd like to run only the benchmarks for the crates which could be affected by the code changes in the PR.

However, if you just prevent some benchmarks from running, they show on CodSpeed as "Missing" and it makes everything look red and bad!

To work around this, I have hacked together something to cache benchmark results from a previous run and re-upload them (without actually running the benchmarks again):

https://github.com/oxc-project/oxc/blob/586e823fd6c2c1bb2bdd1164447def0d7cfccfde/tasks/benchmark/codspeed/upload.mjs#L40-L55

This isn't the right way to do it of course, but it'd be nice if CodSpeed could somehow handle this scenario where not all benchmarks run, but that is intentional rather than a problem.
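In outline, the workaround amounts to something like this (names and shapes are hypothetical, not the actual upload.mjs code linked above):

```javascript
// Sketch of the caching workaround: for benchmarks that were intentionally
// skipped, substitute the cached result from an earlier run so CodSpeed
// does not report them as "Missing".
// `results` maps benchmark name -> fresh result from this run;
// `cached` maps benchmark name -> result saved from a previous run.
function fillSkipped(results, cached, skipped) {
  const filled = { ...results };
  for (const name of skipped) {
    if (!(name in filled) && name in cached) {
      filled[name] = cached[name]; // re-upload stale but valid data
    }
  }
  return filled;
}
```

The obvious downside, as noted, is that the re-uploaded numbers are stale; the benchmark appears "run" when it was not.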

adriencaccia commented 5 months ago

1. Results on completion

Some benchmarks (shards) may take much less time to run than others (that's certainly the case with Oxc). It'd be ideal if results from completed shards were visible on CodSpeed as soon as they're uploaded, with the rest showing as "Running..." - as opposed to having to wait for all the shards to finish before you can see results for any of them.

Definitely, we plan to do incremental report updates. This is one of the upsides of allowing multiple uploads for a single run.

2. Handle missing benchmarks

Oxc has multiple crates. Currently the benchmarks for crate X run even if the only changes in the PR are to crate Y (and X doesn't depend on Y). This is wasteful of CI resources, and I'd like to run only the benchmarks for the crates which could be affected by the code changes in the PR.

We have thought about it in the past, but we could not find a way to implement this at that moment. How can we know if the non-uploaded benchmarks are dropped or deliberately not run, in a similar use case as you outlined? Handling missing benchmarks will require us to find a way to differentiate between the two cases.

overlookmotel commented 5 months ago

How can we know if the non-uploaded benchmarks are dropped or deliberately not run, in a similar use case as you outlined?

Yes, that's true - the ambiguity is a problem. Maybe the action could allow you to input that information, e.g. skipped: ["bench1", "bench2"]?
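As a workflow snippet, such an input might look like the following. The `skipped` input is hypothetical and does not exist in the CodSpeed action today; it is just to make the suggestion concrete:

```yaml
- uses: CodSpeedHQ/action@v3
  with:
    run: cargo codspeed run
    # Hypothetical input: declare benchmarks that were deliberately not
    # run, so the backend reports them as "Skipped" rather than "Missing".
    skipped: |
      bench1
      bench2
```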

I guess you will have this problem in some form regardless of my request. How can the CodSpeed backend know when the run is done and it's time to generate the report comment? e.g.:

I suspect one of these will be required:

Either way, that'd be a good place to add a message "benchX is skipped, not dropped".

colin-ho commented 2 weeks ago

Hey, just wanted to check if there are any updates on supporting sharded benchmarks? We just integrated CodSpeed into Daft: https://github.com/Eventual-Inc/Daft/pull/2696 and I think it'll be really useful!

adriencaccia commented 1 week ago

Thanks for the interest @colin-ho, glad to have Daft using CodSpeed!

Sharding is still on our roadmap; we will update this issue when it is relevant.