build-trust / ockam

Orchestrate end-to-end encryption, cryptographic identities, mutual authentication, and authorization policies between distributed applications – at massive scale.
https://ockam.io
Apache License 2.0

CI: Partition rust test jobs with nextest #3619

Open mrinalwadhwa opened 1 year ago

mrinalwadhwa commented 1 year ago

We run our tests using nextest https://github.com/build-trust/ockam/blob/develop/.github/workflows/rust.yml#L125 https://github.com/build-trust/ockam/blob/develop/.github/workflows/rust.yml#L215

These tests take 7 to 10 minutes to run, and this time keeps increasing as we add more tests.

nextest supports partitioning tests to run tests in parallel https://nexte.st/book/partitioning.html
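As a rough illustration of what partitioning looks like on the command line, the sketch below builds one `cargo nextest run --partition count:M/N` invocation per partition. The `--partition` flag and the `count:M/N` syntax come from the nextest partitioning docs linked above. The helper name `partition_commands` is hypothetical, and Python is used only to keep the example self-contained; in the workflow each command would presumably run in its own job.

```python
def partition_commands(total: int) -> list[list[str]]:
    """Return one nextest invocation per partition (1-based, as nextest expects)."""
    if total < 1:
        raise ValueError("total must be at least 1")
    return [
        ["cargo", "nextest", "run", "--partition", f"count:{index}/{total}"]
        for index in range(1, total + 1)
    ]


if __name__ == "__main__":
    # Print the commands that three parallel jobs would run.
    for command in partition_commands(3):
        print(" ".join(command))
```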

I started working on this in #2954 but didn't get time to tune it. https://github.com/build-trust/ockam/pull/2954/files

We need to find the right balance of parallelism to get a good reduction in time. If we overdo the parallelism, GitHub starts queuing the jobs and running them serially, which makes the extra parallelism counterproductive.
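The trade-off can be sketched with a toy model. It assumes every job pays a fixed overhead (setup plus compilation), that the test time divides evenly across partitions, and that only `max_concurrent` jobs run at once while the rest wait for the next wave. All of these are simplifying assumptions, and the numbers in the example are placeholders rather than measurements from this repository.

```python
import math


def wall_time(partitions: int, overhead_s: float, tests_s: float, max_concurrent: int) -> float:
    """Estimated wall-clock seconds when jobs beyond max_concurrent wait in a queue."""
    per_job = overhead_s + tests_s / partitions
    waves = math.ceil(partitions / max_concurrent)
    return waves * per_job


def best_partition_count(overhead_s, tests_s, max_concurrent, limit=16):
    """Partition count in 1..limit with the lowest estimated wall time."""
    return min(
        range(1, limit + 1),
        key=lambda n: wall_time(n, overhead_s, tests_s, max_concurrent),
    )


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not measured values.
    for n in (1, 2, 4, 5, 8):
        print(n, wall_time(n, overhead_s=300, tests_s=240, max_concurrent=4))
```

In this model, going one partition past the concurrency limit adds a whole extra wave, so the estimate gets worse rather than better.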


We love helping new contributors! ❤️ If you have questions or need help as you explore, please join us on Discord. If you're looking for other issues to contribute to, please check out our good first issues.

memark commented 1 year ago

I'll take a look at this, based off of that existing branch.

mrinalwadhwa commented 1 year ago

That would be great. Thank you. Let me know if you have questions.

memark commented 1 year ago

Just updating that I'm still working on this.

mrinalwadhwa commented 1 year ago

@memark Thank you for continuing to spend time on this.

Genysys commented 1 year ago

@memark are you still working on this? If not, I would like to pick it up.

memark commented 1 year ago

@Genysys You're welcome to take over. Here is my branch (copy it into your own) https://github.com/memark/ockam/tree/partition-ci-tests

The hard part is measuring which number of partitions is the fastest. The GitHub runners have highly variable load and give different results each time. I also tried running locally with act, but that was hard to get working.
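One way to cope with that noise is to repeat each configuration several times and compare medians rather than single runs. The sketch below is only an illustration of that idea: the function names are hypothetical and the timings are made-up placeholders, not results from the runners.

```python
from statistics import median


def summarize(durations_s: list[float]) -> tuple[float, float]:
    """Return (median, max - min) for repeated timings of one configuration."""
    if not durations_s:
        raise ValueError("need at least one timing")
    return median(durations_s), max(durations_s) - min(durations_s)


def fastest(timings: dict[int, list[float]]) -> int:
    """Partition count whose median duration is lowest."""
    return min(timings, key=lambda n: summarize(timings[n])[0])


if __name__ == "__main__":
    # Made-up placeholder timings in seconds, keyed by partition count.
    timings = {2: [430, 455, 610], 4: [370, 395, 380]}
    for n, runs in timings.items():
        print(n, summarize(runs))
    print("fastest by median:", fastest(timings))
```

The spread (max minus min) gives a quick sense of whether a difference between two medians is larger than the run-to-run noise.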

One difference from the initial branch is that I've used count instead of hash. That gives a more predictable and consistent division between the partitions.
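To show why `count` tends to divide tests more evenly than `hash`, here is a simplified model of the two strategies. It is an assumption-laden sketch, not nextest's actual implementation: counted partitioning is modelled as round-robin over the sorted test list, and hashed partitioning uses SHA-256 as a stand-in for whatever hash nextest really uses. The test names are hypothetical.

```python
import hashlib


def by_count(tests: list[str], total: int) -> list[list[str]]:
    """Round-robin over the sorted test list, so sizes differ by at most one."""
    buckets = [[] for _ in range(total)]
    for position, name in enumerate(sorted(tests)):
        buckets[position % total].append(name)
    return buckets


def by_hash(tests: list[str], total: int) -> list[list[str]]:
    """Bucket chosen from a hash of the name alone, so sizes can be uneven."""
    buckets = [[] for _ in range(total)]
    for name in sorted(tests):
        digest = hashlib.sha256(name.encode()).digest()
        buckets[int.from_bytes(digest[:8], "big") % total].append(name)
    return buckets


if __name__ == "__main__":
    tests = [f"crate::tests::case_{i}" for i in range(10)]  # hypothetical names
    print("count sizes:", [len(b) for b in by_count(tests, 3)])
    print("hash sizes: ", [len(b) for b in by_hash(tests, 3)])
```

The flip side in this model is that a hashed assignment does not move existing tests between partitions when a new test is added, whereas a counted assignment can shift them.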

kaustubhbabar5 commented 1 year ago

Hey @mrinalwadhwa, since there is no recent activity here, I decided to work on this. Hope that's fine.

After partitioning the tests and running them on CI, I realised that we're compiling the code separately for each partition, and that this is the most time-consuming step, taking around 5 minutes. To optimize this, I'm looking into nextest's archiving and reuse of builds, which would allow us to build the code once and reuse that build in each partition.
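A minimal sketch of that build-once flow, assuming the `cargo nextest archive --archive-file` and `cargo nextest run --archive-file` subcommands behave as described in the nextest docs. The archive file name and the helper names are hypothetical, and passing the archive between jobs (for example as a workflow artifact) is left out.

```python
ARCHIVE = "nextest-archive.tar.zst"  # hypothetical file name


def build_command(archive: str = ARCHIVE) -> list[str]:
    """Build job: compile the test binaries once and pack them into an archive."""
    return ["cargo", "nextest", "archive", "--archive-file", archive]


def run_commands(total: int, archive: str = ARCHIVE) -> list[list[str]]:
    """Test jobs: each one runs a single partition from the prebuilt archive."""
    return [
        ["cargo", "nextest", "run", "--archive-file", archive,
         "--partition", f"count:{index}/{total}"]
        for index in range(1, total + 1)
    ]


if __name__ == "__main__":
    print(" ".join(build_command()))
    for command in run_commands(3):
        print(" ".join(command))
```

If the test jobs check out the repository at a different path than the build job, nextest's `--workspace-remap` option may also be needed; that has not been verified against this workflow.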

I have not yet looked into the optimal number of partitions. Will update here soon.

memark commented 1 year ago

@kaustubhbabar5 Yes, it probably spends more resources than necessary to do compilation in every partition. But are you sure that the total time actually gets longer? A single long compilation done once might just take as long as a few equally long compilations done in parallel. (Which could possibly add CI complexity without a large gain.) Just a thought.
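That comparison can be made concrete with a back-of-the-envelope model that tracks both wall-clock time and total runner time. It assumes no queuing, an even split of test time, and a hypothetical per-job `transfer_s` cost for uploading or downloading the archive; none of these values are measured.

```python
def compile_per_partition(compile_s: float, tests_s: float, partitions: int) -> tuple[float, float]:
    """(wall time, total runner time) when every partition compiles for itself."""
    per_job = compile_s + tests_s / partitions
    return per_job, per_job * partitions


def compile_once(compile_s: float, tests_s: float, partitions: int,
                 transfer_s: float = 0.0) -> tuple[float, float]:
    """(wall time, total runner time) when one job builds and the rest reuse it."""
    build_job = compile_s + transfer_s          # compile, then upload the archive
    test_job = transfer_s + tests_s / partitions  # download, then run one partition
    return build_job + test_job, build_job + test_job * partitions


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print("per partition:", compile_per_partition(300, 240, 4))
    print("compile once: ", compile_once(300, 240, 4, transfer_s=20))
```

With these placeholder inputs the model gives 360 s wall time and 1440 s of runner time for per-partition compilation, against 400 s wall time and 640 s of runner time for compile-once. In other words, under these assumptions reuse saves resources but not wall-clock time, which matches the concern raised above.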