TheBevyFlock / bevy_new_2d

This template is a great way to get started on a new 2D Bevy game!
https://the-bevy-flock.itch.io/bevy-new-2d
MIT License

The CI uses up a lot of resources #212

Open TheKnarf opened 2 months ago

TheKnarf commented 2 months ago

Screenshot 2024-08-01 at 14 41 38

A single game jam was all it took.

Too late, I realized that I should have disabled the Windows/Linux/macOS builds, since I only really needed the web build for the game jam. I think the template should probably disable those by default and require people to enable them if they need them.

Maybe there are other things one could do to reduce the number of CI minutes it uses? Maybe running cargo fmt and cargo test isn't necessary by default on main. Maybe one could run cargo test without it also parsing the docs folder? Are there other improvements one could make?
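For illustration, a minimal sketch of what a web-only release job could look like as a plain GitHub Actions workflow (job and step names here are hypothetical, not the template's actual release.yaml):

```yaml
# Hypothetical sketch: build only the wasm target for releases.
# Desktop platforms are omitted from the workflow entirely and would
# have to be added back explicitly if needed.
name: Release
on:
  push:
    tags: ["*"]
jobs:
  build-web:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          targets: wasm32-unknown-unknown
      - name: Build for web
        run: cargo build --release --target wasm32-unknown-unknown
```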

TheKnarf commented 2 months ago

Screenshot 2024-08-01 at 14 47 00

CI builds took almost 30 minutes to run.

Screenshot 2024-08-01 at 14 47 39

The release build took over 30 minutes when building for all targets, and less than 15 minutes after disabling everything except the web build.

janhohenheim commented 2 months ago

We intentionally made it do as much as possible, assuming that most users would open source their code by default (which means unlimited GitHub Actions minutes) or be ready to deal with the consequences of having a private repository, i.e. fiddling with the workflow.

That said, it would be fairly easy to enable or disable the builds based on a cargo generate flag. How would you feel about that?
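For context, cargo-generate pre-processes template files with Liquid, so a boolean placeholder could gate the desktop entries of the build matrix. A hypothetical sketch of what the templated workflow could look like (the desktop_builds placeholder is an invented name and would need to be declared in cargo-generate.toml; it is not an existing flag of this template):

```yaml
# Hypothetical Liquid-templated release workflow in the template repo.
# Note: GitHub's own `${{ ... }}` expressions elsewhere in the file would need
# Liquid `{% raw %}` guards (or the file excluded from templating), since both
# tools use `{{ }}` syntax.
name: Release
on:
  push:
    tags: ["*"]
jobs:
  build:
    strategy:
      matrix:
        target:
          - wasm32-unknown-unknown
{% if desktop_builds %}
          - x86_64-unknown-linux-gnu
          - x86_64-pc-windows-msvc
          - aarch64-apple-darwin
{% endif %}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      # The real build step would pass the matrix target to cargo
      # (inside a raw guard); kept generic here for brevity.
      - run: cargo build --release
```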

The CI builds should take like a minute at most when cached, and an uncached release takes about 10 minutes for me. Did something go wrong there? Do private repos get less powerful CI agents? @TimJentzsch, cargo-cache works on private repos, right?

TimJentzsch commented 2 months ago

Under the hood it uses the GitHub Actions cache, I'm not aware of any restrictions for private repositories.
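For reference, the underlying mechanism is the stock actions/cache action keyed on the lockfile; a rough sketch of an equivalent plain setup (paths and key layout here are illustrative, not cargo-cache's exact internals):

```yaml
# Rough plain-actions/cache equivalent of what the caching step does: cache the
# cargo registry/git checkouts and the target directory, keyed on the lockfile,
# with a prefix fallback for partial hits.
name: CI
on:
  push:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-
      - run: cargo test
```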

benfrankel commented 2 months ago

Yeah I'm curious if caching wasn't working, or otherwise why it was taking so long. Looking at my own game's CI, I see this error when trying to save the cache (https://github.com/benfrankel/blobo_party/actions/runs/10139032444):

zstd: error 70 : Write error : cannot write block : No space left on device 
/usr/bin/tar: cache.tzst: Cannot write: Broken pipe
/usr/bin/tar: Child returned status 70
/usr/bin/tar: Error is not recoverable: exiting now
Warning: Failed to save: "/usr/bin/tar" failed with error: The process '/usr/bin/tar' failed with exit code 2

This started happening on the same CI run that spontaneously started recompiling from scratch, making CI take 5x longer (~15 minutes), which is very strange because the cargo --version / Cargo.toml / Cargo.lock weren't touched as part of that commit. Maybe related:

Screenshot 2024-08-01 (1284x74)

This is with 5 saved caches at 3 GB each.

On the other hand, my releases were all under 10 minutes because I reused the same tag every time and hit the cache.

mockersf commented 2 months ago

GitHub runners for private repos are smaller than for public repos, except macOS hosts.

Private: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for--private-repositories
Public: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories

janhohenheim commented 2 months ago

@mockersf thanks! That explains the performance issues then. Nothing we can fix, but it might be good to point it out in the docs. That leaves the spontaneously broken cache, which sounds like a bug in either GitHub's cache or cargo-cache. @TimJentzsch, what do you think? Is there something we can do about this?

Edit: oh, looking at the logs, the machine simply ran out of space! Yeah, I've had that happen to me as well a couple of times. Don't think there's much we can do there other than restarting a build. Although you're right, this shouldn't invalidate the cache. Hmm.

TimJentzsch commented 2 months ago

Hmm, that's interesting. GitHub should automatically delete old caches when the cache limit is exceeded, but maybe it has a delay.

Maybe we can start optimizing the size of the cache though.
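One way the cache could be slimmed down, sketched under the assumption that a trim step runs just before the cache is saved at the end of an existing job:

```yaml
      # Hypothetical trim step, placed right before the cache-save step:
      # incremental-compilation state is large and rarely reusable across CI
      # runs, and example binaries don't need to be cached either.
      - name: Trim target/ before cache save
        run: |
          rm -rf target/debug/incremental target/release/incremental
          rm -rf target/debug/examples target/release/examples
```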

benfrankel commented 2 months ago

So I noticed that in the CI run I linked, the Docs and Tests jobs actually restored their cache from the Release workflow that ran right before as a fallback, then Docs was able to save its cache but Tests was not, so every subsequent run of Tests used the Docs cache instead. It seems like cache thrashing / out of cache space in my particular case.

TimJentzsch commented 2 months ago

> So I noticed that in the CI run I linked, the Docs and Tests jobs actually restored their cache from the Release workflow that ran right before as a fallback, then Docs was able to save its cache but Tests was not, so every subsequent run of Tests used the Docs cache instead. It seems like cache thrashing / out of cache space in my particular case.

Ah shoot. We should probably remove the last cache key fallback. It doesn't make sense for Check to fall back to a cache from Test, for example, because they need different builds anyway; it just increases the cache size for nothing.
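A sketch of how dropping that last fallback could look with plain actions/cache, assuming cache keys scoped by job id so that Check, Test, and Docs only ever restore their own previous caches (the key layout is illustrative, not cargo-cache's actual scheme):

```yaml
      # Hypothetical key layout: include the job id in both the key and the
      # fallback prefix, so a job never restores another job's cache.
      - uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-${{ github.job }}-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-${{ github.job }}-
```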

benfrankel commented 2 months ago

Well I think the fallback itself is okay and shouldn't increase cache size, the problem is that Tests was unable to save its own cache afterwards for some reason.

Unless using a fallback means that when the cache is saved at the end of the job, it will also include any irrelevant stuff it downloaded from the fallback, thus increasing cache sizes every time a fallback occurs? If that's the case... maybe it would be better to have no fallback at all.

TimJentzsch commented 2 months ago

> Well I think the fallback itself is okay and shouldn't increase cache size, the problem is that Tests was unable to save its own cache afterwards for some reason.
>
> Unless using a fallback means that when the cache is saved at the end of the job, it will also include any irrelevant stuff it downloaded from the fallback, thus increasing cache sizes every time a fallback occurs? If that's the case... maybe it would be better to have no fallback at all.

Yes, the stuff from the fallbacks will also be saved again because it's in the same folder. But without fallbacks I think we will have a cache miss too often... If we use the fallback from the same job it should be fine, though. Unrelated jobs can double the cache size in the worst case.

benfrankel commented 2 months ago

Could there be a situation where cache Z is a fallback from cache Y, which is a fallback from cache X, and so on, causing an ever-ballooning cache size over time? I wonder if there's a way to prune unused stuff from target/, but I would be surprised if there is.
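On pruning target/, one community tool built for roughly this purpose is cargo-sweep; a hedged sketch of a CI step (flag usage recalled from memory, so verify against cargo sweep --help before relying on it):

```yaml
      # Hypothetical pruning step before the cache is saved, using cargo-sweep
      # to drop build artifacts not touched within the last 7 days.
      - name: Prune stale artifacts from target/
        run: |
          cargo install cargo-sweep
          cargo sweep --time 7
```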

TimJentzsch commented 2 months ago

Yes, I think in theory that's possible. We should adjust the fallback in cargo-cache a bit.

Otherwise I guess it's a good idea to clear your caches in CI occasionally.
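For the occasional clean-out, the GitHub CLI ships gh cache subcommands; a minimal sketch of a manually triggered maintenance workflow (command and flags assumed from the gh docs, so double-check against gh cache delete --help):

```yaml
# Hypothetical maintenance workflow: wipe all Actions caches on demand.
name: Clear caches
on:
  workflow_dispatch:
permissions:
  actions: write
jobs:
  clear:
    runs-on: ubuntu-latest
    steps:
      - run: gh cache delete --all --repo "$GITHUB_REPOSITORY"
        env:
          GH_TOKEN: ${{ github.token }}
```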

janhohenheim commented 2 months ago

Is it alright if I move this issue to cargo-cache?

benfrankel commented 2 months ago

IMO that makes sense for cache size specifically, but this issue should remain here as well since we ought to consider the fact that private runners have worse performance + are not free.

janhohenheim commented 2 months ago

Would a solution be to ask you in cargo generate which builds you want?

benfrankel commented 2 months ago

Yes, and a note in the workflows doc that mentions the issue with private runners, so users hopefully see that when they're setting up CI.

TimJentzsch commented 2 months ago

I created an issue for this on the cargo-cache side: https://github.com/Leafwing-Studios/cargo-cache/issues/22

benfrankel commented 1 month ago

Here's another issue on the cargo-cache side for the root cause / optimizing the cache size down even further: https://github.com/Leafwing-Studios/cargo-cache/issues/24

benfrankel commented 1 month ago

Remaining tasks: