NixOS / infra

NixOS configurations for nixos.org and its servers
MIT License

rhea: scale less #334

Closed vcunat closed 5 months ago

vcunat commented 5 months ago

It has commonly been happening that we spin up many machines but can't keep them occupied because of bottlenecks in the central machine (probably mostly the compression in the copying-results step).

So let's scale less aggressively and thus waste less.

vcunat commented 5 months ago

I forgot to say that this has been deployed for about 30h now, but let me open it here for discussion (and further tweaks perhaps).

vcunat commented 5 months ago

Ideally we'd have some mechanism that prevents further scaling if rhea can't keep up feeding the machines with work. Currently the scaling is based only on the count of jobs ready to be built, but there are clearly other bottlenecks at times.
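
To make the idea concrete, here is a minimal sketch of such a guard. It is not the code running on rhea; the function name, all parameters, and the 0.75 utilization threshold are assumptions made up for illustration. It computes the desired machine count from the number of ready jobs, as the current scaling does, but refuses to scale up further while the machines already running are not being kept busy.

```python
import math


def target_machines(runnable_jobs, jobs_per_machine, max_machines,
                    current_machines, busy_machines, min_utilization=0.75):
    """Return how many builders to run (hypothetical helper, not rhea's code).

    runnable_jobs:    jobs ready to be built (the only signal used today)
    jobs_per_machine: assumed number of queued jobs that justify one machine
    busy_machines:    machines that currently have work assigned
    min_utilization:  assumed threshold below which we stop scaling up
    """
    # Queue-based target, capped at the configured maximum.
    wanted = min(max_machines, math.ceil(runnable_jobs / jobs_per_machine))

    # Guard: if we would add machines but the existing ones are mostly idle,
    # the central machine is likely the bottleneck, so hold the current count.
    if wanted > current_machines and current_machines > 0:
        utilization = busy_machines / current_machines
        if utilization < min_utilization:
            return current_machines

    # Scaling down (or scaling up while the machines are busy) is unaffected.
    return wanted
```

With this guard, a long queue alone no longer triggers more machines; the queue has to be long and the existing machines have to be occupied. Scaling down is deliberately left untouched, so the guard can only reduce waste.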

My understanding is that we'll be using the scaling much less by May anyway, so the impact of this issue should become lower than now.

vcunat commented 5 months ago

The changes are intentionally harsher for aarch64 than for x86_64, as that seems desirable.

vcunat commented 5 months ago

It's true that we shouldn't keep this open/unresolved for too long, as deploying something else to rhea would undo the changes.