archlinuxcn / repo

Arch Linux CN Repository

solve the resource-exhaustion issue with ninja #1598

Closed. lilydjwg closed this issue 4 years ago

lilydjwg commented 4 years ago

Type of issue


Update: there is a new approach to this issue at https://github.com/archlinuxcn/repo/issues/1598#issuecomment-626573927 and I'm going to implement it.


#1597 doesn't work. kicad-git timed out because all CPU resources were occupied by ninja while our make is too kind! Other packages that timed out were possibly hit by this too. We need a hack!

I'll implement the first two later.

See also:

lilydjwg commented 4 years ago

We may want a pre-commit check for packages that neither mention ninja -l nor install our wrapper package. For manual packaging, maintainers need to install it themselves. (Is there a better solution?)
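A rough sketch of what such a pre-commit check could look like, assuming it only needs to scan the PKGBUILD text. The wrapper package name `ninja-wrapper` is a placeholder (the thread does not name it), and the regular expressions are a heuristic, not a shell parser.

```python
import re

# Hypothetical name: the thread does not say what the wrapper package is called.
WRAPPER_PACKAGE = "ninja-wrapper"


def uses_ninja(pkgbuild: str) -> bool:
    """True if any non-comment part of a line mentions ninja."""
    return re.search(r"(?m)^[^#\n]*\bninja\b", pkgbuild) is not None


def is_load_limited(pkgbuild: str, wrapper: str = WRAPPER_PACKAGE) -> bool:
    """True if ninja is called with -l, or the wrapper package is mentioned."""
    has_l_flag = re.search(r"\bninja\b[^\n#]*\s-l\s*\S", pkgbuild) is not None
    return has_l_flag or wrapper in pkgbuild


def needs_attention(pkgbuild: str, wrapper: str = WRAPPER_PACKAGE) -> bool:
    """Flag PKGBUILDs that mention ninja without any load limiting."""
    return uses_ninja(pkgbuild) and not is_load_limited(pkgbuild, wrapper)
```

A hook could call `needs_attention()` on every changed PKGBUILD and reject the commit when it returns True. Packages that run ninja indirectly (for example through meson or cmake) would not be caught by this text scan.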

lilydjwg commented 4 years ago

Or maybe we can create a package replacing ninja in a local repo? Will pacman install it instead?

Sorry if I made a lot of noise. I am as tired as our build machine today....

yan12125 commented 4 years ago

How about using systemd to limit CPU usage of processes under user.slice?

Ref: https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html

lilydjwg commented 4 years ago

> How about using systemd to limit CPU usage of processes under user.slice?

This seems good, except that it doesn't work well with make -l. There will be a lot of running processes when multiple users are compiling. What problems would that cause us?

yan12125 commented 4 years ago

I guess most make flags use $(nproc)? If so, systemd's CPUAffinity= should help. I've just tried it: CPUAffinity= changes the value returned by nproc. Commands in extra-x86_64-build are also affected.
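To illustrate why an affinity mask changes the job count: nproc reports the CPUs available to the current process, not the CPUs installed. A minimal Python sketch of the same distinction (Linux only; the function names are mine):

```python
import os


def usable_cpus() -> int:
    """CPUs this process may run on; respects the affinity mask (Linux only)."""
    return len(os.sched_getaffinity(0))


def installed_cpus() -> int:
    """Logical CPUs reported by the system, regardless of affinity."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("usable:", usable_cpus(), "installed:", installed_cpus())
```

Run under a unit with CPUAffinity= set, the first number should shrink while the second stays the same, so anything sized from the first number starts fewer jobs.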

lilydjwg commented 4 years ago

My main concern is whether other services (e.g. sshd, nginx, collectd and grafana) will be affected under high load, say 2-3x the number of CPU cores. I'm going to do some tests.

lilydjwg commented 4 years ago

I'm finally done with the cgroups test. Here are the steps to take:

  1. remove any -l arguments we've put into environment variables and PKGBUILDs
  2. set kernel parameter systemd.unified_cgroup_hierarchy=1
  3. set CPUWeight=100 in /etc/systemd/system/user-.slice.d/resources.conf
  4. reboot
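After the reboot, the result of the steps above can be checked with a small script. This is a sketch that assumes the usual /sys/fs/cgroup mount point; the helper names are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumption: the drop-in from step 3 contains a [Slice] section with
# CPUWeight=100, since CPUWeight= is a resource-control setting for slices.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def is_unified_hierarchy(root: Path = CGROUP_ROOT) -> bool:
    """cgroup v2 exposes cgroup.controllers at the root of its mount."""
    return (root / "cgroup.controllers").is_file()


def cpu_weight(cgroup: str, root: Path = CGROUP_ROOT) -> Optional[int]:
    """Return cpu.weight for a cgroup, or None if the file is absent
    (cgroup missing, or the cpu controller is not enabled for it)."""
    try:
        return int((root / cgroup / "cpu.weight").read_text().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print("cgroup v2:", is_unified_hierarchy())
    print("user.slice weight:", cpu_weight("user.slice"))
```

If `is_unified_hierarchy()` is False, the kernel parameter from step 2 has not taken effect; if `cpu_weight()` returns None for a user slice, the cpu controller is probably not enabled there.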

This will distribute CPU time evenly among all user-XXX.slice units. It will also distribute CPU time evenly between user.slice and system.slice, so system services should have enough time to do their work.
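As a worked example of that arithmetic: under contention, sibling cgroups share CPU time in proportion to their weights, level by level. The sketch below assumes three users building at once and that only user.slice and system.slice compete at the top level; the UIDs are made up.

```python
def cpu_shares(weights: dict[str, int]) -> dict[str, float]:
    """Fraction of the parent's CPU time each busy sibling gets under contention."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Top level: user.slice and system.slice, both with weight 100.
top = cpu_shares({"user.slice": 100, "system.slice": 100})

# Inside user.slice: three hypothetical users, each with CPUWeight=100.
users = cpu_shares(
    {"user-1000.slice": 100, "user-1001.slice": 100, "user-1002.slice": 100}
)

# Overall share per user is the product of the shares along the path.
per_user = {name: top["user.slice"] * share for name, share in users.items()}
```

Here system.slice keeps half of the CPU time when it needs it, and each busy user gets one sixth of the machine. The weights only matter when the CPU is contended; an idle cgroup's share is available to the others.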

(cgroups v1 should also work, but it's a mess that I didn't read up on.)

Also, observations show that with make -l and ninja -l the CPU is under-utilized and coarsely distributed (sometimes there are only 4 make processes but a lot of ninja processes; however, the build time halved as expected).
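One plausible reason for the under-utilization is that -l gates new jobs on the system load average, which is a smoothed figure and reacts late. The sketch below follows the rule documented for GNU make's -l (no new jobs while others are running and the load average is at least the limit); ninja's -l is described similarly, but the real implementations differ in detail, so treat this as an illustration only.

```python
import os
from typing import Optional


def may_start_job(
    max_load: float, running_jobs: int, load: Optional[float] = None
) -> bool:
    """Decide whether one more job may start under a load limit.

    With no running jobs, one job is always allowed so the build makes
    progress; otherwise the 1-minute load average must be below the limit.
    """
    if running_jobs == 0:
        return True
    if load is None:
        load = os.getloadavg()[0]  # 1-minute load average
    return load < max_load
```

Because every build on the machine reads the same system-wide load average, one user's jobs can throttle another's, which fits the uneven distribution described above.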

I'll wait some more days before moving this forward.

lilydjwg commented 4 years ago

The cgroups way has been implemented.