bitwiseworks / qtwebengine-chromium-os2

Port of Chromium and related tools to OS/2

ninja: limit default parallelism #10

Closed dmik closed 4 years ago

dmik commented 4 years ago

By default, ninja creates a number of parallel build processes equal to the number of CPUs + 2, i.e. for a typical 4-core system this gives 6 parallel processes.
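For reference, ninja derives that default from a small heuristic roughly like the following (a sketch modeled on `GuessParallelism()` in ninja's `ninja.cc`; exact details may vary between versions):

```cpp
int GetProcessorCount();  // ninja's own CPU-count helper from util.h

// Sketch of ninja's default job-count heuristic (modeled on
// GuessParallelism() in ninja.cc; details may differ by version).
int GuessParallelism() {
  switch (int processors = GetProcessorCount()) {
  case 0:  // CPU count could not be detected
  case 1:
    return 2;
  case 2:
    return 3;
  default:  // 3 or more CPUs: number of CPUs + 2
    return processors + 2;
  }
}
```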

However, the 32-bit nature of OS/2, combined with the fact that modern GCC is developed with 64-bit systems in mind, may lead to a hard OOM (out of memory) condition when too many GCC instances compile too much C++ at the same time. It's not rare for a single such instance to allocate as much as 1-2 GB of memory. Given that the maximum amount of memory available to all processes on OS/2 is about 3.3 GB, it's no surprise that 4 or more GCC instances eat up all the available physical memory pages and the OS/2 kernel goes boom (a kernel panic that leads to either a hard hang or a reboot, depending on your luck).

One real case is described here: https://github.com/bitwiseworks/qtwebengine-chromium-os2/issues/3#issuecomment-638823891

There is no easy solution to this problem other than providing custom memory management for GCC (and other memory-hungry applications) that would swap to disk or to system memory above 4 GB (which is not directly available to applications because of the 32-bit address space limit). But these approaches are quite complex and far beyond the scope of this project.

So for now I will just limit ninja to never use more than 3 parallel processes by default on OS/2. It will still be possible to go higher with the -j N command line option (where N is the number of parallel processes).
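A minimal sketch of such a cap (the `__OS2__` guard and the hard-coded limit are illustrative, not the exact patch):

```cpp
#include <algorithm>

int GetProcessorCount();  // ninja's own CPU-count helper from util.h

// Sketch: cap only the *default* job count on OS/2; an explicit
// -j N on the command line bypasses this heuristic in ninja.
int GuessParallelism() {
  int jobs = GetProcessorCount() + 2;  // ninja's usual heuristic
#ifdef __OS2__
  jobs = std::min(jobs, 3);  // avoid OOM in a 32-bit address space
#endif
  return jobs;
}
```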

StevenLevine commented 4 years ago

I wonder if a simpler solution might suffice. Is there enough useful info in DosQuerySysInfo to dynamically limit the number of parallel processes that ninja will allow?
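For what it's worth, querying the free-memory counter is straightforward with the OS/2 toolkit headers; a minimal sketch (assuming QSV_TOTAVAILMEM, which reports currently available memory in bytes):

```cpp
#define INCL_DOSMISC
#include <os2.h>
#include <stdio.h>

int main(void) {
  ULONG availBytes = 0;
  /* QSV_TOTAVAILMEM reports memory currently available to all
     processes; a scheduler could refuse to spawn another compile
     job whenever this drops below a worst-case estimate. */
  APIRET rc = DosQuerySysInfo(QSV_TOTAVAILMEM, QSV_TOTAVAILMEM,
                              &availBytes, sizeof(availBytes));
  if (rc != NO_ERROR) {
    fprintf(stderr, "DosQuerySysInfo failed, rc=%lu\n", rc);
    return 1;
  }
  printf("Available memory: %lu MB\n", availBytes / (1024 * 1024));
  return 0;
}
```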

dmik commented 4 years ago

I thought about something like that too, but since it's unknown upfront how much memory the launched process is about to allocate (it's not even necessarily a GCC process), I see no viable approach here.

StevenLevine commented 4 years ago

You really don't need to know upfront. You just need to use conservative estimates. Say your worst-case process is going to require 1 GB of memory; then you don't allow ninja to start a new process unless you have 1 GB available.

I use a variant of this technique when running multiple parallel git and svn instances. If there aren't sufficient resources available, I hold off starting a new git instance until some of the currently running processes complete and free up resources. When calculating, I always assume that the pending git instance is going to require worst-case resources (i.e., that the repo to be processed is the largest repo in the set).

This is similar to your situation with ninja: you don't know what the resource requirements of the next process will be, so it's safest to assume the worst case. A sketch of this admission-control loop follows my comment below.

In my case, this is not quite true. I know the next repository in the worklist, so I could estimate more accurately and allow more parallel gits by looking at the repository size. However, what I have gets the job done fast enough, and I don't remember the last time I hung the system by having too many parallel gits running.
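A rough sketch of that admission-control idea (all names here are illustrative stubs standing in for a real job scheduler, not ninja or git APIs; the stubs just simulate the bookkeeping):

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative stubs, not real ninja/git APIs. In a real tool,
// AvailableMemory() would wrap something like
// DosQuerySysInfo(QSV_TOTAVAILMEM, ...) and the job functions
// would manage child processes.
static uint64_t g_avail = 3ULL << 30;  // pretend ~3 GB are free
static int g_pending = 8, g_running = 0;

static uint64_t AvailableMemory() { return g_avail; }
static bool JobsPending() { return g_pending > 0; }

static void StartNextJob() {
  --g_pending; ++g_running;
  g_avail -= 1ULL << 30;  // model the job taking its worst case
  std::printf("start job (%d running)\n", g_running);
}

static void WaitForAnyJobToFinish() {
  --g_running;
  g_avail += 1ULL << 30;  // a finished job frees its memory
  std::printf("job done  (%d running)\n", g_running);
}

int main() {
  // Assume every pending job may need the worst case (1 GB here)
  // and hold it back until at least that much memory is free.
  const uint64_t kWorstCase = 1ULL << 30;
  while (JobsPending()) {
    while (AvailableMemory() < kWorstCase)
      WaitForAnyJobToFinish();
    StartNextJob();
  }
  while (g_running > 0)
    WaitForAnyJobToFinish();
  return 0;
}
```

The invariant is the point: a new job is only admitted when its worst-case reservation fits into what is currently free.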

dmik commented 4 years ago

Yes, I get the idea. Ninja already has a switch (-l IIRC) that specifies the maximum CPU load at which ninja should stop creating new jobs. I could do the same for memory too, but it would change nothing compared to what we have now: if we assume 1 GB per process, it would allow no more than 3 processes to start anyway (about 3.3 GB of addressable memory divided by 1 GB per process). Actually, this is why I decided to make 3 processes the default maximum on OS/2.

Moreover, my tests show that -j4 doesn't give much in terms of speed compared to -j3 (needless to say, -j6 worked more or less the same as -j4 because we only have 4 CPUs). -j3 even has some benefit, as it leaves the OS some air to breathe. With -j4 and above, the system is barely responsive even locally, let alone over a VNC session.