dfki-ric / ugv_nav4d

A 4D (X, Y, Z, Theta) planner for unmanned ground vehicles (UGVs).
https://dfki-ric.github.io/ugv_nav4d/
BSD 3-Clause "New" or "Revised" License

Automatic build fails if Qt 5.15.3 is installed #9

Closed: OlgerSiebinga closed this issue 1 month ago

OlgerSiebinga commented 1 month ago

I tried to run the automatic build on Ubuntu 22.04.4 with Qt 5.15.3 installed. The automatic build script caused my OS to freeze in a (seemingly) endless loop. After some digging in the trace, I found that this specific version of Qt was the most likely cause, since the log showed -- Found unsuitable Qt version "5.15.3" from /usr/bin/qmake while the script was configuring gui-vizkit3d. Manually removing Qt5 and rerunning the build script fixed the issue.

Full output until my OS froze: output.txt

(part of https://github.com/openjournals/joss-reviews/issues/6983)

haider8645 commented 1 month ago

@planthaber @pierrewillenbrockdfki, any idea why this could happen? I also get the same message on rebuild: -- Found unsuitable Qt version "5.15.3" from /bin/qmake

pierrewillenbrockdfki commented 1 month ago

Found unsuitable Qt version "5.15.3" from /bin/qmake

This comes from CMake's FindQt4 module, which then correctly reports that Qt4 was not found. The message could be suppressed by adding QUIET to the find_package call.
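To check which qmake is being picked up and why its version counts as "unsuitable", a small Python sketch can ask qmake directly. It assumes qmake is on PATH and supports qmake -query QT_VERSION, and it only roughly mirrors the idea behind FindQt4's check (it wants a Qt 4.x installation); it is not CMake's actual logic, and the function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def qt_major(version: str) -> int:
    """Return the major component of a Qt version string such as "5.15.3"."""
    match = re.match(r"(\d+)\.", version.strip())
    if match is None:
        raise ValueError(f"not a Qt version string: {version!r}")
    return int(match.group(1))


def suitable_for_findqt4(version: str) -> bool:
    """FindQt4 looks for Qt 4.x, so any other major version is 'unsuitable'."""
    return qt_major(version) == 4


def installed_qt_version(qmake: str = "qmake") -> Optional[str]:
    """Ask the qmake found on PATH for its Qt version; None if unavailable."""
    path = shutil.which(qmake)
    if path is None:
        return None
    result = subprocess.run([path, "-query", "QT_VERSION"],
                            capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return None  # e.g. a qtchooser wrapper with no default Qt selected
    return result.stdout.strip()


if __name__ == "__main__":
    version = installed_qt_version()
    if version is None:
        print("no usable qmake on PATH")
    else:
        print(f"qmake reports Qt {version}; "
              f"acceptable to FindQt4: {suitable_for_findqt4(version)}")
```

On a system like the one in this report, the sketch would print that qmake reports Qt 5.15.3 and that it is not acceptable to FindQt4, which matches the harmless status message rather than a build error.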

The original problem of the OS apparently freezing is usually caused by a compiler using more virtual memory than the system can keep in physical memory, which forces the system to swap out less-used virtual memory, such as the user's graphical session. @haider8645, does this use a parallel build (-j)? Not building in parallel can relieve memory pressure simply by not having many compilers trying to use memory at once.

In the olden days of spinning rust, you'd literally hear the disk thrashing as memory pages moved in and out of swap, but nowadays you only get a lit access light, if anything. You'd then either wait and hope that some work finishes and frees up memory, or reach for SysRq+R,E,I,S,U,B.

I also want to mention that, in rare cases, this points to a hardware issue where the system overheats or has some bad memory cells. In that case, it would probably not react to SysRq anymore, but there are also reasons why SysRq might not work even though the Linux kernel is still alive (for example, SysRq being disabled).
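Whether the R,E,I,S,U,B sequence is even available depends on the kernel.sysrq setting. The following Python sketch decodes that value using the bitmask described in the Linux kernel's SysRq documentation (0 disables everything, 1 enables everything, larger values are a bitmask). The helper names are hypothetical, and the bits only gate SysRq keys pressed on the keyboard.

```python
from typing import Dict

# Bit values of /proc/sys/kernel/sysrq, per the Linux kernel's SysRq admin guide.
SYSRQ_BITS = {
    "R": 0x4,   # keyboard control: take the keyboard out of raw mode (unraw)
    "E": 0x40,  # process signalling: SIGTERM to all processes except init
    "I": 0x40,  # process signalling: SIGKILL to all processes except init
    "S": 0x10,  # emergency sync of mounted filesystems
    "U": 0x20,  # remount all filesystems read-only
    "B": 0x80,  # immediate reboot
}


def allowed_keys(sysrq_value: int, keys: str = "REISUB") -> Dict[str, bool]:
    """Report which of the given SysRq keys the setting permits."""
    if sysrq_value == 0:  # SysRq disabled completely
        return {key: False for key in keys}
    if sysrq_value == 1:  # every SysRq function enabled
        return {key: True for key in keys}
    return {key: bool(sysrq_value & SYSRQ_BITS[key]) for key in keys}


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> int:
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        value = read_sysrq()
    except OSError:
        print("no /proc/sys/kernel/sysrq here (not Linux?)")
    else:
        print(value, allowed_keys(value))
```

For example, a value of 176 (0x80 + 0x20 + 0x10) allows S, U and B but not R, E or I, so only part of the sequence would do anything.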

OlgerSiebinga commented 1 month ago

Do I understand correctly that, with this virtual memory issue, the build would continue in the background? If so, I doubt that this was the problem.

My OS was frozen the first time for about 20 minutes, stuck at the same step it reached in the output.txt file. However, I obtained this file by redirecting the terminal output to a file and letting my machine run for over 2 hours (while it was frozen) in the hope that it was doing something in the background. But it did not get any further this time. I would expect that in the second situation (with the output going straight to a file), the build would not be affected by the system taking memory away from the graphical session.

So I'm not completely sure, but I suspect something else is happening.

pierrewillenbrockdfki commented 1 month ago

Your output.txt looks normal to me: the compiler warns about a few different functions that use a deprecated QFlags constructor, and then the output just stops. I can already tell that multiple compilers are in fact running at the same time, since the warning messages do not directly follow the start of compilation (Building CXX object ...) but appear later. I think there are about 8 compilers running in parallel. At around 4 GB of memory per compiler, that would require 32 GB of physical memory. Usually, though, compilers use far less.
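The arithmetic above (8 compilers at about 4 GB each needing about 32 GB) can be turned around to pick a -j value that fits the machine. This Python sketch is an assumption-laden heuristic, not part of the project's build script: it reads MemAvailable from /proc/meminfo and uses the pessimistic 4 GB-per-compiler figure from this thread as the per-job estimate.

```python
import os


def max_parallel_jobs(available_gib: float, per_job_gib: float, cpus: int) -> int:
    """Largest -j value whose estimated compiler memory fits in available RAM."""
    if per_job_gib <= 0:
        raise ValueError("per_job_gib must be positive")
    by_memory = int(available_gib // per_job_gib)
    return max(1, min(cpus, by_memory))


def mem_available_gib(meminfo_path: str = "/proc/meminfo") -> float:
    """Read MemAvailable (reported in kB) from /proc/meminfo and convert to GiB."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 * 1024)
    raise RuntimeError("MemAvailable not found in " + meminfo_path)


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    try:
        ram = mem_available_gib()
    except OSError:
        print("cannot read /proc/meminfo (not Linux?)")
    else:
        # 4 GiB per compiler is the pessimistic estimate from this thread.
        print(f"-j{max_parallel_jobs(ram, per_job_gib=4.0, cpus=cpus)}")
```

With 32 GiB available and 8 CPUs this gives -j8, while a 16 GiB machine would be limited to -j4. Memory, not core count, becomes the limiting factor.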

Since userspace can never truly crash or freeze a system (that would be a kernel or hardware bug), something else must be going on where the system appears frozen but is not (meaning some amount of work is still getting done).

Running out of physical memory is one of those situations where the system might appear frozen (the out-of-memory killer only acts once swap space is full, too). Another could be thousands of processes being created in a short amount of time (see fork bomb).

A single process that does not use a huge amount of memory and does not spawn a huge number of threads cannot freeze a system. It can only create a lot of system load, which generally still leaves enough time for interactive processes to do their work (processes that have been idle are preferentially assigned time slices when they become runnable).

C++ compilers are known to allocate and use huge amounts of memory (multiple GBs). I have seen systems appear frozen because of that, except that I could see it coming: the GUI started to stutter and then froze, a system health tool showed a sharp increase in swap traffic, and I could hear the hard disk thrashing the whole time (a few years back, when hard disks were still common).

At that point, compilation slows down by a few orders of magnitude. One can try to wait it out, but that could take days. In that situation, very little real work gets done: the kernel is busy swapping pages in and out, while processes wait for their memory to be swapped back in, only to have it swapped out again when the next process demands its own memory.
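Without a spinning disk to listen to, the swap traffic described above can be measured instead. This Python sketch samples the cumulative pswpin and pswpout counters (pages swapped in and out since boot) from /proc/vmstat twice and reports the rate between the samples. The helper names and the one-second interval are illustrative choices; sustained high rates in both directions while the GUI is unresponsive would point to thrashing.

```python
import time
from typing import Dict, Tuple


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse the 'name value' lines of /proc/vmstat into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before: Dict[str, int], after: Dict[str, int],
               seconds: float) -> Tuple[float, float]:
    """Pages swapped in and out per second between two samples."""
    swapped_in = after["pswpin"] - before["pswpin"]
    swapped_out = after["pswpout"] - before["pswpout"]
    return swapped_in / seconds, swapped_out / seconds


def sample(path: str = "/proc/vmstat") -> Dict[str, int]:
    with open(path) as f:
        return parse_vmstat(f.read())


if __name__ == "__main__":
    interval = 1.0
    try:
        first = sample()
        time.sleep(interval)
        second = sample()
    except OSError:
        print("cannot read /proc/vmstat (not Linux?)")
    else:
        pages_in, pages_out = swap_rates(first, second, interval)
        print(f"swap-in {pages_in:.0f} pages/s, swap-out {pages_out:.0f} pages/s")
```

Running this from a terminal or over SSH during a suspected freeze distinguishes a thrashing system (large, continuous rates) from one that is genuinely stuck with no swap activity.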

Since you are using Ubuntu, you can try to limit the total amount of virtual memory available to the compilation, but I'd expect the compiler processes to be killed for using too much memory before they finish.

It looks like you can limit the amount of memory available to a program or process group by running it with systemd-run, like this:

systemd-run --scope -p MemoryMax=100K --user bash

The above should not start a shell but instead fail with the error "killed". If it does create a new shell, try without --user, although that requires temporarily elevating privileges (on some systems, systemd-run silently fails to set the memory limit). For testing this on a compilation attempt, I guess MemoryMax=10G or something in that range would be required. But again, I would expect compilation to abort with this.
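If you want to script this, a small Python helper can build the same systemd-run command line as the example above with a configurable limit. The helper name and the bash build.bash invocation in the usage are hypothetical; adjust the wrapped command to however you normally start the build.

```python
import shlex
from typing import List, Sequence


def memory_limited(cmd: Sequence[str], memory_max: str, user: bool = True) -> List[str]:
    """Wrap cmd in a transient systemd scope whose memory is capped at memory_max."""
    argv = ["systemd-run", "--scope", "-p", f"MemoryMax={memory_max}"]
    if user:
        # Per-user service manager; on some systems the limit is silently not applied.
        argv.append("--user")
    return argv + list(cmd)


if __name__ == "__main__":
    # Hypothetical build invocation with a 10G cap, as suggested above.
    print(shlex.join(memory_limited(["bash", "build.bash"], "10G")))
```

Passing the resulting list to subprocess.run would start the capped build. Calling memory_limited(["bash"], "100K") reproduces the sanity check above, which should end with "killed" if the limit is actually enforced.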

I think the real fix is to stop compiling in parallel, which @haider8645 probably knows better how to accomplish.

planthaber commented 1 month ago

Or remove the "-j" here and try again: https://github.com/dfki-ric/ugv_nav4d/blob/main/source_dependencies/build.bash#L25

OlgerSiebinga commented 1 month ago

Thanks for the suggestions. To clarify, manually removing Qt5 from my system already solved the issue. Without Qt 5.15.3 installed, the build runs without any problems.

haider8645 commented 1 month ago

The automatic build works without the parallel build as suggested by @pierrewillenbrockdfki. Thanks!