bepaald / signalbackup-tools

Tool to work with Signal Backup files.
GNU General Public License v3.0
755 stars 36 forks source link

cmake building with unlimited number of processes break home users #177

Closed sdettmer closed 7 months ago

sdettmer commented 8 months ago

The readme says:

cmake -B build && cmake --build build -j

This creates more than 100 processes and my little home server with 4 CPUs failed to compile within 60 minutes:

gmake[1]: *** [CMakeFiles/Makefile2:95: CMakeFiles/signalbackup-tools.dir/all] Interrupt
gmake: *** [Makefile:103: all] Interrupt

real    62m26.612s
user    2m10.078s
sys     0m12.362s

As I won't upload my signal data outside my network, I cannot test on a productive 64 core machine, but even there -j 64 would be optimal.

sdettmer commented 8 months ago

as requested no pull request for https://github.com/sdettmer/signalbackup-tools/commits/dev/sde/cmake-nproc

commit 29567e8c8a0ab7511660c7b1c132358a722f2dec (HEAD -> dev/sde/cmake-nproc, sde/dev/sde/cmake-nproc)
Author: Steffen Dettmer <steffen.dettmer@gmail.com>
Date:   Wed Jan 10 01:55:48 2024 +0100

    fix README for #177: cmake building with unlimited number of processes break home users

diff --git a/README.md b/README.md
index 183732f..14367a7 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ To compile this project, current stable released versions of the following are n

 To compile the program, just running `g++ -std=c++20 */*.cc *.cc -lcrypto -lsqlite3` should generally do the trick. Add any compiler flags you feel useful, I personally use at least `-O3 -Wall -Wextra`. When compiling with an old compiler version (gcc 8.x or clang <= 7), also add the `-lstdc++fs` flag and replace `-std=c++20` with `-std=c++17`.

-If you have `cmake` available on your system, running `cmake -B build && cmake --build build -j` inside the project directory should also produce a working binary (found in the directory 'build') and make use of multiple processors if available.
+If you have `cmake` available on your system, running `cmake -B build && cmake --build build -j $(nproc)` inside the project directory should also produce a working binary (found in the directory 'build') and make use of multiple processors if available.

 For people not comfortable compiling source code, a script is provided that should compile the binary on Arch and Fedora (and probably many other distributions). Assuming the needed [requirements](#requirements) are installed, a simple `sh BUILDSCRIPT` should build the program (or, when using bash on a multiprocessor system, use `bash BUILDSCRIPT_MULTIPROC.bash44` for a faster build, and let me know if it works).
sdettmer commented 8 months ago

BTW: is there a particular reason writing BUILDSCRIPT_MULTIPROC.bash44 instead of a Makefile? It is quite complex and does not build incrementally.

I think make is just the tool for it. It by default allows overwriting variables on command line, has built-in parallel job support, deletes output files on error, tracks depencies to build only what is really needed (i.e. when its inputs have been changed) and it allows easily to use ccache!

I created an example Makefile with the functionality of BUILDSCRIPT_MULTIPROC.bash44, but with fast incremental build support, basic depency tracking and so on, see https://github.com/sdettmer/signalbackup-tools/commits/dev/sde/makefile.

For development, ccache saves so much, for example ccmake -DCMAKE_CXX_COMPILER_LAUNCHER=ccache, or with the Makefile:

apt install ccache # e.g. debian, ubuntu
make CONFIG=default CXX="ccache g++" -j $(nproc)

NB: the creating the depency graph file takes ~20 seconds and for clean pipeline (as in gitlab actions) this is not suited. The makefile could easily be changed to support this, but then this part would not be tested anymore. For development however, to avoid waiting 2 minutes for every build, incremental ccache builds of course are lovely, so let's optimize for that,

ps: The Makefile from the mentioned branch currently is https://github.com/sdettmer/signalbackup-tools/commit/ad920df9e3c4805d074eb59e1807e21f28f0be7e and the workflow extension is https://github.com/sdettmer/signalbackup-tools/commit/23532adc8a1bb904b6f5457acc2744e0ca32a43f

bepaald commented 8 months ago

This creates more than 100 processes and my little home server with 4 CPUs failed to compile within 60 minutes:

Whoops, I always thought -j without argument would use the number of processors by default. Thanks for reporting, fixed.

BTW: is there a particular reason writing BUILDSCRIPT_MULTIPROC.bash44 instead of a Makefile? It is quite complex and does not build incrementally. [...]

I have always been reluctant to learn make. The currently provided scripts and cmake conifg are auto-generated ( = low-maintenance). Even now, I do not understand all that's going on in the Makefile you linked. A few issues (mostly small):

I will consider adding the Makefile to the project, since it does seem fairly low-maintenance, but with the cmake option in place, I do not see a great need for it. So definitely do not feel obligated to spend your time improving it.

If you're worried about me not having incremental builds/dependency tracking/parallelism during development... Don't be, I use my own tooling with all of that. I do not use cmake or the provided scripts, and will not use the Makefile either.

Thanks!

sdettmer commented 8 months ago

Thanks so much for your quick reply!

This creates more than 100 processes and my little home server with 4 CPUs failed to compile within 60 minutes:

Whoops, I always thought -j without argument would use the number of processors by default. Thanks for reporting, fixed.

thank you for the great tool :)

BTW: is there a particular reason writing BUILDSCRIPT_MULTIPROC.bash44 instead of a Makefile? It is quite complex and does not build incrementally. [...]

I have always been reluctant to learn make. The currently provided scripts and cmake conifg are auto-generated ( = low-maintenance). Even now, I do not understand all that's going on in the Makefile you linked. A few issues (mostly small):

Yes, I also use cmake for almost all my projects at work. I also think it should be recommended "default build way". Maintaining BUILDSCRIPT_MULTIPROC.bash44 could maybe be saved. However, not every system contains make I guess.

I could e.g. make a annotated version of the Makefile to explain what happens, if you like.

I really love the idea of make (the concept with looking to time stamps and having a rule database - as author of Makefile you don't need to know which steps to do in which order, make does it right and in parallel :)).

  • There's a trailing " in there, at the Windows CXX var.
  • A few typos (in the echo statements)

Ohh thanks, fixed. (I forgot vim spellchecking does not check echo messages, only comment messages)

  • The dependency updating at every invocation is unacceptable. It takes forever even if you just touch a single file (or even none).

Yes, the dependency tracking is very simplistic. On my machine it takes maybe 20 seconds, which still is much faster than the shell script, which takes two minutes or so.

It could be improved a lot by switching to be done while compiling, but users could break by setting CXXFLAGS without -M option, for the price of a slightly more complex Makefile (a detail is that when users would run it once without manual CXXFLAGS and then with CXXFLAGS without -M option, wrong depenencies would be used; detecting this in makefile would be clumpsy I think. We could avoid this by introducing some AM_CXXFLAGS that always has -M option, but then we cannot use make implicite rules and thus need to write them manually and instead of a "simple Makefile" approaching cmake more and more).

More features could be added to the Makefile, but for those needing more features, I think cmake of course still is much better anyway.

I think I could implement this, but I',m not sure if this is worth it. Real developers use cmake (or whatever else) anyway, people in a hurry will use g++ *.cc - who would use shell script or Makefile?

  • I always dislike the source tree being filled with object files. It makes opening files from the command line (with TAB-completion) a pain.

Yes, same for me. I thought the shell script does the same, but possibly I did't look well enough. Again this could be implemented into Makefile using so-called VPATH, possibly not a big deal, but question is for whom, as cmake already supports this (and handles a lot of nice cases, such as using a source file twice with different compiler flags and whatnot). Do you think it would be worth implementing it?

  • It doesn't work? If I make it, then do (for example) touch framewithattachment/framewithattachment.h, and make again, it will only recompile main.cc (but still take a long time because of the dep-update). That can't be right.

hah, OMG, you are right, it does not work, because in this simplistic usage gcc strips directory of source file for the name and the target name can be overwritten only completely (i.e. for a single file). Possibly g++ simply doen't support the use case "having multiple directories and using simplistic approach". So it would be needed to change have one depedency file per source file. According to my notes I unfortunately tested with main.h and it is the only single header file where the tracking accidentally "works" (it compiles main.cc which is correct here). Amazing. Sorry, I should have noticed this. How bad. Or remove the dependency tracking. (of course only if anyone is interested in the Makefile at all).

I will consider adding the Makefile to the project, since it does seem fairly low-maintenance, but with the cmake option in place, I do not see a great need for it. So definitely do not feel obligated to spend your time improving it.

Yes, I hink the same, I don't see much need. Maybe a simplified version just to replace the command line way (g++ */*.cc style) instead of the script (I removed dep track).

If you're worried about me not having incremental builds/dependency tracking/parallelism during development... Don't be, I use my own tooling with all of that. I do not use cmake or the provided scripts, and will not use the Makefile either.

Actually I was just looking into the shell script (after I couldn't use cmake, since mine was just 1.18) and noticed and the concurrency control (for loops, wait -n, need for specific bash versions (4.4?) and wrote a 15 lines makefile just for myself. Then I thought I'd share it, but found it too bad/too simple and so I added more and more...

Own build system? You mean another build system or really an own one?

If things to improve, I think I'd drop shell scripts (and Makefile) and improve cmake instead: adding install rules and possibility to generate a debian package (cpack), shouldn't be much work. Maybe adding a test.

sdettmer commented 7 months ago

the actual issue (-j unlimited) is fixed, shall we close this? The tool works fine and the HTML is lovely! Thanks so much for your efforts! Regards, Steffen

bepaald commented 7 months ago

Hi! Sorry for the late reply, been a bit busy and preoccupied with #179 in my spare time.

As far as I am concerned, this issue can indeed be closed. The Makefile was a good effort, but it still requires quite a bit of work. I understand why you went that route — because of the bug in the CMakeLists file — but since that has been fixed I don't see a great need for the Makefile.

The tool works fine and the HTML is lovely!

Thanks for your kind words, let's hope it stays working (back to #179...)

Cheers!