InterNetNews / inn

INN (InterNetNews) Usenet server
https://www.isc.org/othersoftware/#INN
Other
68 stars 13 forks source link

INN makefiles are not `make -j`-safe #206

Closed ewxrjk closed 1 year ago

ewxrjk commented 2 years ago
$ make -j$(nproc)
[...]
../libtool --mode=compile gcc -g -O2 -fPIE -fstack-protector-strong -I../include  -I.   -c -o ovsqlite/ovsqlite-server.lo ovsqlite/ovsqlite-server.c
libtool: compile:  gcc -g -O2 -fstack-protector-strong -I../include -I. -c ovsqlite/ovsqlite-server.c  -fPIC -DPIC -o ovsqlite/.libs/ovsqlite-server.o
ovsqlite/ovsqlite-server.c:41:14: fatal error: sql-init.h: No such file or directory
 #    include "sql-init.h"
              ^~~~~~~~~~~~
compilation terminated.
make[1]: *** [Makefile:82: ovsqlite/ovsqlite-server.lo] Error 1
make[1]: Leaving directory '/home/richard/src/inn/storage'
make: *** [Makefile:73: all-store-util] Error 2
make: *** Waiting for unfinished jobs....

The reason is that sql-init.h is generated at build time, so ovsqlite/ovsqlite-server.lo needs to explicitly depend upon ovsqlite/sql-init.h. This dependency is not explicitly stated in any makefile.

There is a dependency from ovsqlite/ovsqlite-server.o on sql-init.h in storage/Makefile. This is, also, generated code, but it's generated wrongly: it should be .lo rather than .o, at least for this directory (it's fine in directories where we don't build shared libraries).

So the underlying issue is that makedepend is broken. (Or possibly that shared libraries are more trouble than they are worth for INN.)

Julien-Elie commented 2 years ago

Isn't there a way to treat all .lo like .o in Makefiles? %.lo: %.o ?

Or maybe for the directories with shared libraries we should duplicate all .o entries with .lo entries... Or only add a special case for ovsqlite/sql-init.h to build it first? (using a recursive make call)

Or of course use Automake, as frequently discussed... that's more work though.

ewxrjk commented 2 years ago

A special case for sql-init.h would leave all the other dependencies unstated. Clean builds would probably be fixed but incrementals could miss rebuilds.

The .o entries need to become .lo. (Libtool has been mandatory since 89e304965e4d3771049bbfd6ed87a827f86ca71f). But AIUI they are generated by makedepend, so presumably that's where the fix has to be?

ewxrjk commented 2 years ago

I only mentioned Automake because I saw @rra mention it on inn-workers l-)

Julien-Elie commented 2 years ago

Yes, I had quickly a look at makedepend a few minutes ago and did not find in gcc documentation any special flag we could pass to "gcc -MM" to change the suffix... It could have been used. Maybe we then need to add a new argument to makedepend to tell it the suffix object, and if not "o" we add another "sed" to do the change?

Julien-Elie commented 2 years ago

Incidentally, there's a bug appearing in the parallel builds on Fedora, only on ppc64le (8 cores) and aarch64 (224 cores), but also for libstorage. For a reason I still do not know, the build system tries to rebuild a second time libstorage, and removes it just before its rebuild. During that lapse of time, another part of INN is built and fails because libstorage is not found.

Here is the whole build log: inn-2.6.4-rawhide-aarch64.log

ewxrjk commented 2 years ago

Oh, it's much more broken than I realized l-(

  1. top-level all depends on all-libraries and all-programs. There are no dependencies between these so it will evaluate them concurrently.
  2. all-libraries depends on all-storage which will cd storage && make library
  3. Concurrently, all-programs depends on all-innd which will cd innd && make all
  4. In the innd directory, innd depends on libstorage and because it hasn't been built yet, it will cd ../storage && make library, concurrently with #2
  5. The link of libstorage, which happens at least twice without any serialization due to the above structure, starts by removing links to the real build products in the .libs directory, meaning that any consumers of it that get linked at the wrong moment fail to link.

The pattern is repeated throughout the various directories.

rra commented 2 years ago

INN's build system is an example case for Recursive Make Considered Harmful. I've fixed a few problems over the years but mostly I just managed to move the problems around or make them less likely. It's probably theoretically possible to model the dependencies correctly with recursive make, but it's beyond me.

The solution is to use non-recursive make, the easiest way to do that is to use Automake, and I've been wanting to do that for years and having everything in Git makes it much easier. I may start writing a patch this weekend and see how far I can get. The tricky parts are integrating the plugable history and overview stuff and supporting the site directory, but I suspect I can hack together something workable.

ewxrjk commented 2 years ago

The top-level makefile running subdirectories concurrently is simply not compatible with the subdirectories running each other. Moreover if the cross-subdirectory dependencies were removed, then it would become necessary not only to serialize the top level but also to ensure it invoked subdirectories in a correct order. So the top-level concurrency has got to go, as long as a recursive structure is used.

I'll submit a PR for that sometime if Russ doesn't beat me to it with a more general solution (which I expect will fix the original issue too, since Automake does the fine-detail dependency generation for us).

Julien-Elie commented 2 years ago

As parallel build is broken, why not just disallow it? .NOTPARALLEL: should do the trick in Makefiles. Any solution other than the migration to Automake will otherwise still contain flaws... So better not spend time trying to fix things...

ewxrjk commented 2 years ago

why not just disallow it

Because I've got 6 cores and I don't want to wait for builds l-)

rra commented 2 years ago

I've got lib, most of storage, history, innd, and nnrpd converted to Automake on a branch, but there's a lot of fiddly bits so I probably won't have something ready for testing until this weekend. I think it's the right long-term solution, though. Among other things, I bet it will let us use conditional compilation for a bunch of stuff we're currently handling with whole-file preprocessor guards, which will make the code clearer and easier to read.

rathann commented 1 year ago

So, I've just got another parallel-make related failure with 2.7.0 on Fedora s390x builder: build.log . I can reproduce another failure locally if I enable SQLite support: bc.log

Julien-Elie commented 1 year ago

Does a parallel make run fine if you change in storage/ovsqlite/ovmethod.mk:

ovsqlite/ovsqlite-server: $(OVSQLITEOBJECTS) $(LIBSTORAGE)

to

ovsqlite/ovsqlite-server: $(OVSQLITEOBJECTS) libinnstorage.$(EXTLIB)

It is how other overview methods reference the library.

rathann commented 1 year ago

@Julien-Elie It looks like that last change fixed the build. I haven't seen any parallel make-related build failures for a few months now. Thank you!

Julien-Elie commented 1 year ago

Thanks for your feedback @rathann. Switching the build to a non-recursive one with Automake has its own ticket (#22) so we can close this one, specific to failures in parallel builds. In case they unfortunately occur again, feel free to open a new issue or re-open this one.