Make all source and header files self-contained

real-or-random commented 2 years ago

This week I noticed that the includes in this project are a mess. Actually, each file should be self-contained, i.e., include the stuff that it needs (and ideally only the stuff that it needs) but at the moment neither of these are true for a lot of files in the repo. Just try compiling a random .h or _impl.h file.

This is not per se a problem for our "single translation unit" compilation (I learned that the cool kids call this "unity builds") that we use in order to leverage the full potential of compiler optimizations without the need for LTO. The more important point (at least for me personally--though I may be the only regular contributor suffering here), it prevents the use of advanced tooling/IDE, e.g., my editor supports language servers such as ccls or clangd which "compile" the open file internally and provide a ton of features, e.g., completion, display of compile errors etc. But none of this works properly if you can't compile individual files. You can say that's the fault of these tools but self-contained files are also just good style and good engineering practice.

PR #924 was an attempt to clean up the includes but even the include-what-you-use tool there used there (or at least the way it has been used there) fixes includes only for the .c files.

Here are some things we'd need to do:

[ ] Introduce header files for modules: Currently each _impl.h file in src has a corresponding .h file for other files to include. But we don't do this properly in src/modules. For example, the schnorrsig module uses function from the extrakeys module but there are only _impl.h files in src/modules/extrakeys and no .h file for schnorrsig to include.
[ ] The same is true for some internal functions declared in secp256k1.c. There's no just header for other files to include.
[ ] Then make sure that each file includes at least all the stuff it uses, i.e., compiles on its own. Maybe with the help of a tool like clang-include-fixer.
[ ] ...and maybe enforce this on CI.
[ ] Maybe make sure that each file includes at most the stuff it uses, e.g., with the help of a tool like include-what-you-use.

Again, I understand that I may the only one suffering, so fixing this may be on me but I still wanted to document my findings here. I don't except anyone disagreeing with the changes the mentioned here but please raise your hand if you do.

sipa commented 2 years ago

It's complicated.

The _impl.h files in modules/* aren't really standalone pieces of code, as they are included by secp256k1.c / tests.c / bench.c respectively. I tend to think of them not as separate things, but as "parts" of those respective .c files that have been moved elsewhere for organization purposes.

real-or-random commented 2 years ago

The _impl.h files in modules/* aren't really standalone pieces of code, as they are included by secp256k1.c / tests.c / bench.c respectively. I tend to think of them not as separate things, but as "parts" of those respective .c files that have been moved elsewhere for organization purposes.

Indeed but I don't see the point. What you say is true also for the _impl.h files not in src/modules/*, they're also included by the .c files.

So I think the rules should roughly be:

The .c files should include all the _impl.h files or at least those it needs (maybe we can have a single "include-them-all" header to avoid that we need to add a new _impl.h to all .c files).
_impl.h and other .h files should just include the .h files that declare the stuff they need

apoelstra commented 2 years ago

What if we renamed the _impl.h files to .c and compiled them as independent units in CI? So both forms of compilation would be supported.

real-or-random commented 2 years ago

What if we renamed the _impl.h files to .c and compiled them as independent units in CI? So both forms of compilation would be supported.

Yeah, I think that will be a very related goal. What I propose is to ensure that every file (incl. _impl.h) compiles on its own. Whether you call them .c or _impl.h is then more a matter of taste in the end and is then orthogonal.

However, I don't know if it's worth supporting that as an official compilation method. The reason why we have unity builds is the (former) lack of LTO. I think it was a good idea to separate the precomputed stuff (https://github.com/bitcoin-core/secp256k1/pull/1042) because it rarely changes and nothing is lost without LTO. But for the "normal" (not precomputed) files, I don't know: we wouldn't want to drop support for unity builds for people that still use compilers without LTO. And compilation times are still very manageable with the existing code base.

sipa commented 2 years ago

@real-or-random Hmm, perhaps my point is more philosophical than practical.

The way I like to think about source code organization is that the .h and .c file (or in our case, .h and _impl.h file) are one "unit". And to think about dependencies in the code, you think about "which unit cannot be used without which other unit". When you have units that depend in that way on each other, you have a "semantic" cyclic dependency between those units, which is a sign that those two units should really be one unit, or organized differently.

So specifically, if you have a situation where a.c includes b.h, and b.c includes a.h, while not being a cyclic dependency between the actual files, it is a cyclic dependency between the units. And in that sense, e.g. the modules//tests_impl.h files aren't proper separate units: modules//tests_impl.h depend on functions in tests.c, and simultaneously tests.c includes modules/*/tests_impl.h. In that sense, they're really better thought of as "part of tests.c, but moved elsewhere to make conditional compilation easy" than as separate units. Just adding non-impl .h files, without disentangling things, wouldn't change that.

But

Of course, there is no requirement that the situation stay that way. We could properly separate the modules//tests_impl.h files; which would probably involve moving some of the shared helper functions in tests.c to a separate file, which can then be included by both tests.c and modulus//tests_impl.h

I think that would be a good idea, actually. And if we do, @apoelstra's suggestion to try compiling all the impl files separately would be a nice way of testing that. Do you imagine that as actually renaming all the _impl.h files to .c in the repository, but just leaving the default compilation mode to still #include them into each other as now? Or do you mean e.g. having the CI script doing the renaming for this particular test? One concern I'd have with actually doing the renaming in-repo is that people building the project in an environment without our build system would actually build that way and get abysmal performance due to lack of inlining of field routines (actually, that's perhaps not uninteresting to test).

real-or-random commented 2 years ago

And if we do, @apoelstra's suggestion to try compiling all the impl files separately would be a nice way of testing that. Do you imagine that as actually renaming all the _impl.h files to .c in the repository, but just leaving the default compilation mode to still #include them into each other as now? Or do you mean e.g. having the CI script doing the renaming for this particular test?

If the goal is just to solve this issue and test this, then it would suffice to have CI compile the _impl.h files (and the other .h files). No need to rename them. edit: and just invoke the compiler on them manually without autotools. (That's what I meant when I said above that renaming is orthogonal -- I realize that we were writing our comments simultaneously.).

apoelstra commented 2 years ago

My intention was to do the bare minimum to test compiling the _impls in CI. Agreed that we should not make it easy or natural for library users to do this. I assumed, but did not check, that gcc would want a file to be called .c rather than .h for it to compile it as a unit, which is why I suggested the renaming.

I didn't consider just having the CI script do the renaming though. I think that might get us exactly what we want (a CI test of @sipa's semantic model of dependencies, as approximated by #include lines).

bitcoin-core / secp256k1

Make all source and header files self-contained #1039