bitcoin-core / secp256k1

Optimized C library for EC operations on curve secp256k1
MIT License
2.08k stars 1.01k forks source link

Make all source and header files self-contained #1039

Open real-or-random opened 2 years ago

real-or-random commented 2 years ago

This week I noticed that the includes in this project are a mess. Actually, each file should be self-contained, i.e., include the stuff that it needs (and ideally only the stuff that it needs) but at the moment neither of these are true for a lot of files in the repo. Just try compiling a random .h or _impl.h file.

This is not per se a problem for our "single translation unit" compilation (I learned that the cool kids call this "unity builds") that we use in order to leverage the full potential of compiler optimizations without the need for LTO. The more important point (at least for me personally--though I may be the only regular contributor suffering here), it prevents the use of advanced tooling/IDE, e.g., my editor supports language servers such as ccls or clangd which "compile" the open file internally and provide a ton of features, e.g., completion, display of compile errors etc. But none of this works properly if you can't compile individual files. You can say that's the fault of these tools but self-contained files are also just good style and good engineering practice.

PR #924 was an attempt to clean up the includes but even the include-what-you-use tool there used there (or at least the way it has been used there) fixes includes only for the .c files.

Here are some things we'd need to do:

Again, I understand that I may the only one suffering, so fixing this may be on me but I still wanted to document my findings here. I don't except anyone disagreeing with the changes the mentioned here but please raise your hand if you do.

sipa commented 2 years ago

It's complicated.

The _impl.h files in modules/* aren't really standalone pieces of code, as they are included by secp256k1.c / tests.c / bench.c respectively. I tend to think of them not as separate things, but as "parts" of those respective .c files that have been moved elsewhere for organization purposes.

real-or-random commented 2 years ago

The _impl.h files in modules/* aren't really standalone pieces of code, as they are included by secp256k1.c / tests.c / bench.c respectively. I tend to think of them not as separate things, but as "parts" of those respective .c files that have been moved elsewhere for organization purposes.

Indeed but I don't see the point. What you say is true also for the _impl.h files not in src/modules/*, they're also included by the .c files.

So I think the rules should roughly be:

apoelstra commented 2 years ago

What if we renamed the _impl.h files to .c and compiled them as independent units in CI? So both forms of compilation would be supported.

real-or-random commented 2 years ago

What if we renamed the _impl.h files to .c and compiled them as independent units in CI? So both forms of compilation would be supported.

Yeah, I think that will be a very related goal. What I propose is to ensure that every file (incl. _impl.h) compiles on its own. Whether you call them .c or _impl.h is then more a matter of taste in the end and is then orthogonal.

However, I don't know if it's worth supporting that as an official compilation method. The reason why we have unity builds is the (former) lack of LTO. I think it was a good idea to separate the precomputed stuff (https://github.com/bitcoin-core/secp256k1/pull/1042) because it rarely changes and nothing is lost without LTO. But for the "normal" (not precomputed) files, I don't know: we wouldn't want to drop support for unity builds for people that still use compilers without LTO. And compilation times are still very manageable with the existing code base.

sipa commented 2 years ago

@real-or-random Hmm, perhaps my point is more philosophical than practical.

The way I like to think about source code organization is that the .h and .c file (or in our case, .h and _impl.h file) are one "unit". And to think about dependencies in the code, you think about "which unit cannot be used without which other unit". When you have units that depend in that way on each other, you have a "semantic" cyclic dependency between those units, which is a sign that those two units should really be one unit, or organized differently.

So specifically, if you have a situation where a.c includes b.h, and b.c includes a.h, while not being a cyclic dependency between the actual files, it is a cyclic dependency between the units. And in that sense, e.g. the modules//tests_impl.h files aren't proper separate units: modules//tests_impl.h depend on functions in tests.c, and simultaneously tests.c includes modules/*/tests_impl.h. In that sense, they're really better thought of as "part of tests.c, but moved elsewhere to make conditional compilation easy" than as separate units. Just adding non-impl .h files, without disentangling things, wouldn't change that.

But

Of course, there is no requirement that the situation stay that way. We could properly separate the modules//tests_impl.h files; which would probably involve moving some of the shared helper functions in tests.c to a separate file, which can then be included by both tests.c and modulus//tests_impl.h

I think that would be a good idea, actually. And if we do, @apoelstra's suggestion to try compiling all the impl files separately would be a nice way of testing that. Do you imagine that as actually renaming all the _impl.h files to .c in the repository, but just leaving the default compilation mode to still #include them into each other as now? Or do you mean e.g. having the CI script doing the renaming for this particular test? One concern I'd have with actually doing the renaming in-repo is that people building the project in an environment without our build system would actually build that way and get abysmal performance due to lack of inlining of field routines (actually, that's perhaps not uninteresting to test).

real-or-random commented 2 years ago

And if we do, @apoelstra's suggestion to try compiling all the impl files separately would be a nice way of testing that. Do you imagine that as actually renaming all the _impl.h files to .c in the repository, but just leaving the default compilation mode to still #include them into each other as now? Or do you mean e.g. having the CI script doing the renaming for this particular test?

If the goal is just to solve this issue and test this, then it would suffice to have CI compile the _impl.h files (and the other .h files). No need to rename them. edit: and just invoke the compiler on them manually without autotools. (That's what I meant when I said above that renaming is orthogonal -- I realize that we were writing our comments simultaneously.).

apoelstra commented 2 years ago

My intention was to do the bare minimum to test compiling the _impls in CI. Agreed that we should not make it easy or natural for library users to do this. I assumed, but did not check, that gcc would want a file to be called .c rather than .h for it to compile it as a unit, which is why I suggested the renaming.

I didn't consider just having the CI script do the renaming though. I think that might get us exactly what we want (a CI test of @sipa's semantic model of dependencies, as approximated by #include lines).