Open boris-kolpackov opened 1 month ago
I think we need to step back a bit and consider what kind of variability in this area is practical/sensible and what we want to discourage. And then based on that understanding try to define build2 mechanisms to support this variability. In particular, I really don't want to add any mechanisms that will give C++ users even more rope to hang themselves (meaning worsen the "variability mess" which is modern C++ builds), especially if this also saddles build2 with extra complexity and maintenance burden.
What are the plausible approaches when switching a project from headers to modules? I think it makes sense to enumerate all the likely choices since these approaches will have to co-exist (i.e., different projects will make different choices but may end up in the same build). I can think of the following options:
1. Replace headers with modules (for example, in the next major version of the project).
   With this approach there is no attempt to make the headers and modules versions co-exist in the same build, with everyone using either the modules version or the headers version.
2. Create a new project (for example, libhello2) which uses modules while maintaining (or even actively developing) the original header-based version for some time.
   If the new project uses a new namespace (for example, hello2), then the two versions may even co-exist in the same build. Though allowing the two interfaces to inter-operate will most likely require extra effort (think vocabulary types).
3. Provide the dual headers/modules interface by providing independent headers and modules wrappers over the shared implementation (which will necessarily stay headers-based). Think of the pimpl idiom but applied to modules rather than classes.
   It feels like there should be no difficulty supporting the dual interface simultaneously from the same build. Though whether the two interfaces can inter-operate is questionable (essentially the same problem as in option (2) above).
4. Provide the dual headers/modules interface by somehow sharing most of the interface source code between headers and modules.
   Whether this approach can support the dual interface simultaneously from the same build depends on how exactly things are arranged (see below).
I think the first three options are pretty clear. So let's see what are the practical/sensible ways to achieve (4).
In the early modules days we tried to support both headers and modules from a shared set of source files in a relatively small library (libbutl). It didn't go well, to put it mildly. The resulting headers/module interfaces got really hairy due to all the macros and ifdef's.
One thing I found particularly dizzying (literally) is keeping straight all the imports/includes in the module interface and implementation units. Remember that when you do, for example, import std; in the module interface in the module's purview, all the imported names are automatically made visible in the module implementation units without an explicit import std;. But that's not the case with headers and you will need to pause and think where you need to include each header. If you are interested to see what it used to look like, here is the commit that ripped all this dual support out: https://github.com/build2/libbutl/commit/df1ef68cd8e85
Now, I am sure people will keep trying this approach (here is Boost exploring this idea) and it may even work for small projects. However, I think it's a dead end, generally, both technically and conceptually: modules were meant to make source code organization cleaner, not to turn it into an incomprehensible macro mess. So I don't think we need to go out of our way supporting this approach in build2. If someone wants to go down this rabbit hole, they should be able to cobble something together (as we did for our experiment in libbutl).
The only practical/sensible approach that I am aware of for implementing option (4) seems to be exporting names as attached to the global module fragment, which is how the standard library modules are done in both Clang/libc++ and MSVC/STL (GCC/libstdc++ is considering re-exporting standard library headers compiled as header units, though I doubt it will be the final choice). For details and additional nuances see this post on the Boost mailing list (the whole thread is recommended reading).
Specifically, there appear to be two variants of this approach:
Include the header into the module interface and then export the interface explicitly (this is how the standard libraries are done):
module;
#include <libhello/hello.hxx>
export module hello;
export namespace hello
{
  using hello::say_hello;
}
With this approach supporting the dual interface simultaneously from the same build comes pretty much automatically (there is no module interface without first having a header).
The alternative is to include the header in the module purview and wrap the header into extern "C++":
export module hello;
extern "C++"
{
#include <libhello/hello.hxx>
}
And inside hello.hxx we will need to do something like this:
#ifdef __cpp_modules
export
#endif
namespace hello
{
...
}
I am not aware of any substantial codebases that use this approach in practice. While it definitely feels less tedious compared to explicit export, I am not sure whether there are any gotchas (there most likely are). In particular, it seems one will have to export all the inter-included headers at once and from the same module. Also, it's not clear whether an interface compiled like this is compatible with the implementation unit compiled with a header (or vice versa).
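As a rough illustration of the header side of this second variant, here is a minimal sketch. It makes a few assumptions that are not in the snippet above: instead of keying on __cpp_modules (which a compiler may define even in translation units that include the header textually, where export would be rejected), it uses a hypothetical HELLO_MODULE macro that the module interface unit would define before including the header, plus a hypothetical HELLO_EXPORT helper. It also assumes the interface unit includes the standard headers in its global module fragment.

```cpp
// libhello/hello.hxx: a sketch only. HELLO_MODULE and HELLO_EXPORT are
// hypothetical names used for illustration.
#ifndef LIBHELLO_HELLO_HXX
#define LIBHELLO_HELLO_HXX

#ifdef HELLO_MODULE
// Assumed to be defined by the module interface unit right before it
// includes this header inside extern "C++". The interface unit is also
// assumed to have included <string> in its global module fragment.
#  define HELLO_EXPORT export
#else
// Plain header consumer: pull in the dependencies here, export nothing.
#  include <string>
#  define HELLO_EXPORT
#endif

HELLO_EXPORT namespace hello
{
  // Return a greeting for the specified name.
  inline std::string
  say_hello (const std::string& name)
  {
    return "Hello, " + name + "!";
  }
}

#endif // LIBHELLO_HELLO_HXX
```

Compiled without HELLO_MODULE this is an ordinary header. Whether the same declarations, when compiled via the module interface, stay compatible with code that included the header is exactly the kind of gotcha mentioned above, so treat this as a starting point for experiments rather than a recommendation.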
To sum up, the first approach for option (4) is tedious but is proven to work well and we can simultaneously support both headers and modules from the same build. The second approach looks less tedious (at the expense of some macro hackery) but is likely to have gotchas and it's unclear whether it can support both headers and modules simultaneously. Note also that with both approaches, at its core, the project stays headers-based. You will not be using any advanced modules features like partitions to organize your code.
Regarding using the standard library as modules vs headers, this feels largely orthogonal to the modules enablement issue discussed above. However, a couple of notes:
It's possible that a project may wish to import std but itself continue to use/provide headers.
This desire to be able to choose either modules or headers may extend to libraries other than the standard library, if such a library also provides the dual interface. At the extreme, one may wish to decide this on a library by library basis.
One immediate difficulty that I see with supporting both standard library modules and headers from the same codebase is keeping the correct set of #include directives. Though it's probably just an inconvenience (one can either resolve to use headers during development or rely on CI to catch any missing directives).
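To make the #include bookkeeping problem concrete, here is a minimal sketch of a header that can use either the std module or the standard headers. HELLO_IMPORT_STD is a hypothetical macro (assumed to be defined by the build when the std module is requested), and the file and function names are made up for illustration.

```cpp
// libhello/names.hxx: a sketch only. HELLO_IMPORT_STD is a hypothetical
// macro that a build would define to request the std module.
#ifndef LIBHELLO_NAMES_HXX
#define LIBHELLO_NAMES_HXX

#ifdef HELLO_IMPORT_STD
import std;
#else
// This list has to be kept in sync by hand with what is used below.
// A build that always defines HELLO_IMPORT_STD will not notice if one
// of these goes missing.
#  include <cstddef> // std::size_t
#  include <string>
#  include <vector>
#endif

namespace hello
{
  // Return the total number of characters in all the names.
  inline std::size_t
  total_length (const std::vector<std::string>& names)
  {
    std::size_t n = 0;
    for (const std::string& s: names)
      n += s.size ();
    return n;
  }
}

#endif // LIBHELLO_NAMES_HXX
```

If development is done with the macro defined, forgetting, say, <vector> in the #else branch only shows up in a headers-only configuration, which is why either developing with headers or having such a configuration in CI is needed to catch it.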
Thoughts?
However, I think it's a dead end, generally, both technically and conceptually: modules were meant to make source code organization cleaner, not to turn it into an incomprehensible macro mess. So I don't think we need to go out of our way supporting this approach in build2.
I agree.
The only practical/sensible approach that I am aware of for implementing option (4) seems to be exporting names as attached to the global module fragment, [...] I am not aware of any substantial codebases that use this approach in practice. [...]
fmt uses that approach in production and is widely used (not as a module though). It also provides an option for the fmt module to use import std; since v11.0.0. @kamrann's reflections, if I'm not mistaken, arise among other things from the packaging effort for that library, in addition to the modules experiments that we discussed in private.
As a data point, if you go to https://arewemodulesyet.org/ and check the first ✅ you will see a top list of modularized libraries. I did a cursory check of the module source file of each of the libraries in that short list and the only ones that use the global-module fragment injection approach, specifically alternative 2, are fmt, argparse and async-simple.
Though I might have missed a few others using alternative 1 if it was not immediately obvious to me, but at least these ones are clear. Note that tgui is the one with the weirdest modules setup I've seen so far: one of its modules uses alternative 1 but not the others, or I'm confused by the juggling.
Regarding using standard library as modules vs headers, this feels largely orthogonal to the modules enablement issue discussed above. However, a couple of notes: [...]
Indeed. It looks like the more libraries provide the choice, the bigger the explosion of options for an end-user with a deep dependency graph.
If each library had a general way to determine by itself whether or not it can use import std; ("use import std; if you can" enabled by default), that would simplify the default situation where the end-user doesn't need to specify any option per library. But because of the differences in implementation stability when using modules, at the moment at least, end-user projects might end up having to choose to use only-includes-std on some configurations or only for specific library+configuration combinations. Hence question 1.
A couple of additional sources of information:
There is a Clang document that goes into more detail on various headers-to-modules transition approaches. In particular, it lists the "shared set of source files without global module fragment" approach (i.e., what we tried in libbutl) as a viable option (it is referred to as "ABI breaking style"). I still think it's not going to be tractable except maybe in a few special cases (just look at all the macro soup in the examples).
There is a new post in that Boost thread with some additional insight.
Sorry for the delayed input, travelling and generally struggling to stay on top of things lately. Some basic thoughts before I put off replying again:
The way I see things right now, modules are just problematic during development if there is still some need for header support (in truth my experiments so far have left me somewhat downbeat on the prospects of modules generally). Given that, I think option 4 suits (when option 1 isn't viable) as far as modules wrt packaging libraries goes, with the assumption that library development is probably done with modules disabled. It's unfortunate since, as noted, this means the code is in no way properly modularized; but it is at least convenient for the downstream consumer. It also fits well for making build2 packages of existing libraries, which is of course the common case.
A couple of other tangential points relating to modules with build2, while I think of it.
build2's approach to module resolution is not transitive in the same way include paths are (immediate lib prerequisites only). This leads to needing to add prerequisites on libraries that the code of the target in question doesn't directly reference. For example, C includes a header from B which imports a module from A. C will need a lib prerequisite on A as well as on B. I think build2's approach here is no doubt the right one in a properly modular world and I'm not suggesting it should be changed, but just wanted to point this out in case it hadn't been encountered.
A further question after hitting some issues with the fmt modularization.
Edit. After writing the below it occurs to me that the problem is perhaps wider than just the symbol export macro. fmt also uses a FMT_MODULE macro to control various module/non-module conditional compilation, and this too would need to be defined when building the BMI for the consumer. I've left the below as is though as I think it sums things up well enough, and also I need lunch!
From what I've read, there are some fairly strict (though varying by implementation) requirements regarding matching compiler options between module and consumer when building the BMI. I believe I'm correct in saying that CMake propagates such options some way or other so as to be able to build a BMI in an imported library with the same compiler options that were used when the module was built as part of the library. I'm wondering if build2 is doing something similar here? From looking at the .pc files from an installation of fmt there doesn't look to be anything special in there, beyond the module mapping.
To give the specific example that's caused me to wonder about this: fmt upstream currently contains the following:
#if !defined(FMT_HEADER_ONLY) && defined(_WIN32)
# if defined(FMT_LIB_EXPORT)
# define FMT_API __declspec(dllexport)
# elif defined(FMT_SHARED)
# define FMT_API __declspec(dllimport)
# endif
#elif [...]
Now I guess something will need to be changed here - I hit linker errors when attempting to build the tests against installed fmt, and was reminded of what I read a while back in the build system manual regarding modules and symbol exports. It's not clear to me though how to deal with this, in particular when trying to support dual mode.
[...] FMT_LIB_EXPORT is defined, giving dllexport, which is correct.
[...] dllexport and the compiler will deal with it (though maybe expanding to nothing would also work).
[...] dllimport (actually only necessary for data I think, but anyway we definitely don't want dllexport).
From what I can see, when build2 builds the BMI on the consumer side, it is passing the same compiler options as it would use for consumer code that depended on the library providing the module (i.e. in this case I'm seeing -DFMT_SHARED, which is exported as poptions by the fmt library target). Unless I'm missing something though (apologies if I have, my head is a bit fried right now from juggling all the different combinations), this isn't quite enough information. We would need to either:
[...] buildfile of the library) that FMT_LIB_EXPORT should be defined when building the BMI for the consumer.
Thoughts?
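For reference, the macro selection being discussed can be reduced to the following sketch. All the HELLO_* names are hypothetical stand-ins for the corresponding fmt macros, and the stringification helper is only there so that the selected expansion can be inspected; none of this is taken from fmt or build2.

```cpp
// hello/export.hxx: a sketch only. HELLO_HEADER_ONLY, HELLO_LIB_EXPORT,
// HELLO_SHARED and HELLO_API are hypothetical stand-ins for the fmt macros.
#ifndef HELLO_EXPORT_HXX
#define HELLO_EXPORT_HXX

#if !defined(HELLO_HEADER_ONLY) && defined(_WIN32)
#  if defined(HELLO_LIB_EXPORT)
     // Building the library itself.
#    define HELLO_API __declspec(dllexport)
#  elif defined(HELLO_SHARED)
     // Consuming the shared library.
#    define HELLO_API __declspec(dllimport)
#  endif
#endif

// Static or header-only build, or not Windows.
#ifndef HELLO_API
#  define HELLO_API
#endif

// Stringify the expansion of HELLO_API so the selected case can be inspected.
#define HELLO_STR_(x) #x
#define HELLO_STR(x) HELLO_STR_(x)

namespace hello
{
  // Text that HELLO_API expanded to in this translation unit.
  constexpr const char api_text[] = HELLO_STR(HELLO_API);

  // A declaration decorated the way the library's public functions would be.
  HELLO_API int answer ();
}

#endif // HELLO_EXPORT_HXX
```

On Windows, compiling this with -DHELLO_SHARED (the kind of option observed above on the consumer side) selects dllimport, while -DHELLO_LIB_EXPORT selects dllexport. Which of the two the consumer-side BMI build should see, and how the library would communicate that, is the open question.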
Moving a slack discussion started by @kamrann: