[WIP] Move OpenQL to modern CMake practices

pablolh commented 1 year ago

Increase drastically the granularity of CMake targets

this allows better code organization and well-defined dependencies within modules (e.g. passes should not depend on each other)
the cmake files are cleaner and more explicit
this speeds up compilation

wvlothuizen commented 1 year ago

Regarding the move of the include directory, please bear in mind that there is a philosophy behind the current structure: header files that may be of use to code external to OpenQL go in include (such code would use nothing under src, only a library compiled from it), other header files (e.g. most .h files used by the CC backend) go with the source. Also see CONTRIBUTING.md

pablolh commented 1 year ago

Good morning,

Thanks for your messages. I have started this draft PR in a rush yesterday evening before taking the train to check the CI :D and certainly didn't provide enough context, so here is more info. Keep in mind that the PR is not finished yet, but let's discuss already.

I think OpenQL has grown large enough that having a single CMake library target for the whole lib is no longer viable in my opinion. Here are a few benefits of splitting the main library into sub-libraries:

well-defined dependencies between modules: with that organization, CMake enforces good code organization, for example, you won't be able to include a header from a pass folder into a utils or ir source file. Indeed, each sub-library exposes a restricted set of headers to its consumers. Those dependencies between CMake modules are defined with target_link_libraries.
locality: for instance, the resource file for architecture cc_light is generated in the ql/arch/cc_light folder, which make a lot more sense and unclogs the main CMakeLists.txt.
exposing only what needs to be exposed: currently, linking against the OpenQL library allows including every single header in include/, which is too much and unnecessary. Users of the C++ API should not have access to pass manager internals or ir stuff for instance.
minimal linkage and incremental build times: suppose in the future I add a CliffordTest unit test class that tests some properties of the Clifford optimizer pass. This internal unit test would not need the OpenQL API, but just include the CliffordPass header and link against the clifford pass library. I don't want this unit test executable to link against any of the other passes, and it should also be independent from all the architectures, etc. I don't want to have this test re-run whenever I modify the diamond architecture, which would happen with the current structure. When this change is ready I can guarantee you that developer productivity will be better because the build is more incremental and build iterations that you do during development will be way faster.

I was using this kind of modern CMake files organization with my previous employer, a large codebase with around 20 developers. Later on we moved to use Bazel, another build system that goes even further in modularization and this kind of build system philosophy for large projects and helped with build times and organization and productivity immensely. One step towards this was modularization of the CMake build like in this PR.

Some reference: https://stackoverflow.com/questions/47162766/how-to-write-cmakelists-txt-for-a-big-project-with-multiple-subdirectories https://www.siliceum.com/en/blog/post/cmake_01_cmake-basics/

To respond to some of your concerns:

nested include folders

Indeed that is a bit surprising at first, luckily IDEs fold the nested folders so navigation is not hindered. That's something CMake doesn't do too well (compared to e.g. Bazel which does symlinks at build time), but is the price to pay to have qualified header include paths (e.g. "ql/utils/map.h" instead of "map.h").

details headers should stay in details

I intentionally moved them to the normal include folder because I didn't want those libraries to include their own folder (since some submodules are in subfolders of some other submodules). Personally I prefer qualified include paths also for private headers (e.g. #include "ql/pass/ana/visualize/detail/circuit.h"), which would be possible without exposing those private headers to consumers by creating a private_include nested folder on top of the existing include folder for each such submodule, but it's a bit overkill I think. Again, currently they are in a details folder but still they can be included anywhere in the project, so this is still better. See this in the current OpenQL/CMakeLists.txt:

target_include_directories(ql
    PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}/src/"
    PRIVATE "${CMAKE_CURRENT_BINARY_DIR}/src/"
    PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/include/"
    PUBLIC "${CMAKE_CURRENT_BINARY_DIR}/include/"
)

architecture+pass registration done statically

I certainly agree with splitting this in a separate PR. In fact, all the changes to C++ source files and headers can be reviewed separately and that will work with the previous CMakeLists.txt. I will do that.

namespace updates for some reason

That is also independent. I did that to avoid some unnecessary and unclear dependencies between CMake modules.

pablolh commented 1 year ago

The separate PR to move to self-registering passes and architectures: https://github.com/QuTech-Delft/OpenQL/pull/461

jvanstraten commented 1 year ago

I don't think I should still have much of a say in OpenQL stuff (if any); the way I see it, you're the maintainer now and therefore you should get to choose direction for maintenance :) But since I was asked;

Users of the C++ API should not have access to pass manager internals or ir stuff for instance.

I'm inclined to disagree with you on that. Since backends can be so different and OpenQL users are likely to be experimenting with compiler stuff, the idea was always that users should be able to register their own passes dynamically without having to fork and rebuild OpenQL. I haven't looked in detail at how you're suggesting to change pass registration, but I would hope that dynamic registration is still possible in addition. That being said; again, I don't think my opinion should matter much. Just be aware that it was a feature rather than an oversight, and thus that you might have to do some documentation updates to keep everything synchronized if you guys decide that maintaining that is not worth the effort.

nested include folders

I can confirm that this is just life with CMake, at least if you want to keep the CMake files even remotely comprehensible for mere mortals. For example, Apache Arrow manages to avoid it, but then there's these eleven thousand lines of code just for custom CMake support functions, not to mention that the compilation modules themselves are generally nontrivial.

I don't really have much to add beyond that. I'd generally be in favor of more strict separation, as long as you actually manage to get CMake to comply. Just make sure that it actually works right on all supported platforms, and that CMake doesn't casually try to install things as generically named as /usr/lib/libutils.so. Ideally, the modules are linked statically into one dynamic library and only that dynamic library is installed, but "one does not simply." In my experience, getting advanced linking stuff to work right across platforms and getting CMake transitive dependencies to work right across versions or at all is HARD, but I'm hardly a CMake wizard; I merely tolerate it, and barely.

pablolh commented 1 year ago

Hi Jeroen and thanks a lot for your input!

Since backends can be so different and OpenQL users are likely to be experimenting with compiler stuff, the idea was always that users should be able to register their own passes dynamically without having to fork and rebuild OpenQL.

(Maybe this has changed since you last worked on OpenQL, I'll see what Hans thinks) I have the impression that everyone is currently using the Python API (therefore you cannot define your own pass), and if someone actually does with the C++ API I'd be happier if they contribute to OpenQL directly? What do you think @jvansomeren ? Also I remember we've mentioned moving to a more "normal" compiler that doesn't even have an API, but simply an executable that takes input files and a handful of options.

Otherwise I think we're on the same page concerning "tolerating" CMake haha In fact this PR failed on CI because I forgot to test locally with X11 libraries (the visualizer parts would not build). I can give it another shot, I'm also curious to see if this works on Windows.

QuTech-Delft / OpenQL

[WIP] Move OpenQL to modern CMake practices #460