evaleev / libint

Libint: high-performance library for computing Gaussian integrals in quantum mechanics
Other
218 stars 96 forks source link

cmake harness, 2023 edition #259

Open loriab opened 1 year ago

loriab commented 1 year ago

This is a squash of #233 (2022) based on #205 (2021) that includes #148 (2019) followed by rebase to master (2023).

~Ignore for a bit. It's passed some local sanity checks, and this'll start the GHA check, but I haven't checked moved files very carefully or even looked into the new python build or arbitrary order derivative changes that were in master.~

loriab commented 1 year ago

I'm still working on this, but I think it's now up-to-date with master.

loriab commented 1 year ago

@evaleev, I think I've got the solid harmonics ordering configurable at runtime though the route you suggested in https://github.com/evaleev/libint/issues/190#issuecomment-691485225 . Before rolling it out further (and adjusting all the cmake), could you check that I've got it straight below that option LIBINT_SHGSHELL_ORDERING shows up in two ways?

loriab commented 1 year ago

My action plan regarding solid harmonics is to

Any thoughts or alarms?

loriab commented 1 year ago

I think I'm done again with this for the year. Below are the more important changes (that I remember) beyond #233. There's simply so many ways of building this project that I surely haven't tried them all, but tests and psi4 usage are in good shape.

timostrunk commented 11 months ago

@evaleev Are there blockers for a review here? I think this PR is quite important for the future of libint. It would be a shame if this work goes to waste.

loriab commented 10 months ago

fwiw, @timostrunk and @hadim, my current hope is to get the #269, #270, #271 PRs (cummulative) that address the run-time parts of this one merged and released. Doing that can release the psi4 and downstream blocks. If that goes through, I can rebase/refactor this PR to deal with the remaining build-/detection-time changes.

evaleev commented 7 months ago

@loriab let me know what the status of this is, would be great to get this saga finished before 2024

loriab commented 7 months ago

@loriab let me know what the status of this is, would be great to get this saga finished before 2024

Ah, from what I recall, the basic status is:

I'd dearly like to get this finished before 2024, too. But I think it'll take enough revising that I can't work on it until mid-Dec.

This PR isn't a big change for L2 users, but I suspect it'd be convenient for the devs in the new flurry of work. I'd propose a release w/o this PR soon for user continuity, then I'll try to patch this up this year, and perhaps your group can field test it a bit before a 2024 release.

evaleev commented 7 months ago

I'd dearly like to get this finished before 2024, too. But I think it'll take enough revising that I can't work on it until mid-Dec.

No problem at all.

This PR isn't a big change for L2 users, but I suspect it'd be convenient for the devs in the new flurry of work. I'd propose a release w/o this PR soon for user continuity, then I'll try to patch this up this year, and perhaps your group can field test it a bit before a 2024 release.

Yes, @JonathonMisiewicz could use this now. Not a show-stopper, but would be good to finally get this checked off.

LecrisUT commented 6 months ago

Hi @loriab, I have some comments about the design, but I think they will be better addressed after this one is merged. Basically some of the ideas are in https://github.com/LecrisUT/CMake-Template/pull/1, but we should discuss them there since it is a smaller more flexible template project.

For this PR, I only have one non-blocking comment, can you explain a bit about LIBINT_BUILD_LIBRARY_AS_SUBPROJECT and ExternalProject_Add design? I think it would be better to have a fallback of find_package(CONFIG) -> FetchContent with the interface from cmake 3.24 (with some backports to earlier versions for now)

loriab commented 6 months ago

FYI, @LecrisUT, this won't be merged as-is. The user-facing code parts have already been merged as PRs 269, 270, 271. What's left is the builder-facing CMake parts that I'm reexamining and hopefully PRing in pieces rather than en masse, starting with 299.

Yeah, LIBINT_BUILD_LIBRARY_AS_SUBPROJECT vs. ExternalProject_Add :-). Basically, I'm all for the latter for this role, especially as it keeps the dev library build separate and configured most like the more common independent library build. If a variable doesn't pass through, I know to fix it in the cmake and add it to the INSTALL.md. I think the main developer prefers FetchContent for not having to define every variable passed through, so he added the former option. It sometimes takes two steps, and iirc I couldn't get the GHA to always work for it.

LecrisUT commented 6 months ago

I think the decision would depend on how much the subproject is designed to be a subproject vs a standalone project. If it's the latter, I would agree that the latter option is a good intermediate. But whenever possible I would suggest going for FetchContent compatible design since that is a major focus of CMake upstream, and it is more compatible when chaining projects.

It sometimes takes two steps, and iirc I couldn't get the GHA to always work for it.

That is odd. The only thing that comes to mind is if the sub-project is not designed to be a subproject. At the same time, this is quite an odd design with the packing and unpacking and would need to look more deeply. But if you need another pair of eyes on such issues, let me know.

evaleev commented 6 months ago

Yeah, LIBINT_BUILD_LIBRARY_AS_SUBPROJECT vs. ExternalProject_Add :-). Basically, I'm all for the latter for this role, especially as it keeps the dev library build separate and configured most like the more common independent library build. If a variable doesn't pass through, I know to fix it in the cmake and add it to the INSTALL.md.

LIBINT_BUILD_LIBRARY_AS_SUBPROJECT is merely a fragile convenience for experts. I would not advertise it because it is extra fragile. When tinkering with the generated library it is convenient though, e.g. for IDE to comprehend the generated library (targets, indexing) from within the compiler project.

I think the main developer prefers FetchContent for not having to define every variable passed through, so he added the former option. It sometimes takes two steps, and iirc I couldn't get the GHA to always work for it.

I can't imagine how to make Libint properly subprojectable without a lot of logic duplication. The key benefit of FetchContent, making subproject's targets available within the main project at configure time, is not possible here since the user code wants to subproject the generated library, not the compiler. One can create targets in the compiler project that will forward to the library targets, but it would require duplicating most of the library harness logic in the compiler harness.

We could give user the option of building/running the compiler at configure time, but that seems fragile as well.

LecrisUT commented 6 months ago

We could give user the option of building/running the compiler at configure time, but that seems fragile as well.

That seems similar to things that I've done on fypp and something that I want to do for swig as well. In those cases I make it explicitly compatible with FetchContent and find_package. I am not familiar with the project to know what it does or if it is an appropriate solution, but if you have the time to explain the overall structure, I can offer some additional feedback of potential design approaches. Basically if it needs a precompiler or to inject libraries like in Python_add_library I think there is a clean design there.

evaleev commented 6 months ago

We could give user the option of building/running the compiler at configure time, but that seems fragile as well.

That seems similar to things that I've done on fypp and something that I want to do for swig as well. In those cases I make it explicitly compatible with FetchContent and find_package. I am not familiar with the project to know what it does or if it is an appropriate solution, but if you have the time to explain the overall structure, I can offer some additional feedback of potential design approaches. Basically if it needs a precompiler or to inject libraries like in Python_add_library I think there is a clean design there.

The libint project contains a compiler, which generates a libint library. The library code is packaged into a cmake project, and consumed by others as a tarball (I use FetchContent for that). There is an infinite number of configurations, so we can only ship a few we (I) care about.

The main complexity stems from a large number of use cases. Libint devs want to be able to build the whole thing starting from the compiler and don't care about subprojectability. Most libint users want to consume the library, not the compiler. Since library generation is expensive, we don't generate at configure time. Thus the need for a separate library generation step. In superbuild scenario this is fine, but in the common (FetchContent) scenario we must have the library already generated. This is why I never cared about making the compiler build harness to be cmake ... but still there are good reasons to (thanks, @loriab!)

LecrisUT commented 6 months ago

The library code is packaged into a cmake project, and consumed by others as a tarball (I use FetchContent for that). There is an infinite number of configurations, so we can only ship a few we (I) care about.

Ok, then this sounds like find_package(COMPONENTS). Using FetchContent, the consumer knows which libint library they want to generate, while in the packaged version, you want to pick up specific COMPONENTS that have been packaged. My suggestion is not even to use FetchContent, but plain old add_subdirectory where the inner project is templated with variables defined in the main project that calls the add_subdirectory. Then you can simply have a for loop to include of the requested templetized projects. Maybe llvm would also be a good reference for the design

The libint project contains a compiler, which generates a libint library.

Then this is the same design as for swig. Here we could have two different approaches. One is the add_subdirectory approach mentioned previously. The other one is to have a libint_add_library function that will generate the appropriate library target on-the-fly, where inside the function you mimic the templetized project. Which approach to choose depends on if you want the libint library to be pre-built and packaged (add_subdirectory and the current approach I believe), or a minimal compilation and the consuming project would generate and manage the corresponding library (`libint_add_library).

Since library generation is expensive, we don't generate at configure time.

Note that with FetchContent, only the configure stage is done, so if the tarball generation can be avoided, this step would be rather quick.

but in the common (FetchContent) scenario we must have the library already generated

That should not be the case. As long as the appropriate CMake targets are made available, CMake would be able to compile the generated libint library before it is consumed by a linker.

loriab commented 6 months ago

EFV can correct me if I'm wrong, but probably we're not looking to restructure the build from this PR. The number of people running the actual compiler/generator step of libint are about 2 * number_of_qc_pkgs_using_libint + number_of_misc_packagers + everyone_in_the_Valeev_group = ~few_dozen . So as long as a scheme improves the lives of developers (with end-to-end cmake generate/build/test) and improves the lives of consumers (ease of embedding configuration info into library with components, as you say), I think it's doing its duty.

The PR scheme has been steady in Psi4 for the past 2y, so apart from exploring configuration space I haven't touched but mpqc/nwchemex might, I'm confident in its utility.

evaleev commented 6 months ago

That should not be the case. As long as the appropriate CMake targets are made available, CMake would be able to compile the generated libint library before it is consumed by a linker.

I agree, but the impact of doing this on maintainability (due to the need to also support building the targets in standalone library harness ... so much logic is duplicated) coupled with roughly 0.01 developer/year allocated to libint makes this a nonstarter. We confine all the library target logic to the library harness which cannot be configured until the compiler had a chance to run.

EFV can correct me if I'm wrong, but probably we're not looking to restructure the build from this PR. The number of people running the actual compiler/generator step of libint are about 2 * number_of_qc_pkgs_using_libint + number_of_misc_packagers + everyone_in_the_Valeev_group = ~few_dozen . So as long as a scheme improves the lives of developers (with end-to-end cmake generate/build/test) and improves the lives of consumers (ease of embedding configuration info into library with components, as you say), I think it's doing its duty.

Agreed.