cheshirekow / cmake_format

Source code formatter for cmake listfiles.
GNU General Public License v3.0
954 stars 105 forks source link

CMake targets for cmake-format #201

Closed JohelEGP closed 4 years ago

JohelEGP commented 4 years ago

It'd be nice to have something like https://github.com/TheLartians/Format.cmake/ for cmake-format.

A particular feature I'm interested in is identifying CMake modules repositories to use their cmake-format configuration file (or using cmake-genparsers to generate it). These can be used by the targets, so that end users get semantic formatting of imported commands without having to add to their own configuration files.

I was thinking of using CMake cache variables and properties like in https://github.com/TheLartians/CPM.cmake/blob/master/cmake/CPM.cmake to implement a CMake module of this feature.

Alternatively, the command cmake-format could do it itself. Perhaps by having a per-user/system-wide configuration file where it would store paths to CMake modules repositories, along with the generated configuration files for repositories that don't include a cmake-format configuration file.

If this is out of scope, could you, @TheLartians, maybe be interested in adding it to https://github.com/TheLartians/Format.cmake/? I'm asking because the name doesn't suggest it's limited to clang or C++, even though I guess that's what's CMake is predominantly used for.

JohelEGP commented 4 years ago

The advantage of being a built-in feature is that the current integrations would continue to work, and do an even better job.

TheLartians commented 4 years ago

FWIW I think it's a great idea to incorporate cmake-format into Format.cmake. I personally haven't used _cmakeformat before, but it looks very useful. I think the best approach here is to either modify Format.cmake's version of git-clang-format.py or create something similar for cmake-format (if it doesn't already exist) to automatically detect the current project's CMake files in the git repo and create the diff report.

cheshirekow commented 4 years ago

Hm.., if I understand the README correctly for Format.cmake it sounds like the purpose is to add targets for formatting sources. The feature request here then is pretty similar to #178 so please have a look at the discussion there. In particular, I'm concerned about the practicality trying to devise a "fits-all" solution for the filelist in the presence of exclusions (e.g. third-party) , whether or not to include files which aren't used (e.g. never included() or add_subdirectory), and all the other complexities of a non-trivial build system.

I didn't dig into the implementation of Format.cmake but one of my concerns for such a feature is the poor performance of cmake for managing large "databases", such as "a mapping of all source files to static-analysis tools applied to them".

FWIW what this project (and several other projects which share this build system) do is:

  1. generate a list of files to be linted, formated, check-formatted
  2. import the generated manifest and create rules for each file

The mapping is generated in python instead of cmake, and it is checked on every build. In general, I think it is a Bad Idea to have more than one "every-build" rule/recipe, so I think that most build system designers would prefer to do an ad-hoc integration into their existing "every-build" recipe for performance reasons. For example, to ensure that the recipe for that rule only does one expensive scan over the filesystem.

A particular feature I'm interested in is identifying CMake modules repositories to use their cmake-format configuration file (or using cmake-genparsers to generate it).

Note that a cmake-format configuration file can include other configuration files. If you used a python configuration file, you could do the following, for example:

import glob
include = glob.glob(os.path.expand_user("~/.cmake-format/libs"))

which would allow you to roll your own solution.

One possibility for more integrated support would be to automatically include configuration files from a list of paths, such as /usr/lib/cmake/*, etc. But the problem here is matching the search path of cmake itself, and, futhermore, a project may override that search path or override the system cmake modules with newer (or older) versions.

I think a better long-term solution is to implement the ability to get command semantic information directly from the module source that defines that command. gen_parsers is a start, but I want the ability to annotate (perhaps through some structured syntax in a comment) anything that can't be inferred through the use of cmake_parse_arguments, as is done in gen_parsers. My plan is to have a cmake-index program that can index a cmake project and know what are all the modules that it includes, packages that it finds, and listfiles that it adds. That index then can include all the command format specifications from the whole project and cmake-format can utilize that index.

TheLartians commented 4 years ago

[...] I'm concerned about the practicality trying to devise a "fits-all" solution for the filelist in the presence of exclusions (e.g. third-party) , whether or not to include files which aren't used (e.g. never included() or add_subdirectory), and all the other complexities of a non-trivial build system.

I understand the concern, in the case of Format.camke, the issue of determining which files to include is resolved by only including files which are tracked by the git repository of the project. As this is the behaviour that users already expect for c/cpp files it would make sense to extend this logic for CMake as well (speaking only for the Format.cmake project).

The mapping is generated in python instead of cmake, and it is checked on every build. In general, I think it is a Bad Idea to have more than one "every-build" rule/recipe, so I think that most build system designers would prefer to do an ad-hoc integration into their existing "every-build" recipe for performance reasons. For example, to ensure that the recipe for that rule only does one expensive scan over the filesystem.

I'm not sure I understand correctly, but I also would not like the format target to be run on every build. Instead the targets are not added to the "all" target and are only invoked explicitly by the users cmake --build build --target format and enforced by a format check in the CI system.

cheshirekow commented 4 years ago

I understand the concern, in the case of Format.camke, the issue of determining which files to include is resolved by only including files which are tracked by the git repository of the project

This is a good example of one of the many potentially contentious design decisions that go into this type of feature. Personally, I am not a fan of this strategy. Having worked on projects that did this in the past, I found it incredibly frustrating to complete a feature cycle and have everything working including lint and format checks, but then I submit something to CI and it fails the lint and format checks because when I ran the checks I hadn't yet added those files to git.

but I also would not like the format target to be run on every build.

I was referring more to the detection that the list of files has changed. In my scheme, this means that cmake needs to re-run because each format, lint, or format-check get's its own rule and generates a witness file on a successful tool run so that the static analysis targets support incremental builds.

JohelEGP commented 4 years ago

Regarding the identification of CMake modules repositories, my intention was for them to register themselves into some global property that's a list of configuration files, which the target running cmake-format would make sure to include. For those repositories that don't provide a cmake-format configuration file, there should be a way for the user depending on the repository to mark it for running against cmake-genparsers.

Regarding the database, it could be bound to the CMake buildsystem, rather than something more encompassing.

Regarding the better long-term solution, does it clash with my suggestion? In that case, I'd try implementing the CMake targets. My immediate wish is to reduce the need to synchronize configuration files manually between repositories when a command is added or updated.

How does that sounds? Thanks for your replies. There's a lot to digest and I don't delve in python, so I might have missed something.

TheLartians commented 4 years ago

@johelegp I'm still not 100% sure I understand your intention, but if you are planning to write a script for cmake-format that works similarly to git-clang-format.py, I'd be happy to include it in the Format.cmake package.

@cheshirekow I can see how that can be frustrating, however at least for me 99% of commits modify already tracked files so the issue almost never arises. Do you prefer manually listing linted files or what would be a better solution in your opinion?

JohelEGP commented 4 years ago

I've opened https://github.com/TheLartians/Format.cmake/pull/11. Waiting for https://github.com/cheshirekow/cmake_format/pull/123 to continue.

cheshirekow commented 4 years ago

Regarding the identification of CMake modules repositories, my intention was for them to register themselves into some global property that's a list of configuration files, which the target running cmake-format would make sure to include.

Can you explain what you mean by a "Module Repository" and what it might look like for them to "self register"?

Regarding the database, it could be bound to the CMake buildsystem, rather than something more encompassing.

I don't quite understand this statement. Do you mean that it is stored in, e.g. a cmake variable?

Regarding the better long-term solution, does it clash with my suggestion?

I'm not sure. I don't yet fully understand your suggestion. I think anything that relies on distributing configuration "sidecars" would clash with the long-term plan.

My immediate wish is to reduce the need to synchronize configuration files manually between repositories when a command is added or updated.

Can you provide an example of what you mean? I would expect that you might keep a cmake-format configuration file along with the source code for whichever module it describes. Are you thinking about some kind of git subrepo or submodule situation? Or are you thinking about system-installed modules (e.g. in /usr/lib/cmake or /usr/lib/<arch>/cmake)?

cheshirekow commented 4 years ago

@TheLartians

@cheshirekow I can see how that can be frustrating, however at least for me 99% of commits modify already tracked files so the issue almost never arises.

That's good. It goes to illustrate how different projects might have different needs, and motivates my skepticism of a one-size-fits-all solution.

Do you prefer manually listing linted files or what would be a better solution in your opinion?

As mentioned in my first reply, what I do for some build systems I maintain is to keep a manifest of all the "lintable" source files (actually all kinds of static analysis: lint, format, tidy, etc) in the source tree. That manifest is checked for updates on every build.

JohelEGP commented 4 years ago

Regarding the identification of CMake modules repositories, my intention was for them to register themselves into some global property that's a list of configuration files, which the target running cmake-format would make sure to include.

Can you explain what you mean by a "Module Repository" and what it might look like for them to "self register"?

A git repository of .cmake files that are modules intended to be used with include() and CMAKE_MODULE_PATH, like https://github.com/onqtam/awesome-cmake#modules.

Regarding the database, it could be bound to the CMake buildsystem, rather than something more encompassing.

I don't quite understand this statement. Do you mean that it is stored in, e.g. a cmake variable?

Maybe. For now I'm just looking at CMAKE_MODULE_PATH to know which directories to search for cmake_format config files.

Regarding the better long-term solution, does it clash with my suggestion?

I'm not sure. I don't yet fully understand your suggestion. I think anything that relies on distributing configuration "sidecars" would clash with the long-term plan.

My immediate wish is to reduce the need to synchronize configuration files manually between repositories when a command is added or updated.

Can you provide an example of what you mean? I would expect that you might keep a cmake-format configuration file along with the source code for whichever module it describes. Are you thinking about some kind of git subrepo or submodule situation? Or are you thinking about system-installed modules (e.g. in /usr/lib/cmake or /usr/lib/<arch>/cmake)?

My CMake modules repository (https://github.com/johelegp/jegp.cmake_modules) has a .cmake_format.yaml with two-fold purpose. To format its own CMake files and to offer its additional_commands to downstream users. With just cmake_format, downstream users would have to copy such additional_commands into their cmake_format config files to have the commands provided by the modules nicely formatted, and the ensuing manual synchronization when commands are added, modified or deleted.

As an example, you'll see that I have just one additional_commands, jegp_add_test. I could easily add more flags or kwargs to support more complex cases and add them to my config file, but downstream users would have to manually update their copy of my additional_commands in their config files.

cheshirekow commented 4 years ago

Maybe. For now I'm just looking at CMAKE_MODULE_PATH to know which directories to search for cmake_format config files.

That might work in most cases, but I can see it becoming rather edge-case-ish:

  1. Whatever scanning is done will need to replicate the same logic that cmake-itself does when searching for modules. In particular, when it comes to precedence and overrides.
  2. Modules can include modules, and those nested includes might take advantage of additional knowledge. For instance a module may include(${CMAKE_CURRENT_LIST_DIR}/awesomelib/support/foo.cmake) so discovery may be incomplete

downstream users would have to copy such additional_commands into their cmake_format config files to have the commands provided by the modules nicely formatted, and the ensuing manual synchronization when commands are added, modified or deleted.

The configuration files support include=, so your downstream users could do the following:

user-config.py

include = ["cmake/modules/jegp/.cmake_format.yaml"]

Assuming that cmake/modules/jegp is where they've copied your project (or subrepo'd/submodule'd it)

I think you're looking for a way to eliminate even the need to include them, which would be really nice, but I think it's probably better to wait for improved tooling to accomplish that goal. For now, I think asking your users to add one line to their config is relatively low effort. Also, your users can always do something like this:

user-config.py

import glob, os
include = glob.glob(os.path.dirname(__file__) + "/cmake/**/.cmake_format.*")

Which effectively accomplishes that goal.

JohelEGP commented 4 years ago

I think you're looking for a way to eliminate even the need to include

Oh, not at all. I remember not seeing that in the documentation and reading an issue suggesting it but no linked commit implementing it.

cheshirekow commented 4 years ago

Oh, not at all. I remember not seeing that in the documentation

Ah yes. Documentation doesn't get generated for this one because it is parsed out in a separate phase before all the other options. I should really add documentation for it.

and reading an issue suggesting it but no linked commit implementing it.

Sometimes I forget to close issues in the commit message...

JohelEGP commented 4 years ago

So what I wanted is already implemented. I'll be implementing the CMake targets at https://github.com/TheLartians/Format.cmake/pull/11, so feel free to close this whenever. Thanks for everything.

JohelEGP commented 4 years ago

I think you're looking for a way to eliminate even the need to include them, which would be really nice, but I think it's probably better to wait for improved tooling to accomplish that goal.

Looks like I will have to implement that tooling support. I don't use git submodules, which hardcode where dependencies live, so I can't hardcode my includes. My dependencies could be anywhere. Using CPMFindPackage, they could be installed somewhere, or otherwise it'll download them in the build directory, which is an arbitrary directory, and the subdirectory they are at is also configurable through FetchContent.

The tooling solution couldn't be implemented solely by cmake-format without replicating CMake logic as you pointed out. By doing it though CMake, I hope to take advantage of what it's discovered.

I'm thinking of separating out the additional_commands into their own config file which is included from the cmake-format-recognized config file, and somehow make the former tooling-discoverable. Perhaps sticking to CMAKE_MODULE_PATH.

JohelEGP commented 4 years ago

I have decided to simply use include. I'll have to prepend build/_deps/<dependency name>-src/ to them with the defaults, but that should work quite well with CI.

JohelEGP commented 4 years ago

This has been implemented at https://github.com/TheLartians/Format.cmake/, thanks you.