cheshirekow / cmake_format

Source code formatter for cmake listfiles.
GNU General Public License v3.0

Add semantic analysis support #249

Open sweetgiorni opened 3 years ago

sweetgiorni commented 3 years ago

Awesome project! I'm particularly impressed by the parser package; it's pretty much the only one I've found besides the actual yacc/lex definitions. Nice work.

I just finished up a smallish script that performs some automatic refactoring of CMakeLists.txt files. I made use of the cmake-file-api to get information about targets, dependencies, and source files. Then I used your parser to modify the CMakeLists.txt programmatically instead of screwing around with regex like I normally do. Then the modified parse tree is fed into the formatter module and out pops a new CMakeLists.txt 🤯

I ended up making a small wrapper class for the parser nodes to ease the process of traversing the tree. Even then, it's still a bit wordy. For example, adding a target dependency requires finding all the StatementNodes, checking the FunctionNameNode's token's content to see if it's a call to target_link_libraries, finding the first PositionalGroupNode to see if it's the target I want to modify, etc. It would be really cool to have helpers for that sort of thing, allowing for something like this:


target_name = 'FOO'
dependency = 'BAR'
scope = 'PRIVATE'

ptree = cmake_format.parse('/home/sweetgiorni/myproject/src/CMakeLists.txt')

target_obj = ptree.target(target_name)
if target_obj:
    target_obj.target_link_libraries(scope, dependency)
ptree.write()

In addition to expanding the capabilities of the linter and formatter, this would open up new possibilities for programmatic access to CMake files. As far as I know, the only comparable toolset would be the CMake source code itself.

What are your thoughts on this? I see that development has slowed a bit and you're still working towards a 1.0 release, so clearly this won't be a priority. Still, it's something interesting to keep in the back of your mind (if it isn't there already). I may fork this and play around with the idea as well.
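For reference, the manual traversal described earlier in this comment could be sketched roughly as below. The `Node` class and the kind strings ("STATEMENT", "FUNNAME", "PARGGROUP", "ARGUMENT") are hypothetical stand-ins invented for illustration; they are not cmake_format's actual node types or API, and a real implementation would walk the parser's own tree instead.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node (not cmake_format's class)."""
    kind: str                       # e.g. "BODY", "STATEMENT", "FUNNAME", ...
    text: str = ""                  # token content for leaf nodes
    children: List["Node"] = field(default_factory=list)


def iter_nodes(node: Node, kind: str) -> Iterator[Node]:
    """Yield `node` and every descendant whose kind matches, depth first."""
    if node.kind == kind:
        yield node
    for child in node.children:
        yield from iter_nodes(child, kind)


def find_link_statement(root: Node, target: str) -> Optional[Node]:
    """Find the first target_link_libraries() call whose first positional
    argument is `target`."""
    for stmt in iter_nodes(root, "STATEMENT"):
        name = next(iter_nodes(stmt, "FUNNAME"), None)
        # CMake command names are case-insensitive.
        if name is None or name.text.lower() != "target_link_libraries":
            continue
        group = next(iter_nodes(stmt, "PARGGROUP"), None)
        if group is not None and group.children and group.children[0].text == target:
            return stmt
    return None


def add_dependency(stmt: Node, scope: str, dependency: str) -> None:
    """Append `<scope> <dependency>` to the end of the statement."""
    stmt.children.append(Node("ARGUMENT", scope))
    stmt.children.append(Node("ARGUMENT", dependency))


# A tiny hand-built tree standing in for a parsed CMakeLists.txt.
tree = Node("BODY", children=[
    Node("STATEMENT", children=[
        Node("FUNNAME", "add_library"),
        Node("PARGGROUP", children=[Node("ARGUMENT", "FOO"), Node("ARGUMENT", "foo.cc")]),
    ]),
    Node("STATEMENT", children=[
        Node("FUNNAME", "target_link_libraries"),
        Node("PARGGROUP", children=[Node("ARGUMENT", "FOO")]),
    ]),
])

stmt = find_link_statement(tree, "FOO")
if stmt is not None:
    add_dependency(stmt, "PRIVATE", "BAR")
```

Even in this toy form, the lookup takes three nested searches before any edit happens, which is the wordiness a helper such as the proposed `ptree.target(...)` would hide.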

cheshirekow commented 3 years ago

I think some of these helpers are in place. For example, the StatementNode has a get_funname(), but I have not created any kind of "user-facing" API or documentation for refactoring. Refactoring is an intended use case (which is why the parsed representation carries around its source location). I agree it would be fantastic to include tooling to assist refactoring. I would say the first step is to come up with a list of use cases to support and then try to document which APIs would be most beneficial.

One potential roadblock with any advanced use case such as this one is that an arbitrary number of parameters may be hidden beneath variable references which the parser may not have available. I would like to support basic variable tracking and expansion (mostly for better linting), but knowing the exact variable expansion in every case would (I think) mean re-implementing a significant amount of cmake itself. That doesn't preclude what you're asking for. One could always "refactor everything I can" and leave the instances that are indeterminate due to variables for manual fixup.

Anyway, if you're interested in working through some of the details on the proposal I'd be happy to try to incorporate a "refactor API" into future updates.
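The "refactor everything I can" approach mentioned above could be sketched as a simple partition: statements whose arguments contain no variable references are safe to rewrite, and the rest are set aside for manual fixup. This is only an illustration; the `(name, args)` tuple representation and the function names are assumptions, not part of cmake_format.

```python
import re

# Matches the start of ${VAR}, $ENV{VAR} and $CACHE{VAR} references.
# An escaped "\${" is also flagged, which errs on the side of manual review.
_VAR_REF = re.compile(r"\$(?:ENV|CACHE)?\{")


def is_determinate(argument: str) -> bool:
    """True if the argument contains no variable reference."""
    return _VAR_REF.search(argument) is None


def partition_statements(statements):
    """Split (name, args) pairs into (refactorable, needs_manual_review)."""
    safe, manual = [], []
    for name, args in statements:
        bucket = safe if all(is_determinate(a) for a in args) else manual
        bucket.append((name, args))
    return safe, manual


statements = [
    ("target_link_libraries", ["FOO", "PRIVATE", "BAR"]),
    ("target_link_libraries", ["FOO", "PRIVATE", "${EXTRA_LIBS}"]),
]
safe, manual = partition_statements(statements)
```

Here the first call lands in `safe` and the second, whose dependency list hides behind `${EXTRA_LIBS}`, lands in `manual`.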

sweetgiorni commented 3 years ago

Hmm, I see your point. Initially I was planning on extending the file api to get more dependency information, but clearly that isn't enough. Even if someone took the time to expose all the internals to the file api, I don't think the Kitware folks would be into it. The most effective thing I can think of would be writing C/C++ bindings for CMake. It would certainly help with all the heavy lifting, but I doubt it's feasible. I'll poke around and see how ridiculous it is. I'll also try to come up with some more detailed use cases for a potential parser api.

lanza commented 2 years ago

> One potential roadblock with any advanced use case such as this one is that an arbitrary number of parameters may be hidden beneath variable references which the parser may not have available. I would like to support basic variable tracking and expansion (mostly for better linting), but knowing the exact variable expansion in every case would (I think) mean re-implementing a significant amount of cmake itself. That doesn't preclude what you're asking for. One could always "refactor everything I can" and leave the instances that are indeterminate due to variables for manual fixup.

I was thinking of running CMake with a full --trace-expand and parsing the output for this purpose. The values of all variables can be scraped from that output, so you could show, for example, that at file path/to/CMakeLists.txt line 45 CMAKE_CXX_COMPILER changed from unset to /usr/bin/clang++. You could also mock a debugging session with full knowledge of the backward and forward history, or even make the states of the run queryable via some query language.

Here's the sample output if you're not familiar.

/tmp/asdf/CMakeFiles/3.21.3/CMakeCCompiler.cmake(46):  set(CMAKE_C_SOURCE_FILE_EXTENSIONS c;m )
/tmp/asdf/CMakeFiles/3.21.3/CMakeCCompiler.cmake(47):  set(CMAKE_C_IGNORE_EXTENSIONS h;H;o;O;obj;OBJ;def;DEF;rc;RC )
/tmp/asdf/CMakeFiles/3.21.3/CMakeCCompiler.cmake(48):  set(CMAKE_C_LINKER_PREFERENCE 10 )
/tmp/asdf/CMakeFiles/3.21.3/CMakeCCompiler.cmake(51):  set(CMAKE_C_SIZEOF_DATA_PTR 8 )
/tmp/asdf/CMakeFiles/3.21.3/CMakeCCompiler.cmake(52):  set(CMAKE_C_COMPILER_ABI  )
/tmp/asdf/CMakeFiles/3.21.3/CMakeCCompiler.cmake(53):  set(CMAKE_C_BYTE_ORDER LITTLE_ENDIAN )
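
A minimal sketch of scraping that output, assuming every trace line has the `path(line):  command(args )` shape shown in the sample. The function names and the history layout are made up for illustration. It only follows plain `set()` calls, splits arguments on whitespace (so quoted arguments containing spaces are not recovered), and does not interpret options such as CACHE or PARENT_SCOPE.

```python
import re
from collections import defaultdict

# path(line):  command(args )
TRACE_LINE = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+)\):\s+(?P<command>\w+)\((?P<args>.*)\)\s*$"
)


def parse_trace(lines):
    """Yield (file, line, command, args) for each recognisable trace line."""
    for raw in lines:
        match = TRACE_LINE.match(raw.rstrip("\n"))
        if match:
            yield (match["file"], int(match["line"]),
                   match["command"], match["args"].split())


def variable_history(lines):
    """Map each variable to a list of (file, line, old_value, new_value).

    old_value is None the first time a variable is seen in the trace.
    """
    current = {}
    history = defaultdict(list)
    for path, lineno, command, args in parse_trace(lines):
        if command.lower() != "set" or not args:
            continue
        # Multiple values passed to set() form a ;-separated list.
        name, new = args[0], ";".join(args[1:])
        history[name].append((path, lineno, current.get(name), new))
        current[name] = new
    return history


SAMPLE = """\
/tmp/asdf/CMakeFiles/3.21.3/CMakeCCompiler.cmake(46):  set(CMAKE_C_SOURCE_FILE_EXTENSIONS c;m )
/tmp/asdf/CMakeFiles/3.21.3/CMakeCCompiler.cmake(52):  set(CMAKE_C_COMPILER_ABI  )
/tmp/asdf/CMakeFiles/3.21.3/CMakeCCompiler.cmake(53):  set(CMAKE_C_BYTE_ORDER LITTLE_ENDIAN )
"""

for name, changes in variable_history(SAMPLE.splitlines()).items():
    for path, lineno, old, new in changes:
        print(f"{path}:{lineno}: {name} changed from {old!r} to {new!r}")
```

Storing the old value next to the new one at each step is what would make the "changed from unset to ..." report and a replayable, debugger-style history possible.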