Open peckto opened 2 years ago
Thanks for the write-up. I would avoid the term "annotation". Basically, it seems that `# 4 "main.c"` is a shorthand version of the `#line` preprocessor directive: https://docs.microsoft.com/en-us/cpp/preprocessor/hash-line-directive-c-cpp?view=msvc-170
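To make the effect of the directive concrete, here is a minimal sketch (the struct and function names are made up for illustration). In standard C, `#line 4 "main.c"` makes the compiler treat the next source line as line 4 of `main.c`, which is observable through `__LINE__` and `__FILE__`. The output of `gcc -E` carries the same information in the shorter linemarker form `# 4 "main.c"`.

```c
/* Hypothetical helper type: the location the compiler believes it is at. */
struct location {
    const char *file;
    int line;
};

/* Returns the location reported right after a #line directive. */
static struct location location_after_line_directive(void) {
    struct location loc;
    /* From here on, the compiler counts lines as if the next line
       were line 4 of a file called "main.c". */
#line 4 "main.c"
    loc.line = __LINE__; /* this source line is now reported as line 4 */
    loc.file = __FILE__; /* and the file name as "main.c" */
    return loc;
}
```

Calling `location_after_line_directive()` should yield line 4 and file `main.c`, regardless of where the function actually sits in the real file.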
Thanks for the hint! That was not clear to me. If it's an official preprocessor directive, even better. I updated the issue description accordingly.
I had some further thoughts on tree-sitter, and I think I see more clearly now.
If I understand correctly, the tree-sitter parse tree can only be updated by reloading the source file.
My misconception was that we could update the parse tree directly, which would let us set the locations the way we want when resolving macros.
But if we need to update the source file anyway, what is the difference from using an external preprocessor like `gcc -E`?
If we need the step through the source file, I don't see how our own preprocessor implementation could do any better than `gcc -E`.
Or am I missing something?
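For comparison, here is a rough sketch of how original locations could be recovered from `gcc -E` output by following its linemarkers. This is only an illustration under simplifying assumptions, not how the cpg does it: the names `parse_linemarker` and `original_location` are hypothetical, file names containing escaped quotes are not handled, the optional flags after the file name are ignored, and lines are truncated at 511 characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical location record: file name and 1-based line number. */
struct location {
    int line;
    char file[256];
};

/* Parses a linemarker such as `# 4 "main.c"` or `# 1 "config.h" 1`.
   Returns 1 on success; on failure *out may be partially written. */
static int parse_linemarker(const char *text, struct location *out) {
    return sscanf(text, "# %d \"%255[^\"]\"", &out->line, out->file) == 2;
}

/* Maps the 1-based line `target` of preprocessed text back to its
   original location. Returns 0 if that line is itself a linemarker
   or lies beyond the end of the text. */
static int original_location(const char *preprocessed, int target,
                             struct location *out) {
    assert(preprocessed != NULL && out != NULL);
    struct location current = { 1, "" };
    const char *cursor = preprocessed;

    for (int index = 1; *cursor != '\0'; index++) {
        /* Copy the current line so sscanf cannot read past its end. */
        size_t length = strcspn(cursor, "\n");
        char line[512];
        size_t copied = length < sizeof line - 1 ? length : sizeof line - 1;
        memcpy(line, cursor, copied);
        line[copied] = '\0';

        struct location marker;
        if (parse_linemarker(line, &marker)) {
            if (index == target) return 0; /* markers have no location */
            current = marker;              /* applies to the next line */
        } else {
            if (index == target) {
                *out = current;
                return 1;
            }
            current.line++;
        }

        cursor += length;
        if (*cursor == '\n') cursor++;
    }
    return 0;
}
```

This only gives line-level locations after expansion. It says nothing about columns or about which macro produced a given token, which is exactly the information a preprocessor working on the parse tree would want to keep.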
Currently, the cpg handles macro resolution in the following way:
The code and file location point to the unresolved macro, but the node type and its specific properties (e.g. name) point to the resolved macro. Do we want to keep this behavior? Can we do better? If I remember correctly, there is a special case where macros are used to expose the public API of a library (as in OpenSSL). In that case, the user might expect to see the macro instead of the resolved internal function name. Is there a strong requirement on this from the Codyze side?
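A small sketch of the public-API case, to show the two names involved. All identifiers here (`api_init`, `internal_init_v2`, and the stringify helpers) are invented for illustration and are not taken from OpenSSL or the cpg.

```c
/* Hypothetical library: the internal implementation ... */
static int internal_init_v2(int flags) {
    return flags + 1;
}

/* ... and the public name, which exists only as a macro. */
#define api_init internal_init_v2

/* STR_RAW stringifies its argument as written; STR_EXPANDED expands
   macros in the argument first and stringifies the result. */
#define STR_RAW(x) #x
#define STR_EXPANDED(x) STR_RAW(x)

/* The name as the user wrote it in the source. */
static const char *name_as_written(void) {
    return STR_RAW(api_init);
}

/* The name after macro resolution, as the compiler sees it. */
static const char *name_as_resolved(void) {
    return STR_EXPANDED(api_init);
}
```

For a call like `api_init(1)`, the source text says `api_init`, while the resolved call targets `internal_init_v2`. A node whose code points to the former and whose name points to the latter carries both, but a query written against the public name would only match on the code, not on the name property.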
cc @fwendland for Codyze
For the tree-sitter language frontend (#604 #608) we need to take care of the preprocessor ourselves.
To update the location properties accordingly, we want to do the preprocessing in the kotlintree. The preprocessor operates on the tree-sitter parse tree, resolves macros and updates the location property. The parse tree will then be handed over to the cpg. In the future, we might also support loading already preprocessed code as generated with `gcc -E` (see #719).
The following example should outline the basic features and challenges for the preprocessor:
main.c
config.h
Run gcc preprocessor:
Create clang AST: