eliben / pycparser

Complete C99 parser in pure Python

preserve includes with CPP #455

Closed: pierav closed this 2 years ago

pierav commented 2 years ago

It is possible to preserve #include directives in the AST while still using CPP. This stays in the spirit of the project: preprocessing is left to CPP, and pycparser's scope remains limited to parsing.

Example (https://github.com/pierav/pycparser/commit/25b057f530a7baa70c81a38d43fc6c6bb08aadf8):

Here are the test files:

// test_include_define_and_stdio.h
#define HELLO "Hello"
#include <stdio.h>

// test_include.c
#include "test_include_define_and_stdio.h"

int main(void) {
  printf(HELLO);
  return 0;
}

Without cpp:

from pycparser import parse_file, c_generator

ast = parse_file('test_include.c')
# FileAST: 
#   Include: "test_include_define_and_stdio.h"
#   FuncDef: 
#     Decl: main, [], [], [], []
#       FuncDecl: 
#         ParamList: 
#           Typename: None, [], None
#             TypeDecl: None, [], None
#               IdentifierType: ['void']
#         TypeDecl: main, [], None
#           IdentifierType: ['int']
#     Compound: 
#       FuncCall: 
#         ID: printf
#         ExprList: 
#           ID: HELLO
#       Return: 
#         Constant: int, 0
c_generator.CGenerator().visit(ast)
# #include "test_include_define_and_stdio.h"
# int main(void)
# {
#      printf(HELLO);
#      return 0;
# }

With cpp:

# wrap_include is added by the proposed commit; upstream parse_file has no such parameter
ast = parse_file('test_include.c', use_cpp=True, wrap_include=True, cpp_args=args)
# FileAST: 
#   Include: "test_include_define_and_stdio.h"
#   FuncDef: 
#     Decl: main, [], [], [], []
#       FuncDecl: 
#         ParamList: 
#           Typename: None, [], None
#             TypeDecl: None, [], None
#               IdentifierType: ['void']
#         TypeDecl: main, [], None
#           IdentifierType: ['int']
#     Compound: 
#       FuncCall: 
#         ID: printf
#         ExprList: 
#           Constant: string, "Hello"
#       Return: 
#         Constant: int, 0
c_generator.CGenerator().visit(ast)
# #include "test_include_define_and_stdio.h"
# int main(void)
# {
#      printf("Hello");
#      return 0;
# }

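My approach is to surround each #include in the C file with marker tags. The tags survive CPP, so after preprocessing, the expanded header contents between them can be replaced with the original #include directive. The file is then fully preprocessed (macros such as HELLO are expanded), yet the includes are preserved.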
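A minimal sketch of this tag-based approach, using only the Python standard library. The marker pragma names and the functions `wrap_includes` / `unwrap_includes` are hypothetical and not taken from the linked commit; the sketch assumes the preprocessor copies unknown `#pragma` lines through to its output unchanged, as GCC's cpp does, and that only the main file (not the headers) is wrapped.

```python
import re

# Hypothetical marker pragmas. Assumption: cpp copies unknown #pragma
# lines to its output unchanged, so the markers survive preprocessing.
BEGIN = '#pragma INCLUDE_BEGIN'
END = '#pragma INCLUDE_END'

# A whole #include line; group 1 is the <...> or "..." target.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*([<"][^>"]+[>"]).*$', re.MULTILINE)

# A BEGIN marker, everything cpp emitted after it, and the next END marker.
PAIR_RE = re.compile(
    rf'^[ \t]*{BEGIN} ([^\n]+?)[ \t]*\n.*?^[ \t]*{END}[ \t]*$',
    re.MULTILINE | re.DOTALL)


def wrap_includes(source):
    """Surround each #include line of the main file with marker pragmas."""
    return INCLUDE_RE.sub(
        lambda m: f'{BEGIN} {m.group(1)}\n{m.group(0)}\n{END}', source)


def unwrap_includes(preprocessed):
    """Replace each marked region of cpp output with the original #include."""
    return PAIR_RE.sub(lambda m: f'#include {m.group(1)}', preprocessed)
```

The workflow would be: write `wrap_includes(source)` to a temporary file, run cpp on it, then apply `unwrap_includes` to the output. Macros defined in the header are still expanded in the main file, because cpp processed the real #include before the markers were removed. The restored text still contains #include lines, which upstream pycparser's parser does not accept; the proposal adds an Include AST node for them.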

What do you think about this?

eliben commented 2 years ago

It's not clear to me what the point of that would be. The includes "stay alive" in the #line directives anyway, and you typically cannot compile a C file without actually including the code from the #included files.
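To illustrate the point about #line directives, here is a minimal stdlib sketch (not part of pycparser) that recovers which files were entered via #include from GCC-style cpp output. It assumes GCC's linemarker format, `# linenum "filename" flags`, where flag 1 marks entering a new file; the function name `included_files` is hypothetical.

```python
import re

# GCC linemarker: # linenum "filename" flags...
# Flag 1 = entering a new file, 2 = returning to a file,
# 3 = system header, 4 = treat as wrapped in extern "C".
LINEMARKER_RE = re.compile(
    r'^#[ \t]*(\d+)[ \t]+"([^"]*)"((?:[ \t]+\d+)*)[ \t]*$', re.MULTILINE)


def included_files(cpp_output):
    """List the files cpp entered via #include, in order of first entry."""
    files = []
    for m in LINEMARKER_RE.finditer(cpp_output):
        flags = m.group(3).split()
        if '1' in flags and m.group(2) not in files:
            files.append(m.group(2))
    return files
```

pycparser reads the same markers and records the file name in each AST node's `coord.file`, so the origin of every declaration is already available after parsing.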

Whatever your use case is, it seems very specialized. I expect you could achieve it by preprocessing the original code to add a pragma alongside each include; pragmas are preserved in the AST.
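A minimal stdlib sketch of this suggestion: each #include is kept as-is (so cpp still pulls in the header and the parser sees its typedefs) and is followed by a marker pragma, which pycparser keeps as a Pragma node; after code generation, the marker pragmas can be mapped back to #include lines. The pragma name `pycparser_include` and both helper functions are hypothetical. The header declarations are still parsed and will appear in the generated code unless filtered out, for example by each node's `coord.file`.

```python
import re

# Hypothetical pragma name used as a marker. Because it follows the
# #include, cpp emits it after the header contents, in the main file.
MARKER = 'pycparser_include'

INCLUDE_LINE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*([<"][^>"]+[>"]).*$', re.MULTILINE)
MARKER_LINE_RE = re.compile(rf'^[ \t]*#pragma {MARKER} ([^\n]+?)[ \t]*$', re.MULTILINE)


def add_include_pragmas(source):
    """Keep each #include and add a marker pragma right after it."""
    return INCLUDE_LINE_RE.sub(
        lambda m: f'{m.group(0)}\n#pragma {MARKER} {m.group(1)}', source)


def pragmas_to_includes(generated):
    """Turn marker pragmas in generated code back into #include lines."""
    return MARKER_LINE_RE.sub(lambda m: f'#include {m.group(1)}', generated)
```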

pierav commented 2 years ago

Thank you for your response and your time!

The idea is to parse only the target file and not the included files. This avoids the need for fake_libc_include. Preserving the includes also makes the code generated by c_generator more readable (useful in my case, since I am doing a source-to-source conversion).

Although there are many ways to hide the #includes, they all produce a tampered AST. I prefer to have a "clean" AST (i.e., one without pragmas standing in for the includes).

eliben commented 2 years ago

Please read https://eli.thegreenplace.net/2015/on-parsing-c-type-declarations-and-fake-headers/ about why it's impossible to really parse C without first parsing its headers.
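The core issue is that C's grammar depends on which names are typedefs, and those are usually declared in headers. The toy function below is not pycparser code and only handles statements of the form `A * B;`; it shows how the same tokens mean different things depending on whether a header has declared `A` as a type. The name `classify_statement` is hypothetical.

```python
def classify_statement(stmt, typedef_names):
    """Toy classifier for statements of the form 'A * B;'.

    A real C parser resolves this ambiguity through its symbol table,
    which is why it has to see the typedefs declared in headers.
    """
    tokens = stmt.replace('*', ' * ').replace(';', ' ;').split()
    if len(tokens) != 4 or tokens[1] != '*' or tokens[3] != ';':
        raise ValueError('expected a statement of the form "A * B;"')
    if tokens[0] in typedef_names:
        return f'declaration of {tokens[2]} as pointer to {tokens[0]}'
    return f'expression multiplying {tokens[0]} by {tokens[2]}'
```

Without stdio.h (or a fake header providing `FILE`), a parser cannot tell which reading of `FILE * f;` is correct.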