eliben / pycparser

:snake: Complete C99 parser in pure Python
Other
3.24k stars 610 forks source link

Considering comments #124

Closed fkromer closed 8 years ago

fkromer commented 8 years ago

If comments would be provided in the AST pycparser could be used as frontend for a wider range of features implemented by code analyzers (e.g. code to comment ratio), diagram generators (e.g. comments as process indicators for flowcharts/tagged comments as process block indicators for flowgraphs), etc.

Syeberman commented 8 years ago

I agree this would be very useful (documentation generators also come to mind).

@eliben, if a patch were created, would this be a feature you'd consider merging in?

eliben commented 8 years ago

Yes, if the patch is reasonable. I don't think it will be easy though

rh314 commented 8 years ago

The real tricky part is that /* comment */ style comments can practically occur anywhere in the grammar :(

burnpanck commented 8 years ago

I am working on this, please refer to my fork. In fact, I do not intend to handle comments in the parser itself, but rather try to correlate comments stripped out by the preprocessor with the coords of the generated AST. The reason is, it is not at all clear to which node a particular comment should be attached, as there is no standard specifying anything about the comments. Rather, the consumer of the correlated AST and comments will have to decide how to use these comments: Typically, a comment applies to the next thing afterwards unless it is on the same line, but that might not be true in all cases.

In my fork, I try to provide character-exact coordinate ranges to each AST node. What I'm currently struggling with, is that for some AST nodes, it is not at all clear to me, what would be the proper range. Generally, I'd like to supply the range that encloses all tokens that are described by the AST node, such that the range for the variable declaration int a; includes both the type int and the identifier a (but probably not the semicolon). However, in a declaration like int a /* this is variable a*/,b;, two Decl nodes are produced, one for each of the two variables. In order to be able to associate the comment with the variable a, the Decl node for b should be limited to only the identifier. Unfortunately, the name attribute of Decl is not an AST node itself, so there is no space to store the range for just the name to. What are your thoughts on this?

fkromer commented 8 years ago

In my experience the way how comments may be used in the code is more or less "forced" by coding guides. Would it make sense to break down the problem scope in the first step by defining some option to consider no comments at all (current state of the functionality) or a specifc set of comment formats and comment content (e.g. no other C tokens which would complicate things, no "special" non-ANSI comment variants) only?

e.g.

// one line comment
<code>

/* one line comment */
<code>

/* multi
   line
   comment */
<code>

/* multi
 * line
 * comment
 */
<code>
eliben commented 8 years ago

@burnpanck for the reasons you stated, I really feel comments have no place in pycparser, unfortunately. They don't have a natural AST representation / location to attach them to, and pycparser is all about producing an AST without preserving original source integrity.

Thus I'm inclined to close this issue, since I don't think anything will come out of it for the mainline pycparesr.

burnpanck commented 8 years ago

@eliben I agree with you: I do not intend to do anything related to comments within pycparser, precisely because there is no natural AST representation. The only thing that I'm trying to do, is to record a more precise AST to source mapping. This would then allow users to use the AST as a way to interpret source in a structured fashion, including associating AST nodes with comments and other features that are not relevant on the syntax level. In that sense, I agree "considering comments" as an feature-request could be closed as out-of-scope. Maybe I should rename my feature branch to reflect that.