Closed fkromer closed 8 years ago
I agree this would be very useful (documentation generators also come to mind).
@eliben, if a patch were created, would this be a feature you'd consider merging in?
Yes, if the patch is reasonable. I don't think it will be easy though
The real tricky part is that /* comment */
style comments can practically occur anywhere in the grammar :(
I am working on this, please refer to my fork. In fact, I do not intend to handle comments in the parser itself, but rather try to correlate comments stripped out by the preprocessor with the coords
of the generated AST. The reason is, it is not at all clear to which node a particular comment should be attached, as there is no standard specifying anything about the comments. Rather, the consumer of the correlated AST and comments will have to decide how to use these comments: Typically, a comment applies to the next thing afterwards unless it is on the same line, but that might not be true in all cases.
In my fork, I try to provide character-exact coordinate ranges to each AST node. What I'm currently struggling with, is that for some AST nodes, it is not at all clear to me, what would be the proper range. Generally, I'd like to supply the range that encloses all tokens that are described by the AST node, such that the range for the variable declaration int a;
includes both the type int
and the identifier a
(but probably not the semicolon). However, in a declaration like int a /* this is variable a*/,b;
, two Decl
nodes are produced, one for each of the two variables. In order to be able to associate the comment with the variable a
, the Decl
node for b
should be limited to only the identifier. Unfortunately, the name
attribute of Decl
is not an AST node itself, so there is no space to store the range for just the name to. What are your thoughts on this?
In my experience the way how comments may be used in the code is more or less "forced" by coding guides. Would it make sense to break down the problem scope in the first step by defining some option to consider no comments at all (current state of the functionality) or a specifc set of comment formats and comment content (e.g. no other C tokens which would complicate things, no "special" non-ANSI comment variants) only?
e.g.
// one line comment
<code>
/* one line comment */
<code>
/* multi
line
comment */
<code>
/* multi
* line
* comment
*/
<code>
@burnpanck for the reasons you stated, I really feel comments have no place in pycparser, unfortunately. They don't have a natural AST representation / location to attach them to, and pycparser is all about producing an AST without preserving original source integrity.
Thus I'm inclined to close this issue, since I don't think anything will come out of it for the mainline pycparesr.
@eliben I agree with you: I do not intend to do anything related to comments within pycparser, precisely because there is no natural AST representation. The only thing that I'm trying to do, is to record a more precise AST to source mapping. This would then allow users to use the AST as a way to interpret source in a structured fashion, including associating AST nodes with comments and other features that are not relevant on the syntax level. In that sense, I agree "considering comments" as an feature-request could be closed as out-of-scope. Maybe I should rename my feature branch to reflect that.
If comments would be provided in the AST pycparser could be used as frontend for a wider range of features implemented by code analyzers (e.g. code to comment ratio), diagram generators (e.g. comments as process indicators for flowcharts/tagged comments as process block indicators for flowgraphs), etc.