antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License

cpp grammar fails to parse `enum` declarations in Qt header file #446

Open ctrlcctrlv opened 8 years ago

ctrlcctrlv commented 8 years ago

Hey! I wrote an in-depth post about this issue here: https://gist.github.com/ctrlcctrlv/fbe5c2b36e444d9f00f1aaad19d7f6ba

Basically, if you don't have time to read all of that, the cpp grammar is resulting in very complex trees that I have no idea how to even begin parsing.

I know that I am not supposed to use the Lisp-like output to parse, I'm just using it to get an idea of what my tree looks like so I can write a listener. But...

(pmexpression (castexpression (unaryexpression (postfixexpression (primaryexpression (idexpression (unqualifiedid ImageConversionFlag))) ) enum BGMode { TransparentMode))))))))))))))))) , (assignmentexpression (conditionalexpression (logicalorexpression (logicalandexpression (inclusiveorexpression (exclusiveorexpression (andexpression (equalityexpression (relationalexpression (shiftexpression (additiveexpression (multiplicativeexpression (pmexpression (castexpression (unaryexpression (postfixexpression (primaryexpression (idexpression (unqualifiedid OpaqueMode))) } ; enum Key { Key_Escape = 0x01000000))))))))))))))

I get very weird output like this. enum is just floating as an argument to...equalityexpression, I think. This is unparseable.

As you read it, I don't want you to think that I'm belittling you or this project. I really wrote my Gist "blog" as just a stream of consciousness while I was working.

And please, let me know if I'm doing something wrong. :)

KvanTTT commented 8 years ago

  1. Which file are you trying to parse? I tried parsing the qnamespace.h file from your gist and got many errors. The cpp grammar needs to be improved for this file.
  2. Deeply nested expressions are common in ANTLR output. You should use the Visitor or Listener pattern to traverse such a tree. With a Listener you can process only the node types you need.

ctrlcctrlv commented 8 years ago

> Which file are you trying to parse? I tried parsing the qnamespace.h file from your gist and got many errors. The cpp grammar needs to be improved for this file.

Yes, that's the input file. It's valid C++ despite coming from Qt source code. I ran qmake on it beforehand to be sure, and there were no macros to expand.

ctrlcctrlv commented 8 years ago

Oh, and @KvanTTT, regarding your second point: you and someone else pointed out to me that ANTLR is the wrong tool for the job, so I went with CastXML instead to create a tree from the source code.

However, this grammar is still producing buggy output for a valid C++ file, so I will leave the issue open even though I decided to go another way for my specific project, as I think it's worth looking into.

Thanks :-) :+1:

KvanTTT commented 8 years ago

I think ANTLR can also be used for your goals. I mean running a Visitor or Listener over the ANTLR AST.
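To illustrate why the deep nesting matters less with a listener, here is a minimal, self-contained sketch of the pattern in Python. It does not use the ANTLR runtime or the generated cpp parser: `Node`, `walk`, `chain`, and `IdCollector` are hypothetical stand-ins written only for this example. The rule names are taken from the tree output quoted earlier in the thread. The point is that the walker descends through every level, while the listener only reacts to the rule it cares about.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy parse-tree node (hypothetical; not an ANTLR runtime class)."""
    rule: str
    text: str = ""
    children: list = field(default_factory=list)


def walk(listener, node):
    """Depth-first walk that calls enter_<rule>/exit_<rule> when the listener defines them."""
    enter = getattr(listener, f"enter_{node.rule}", None)
    if enter is not None:
        enter(node)
    for child in node.children:
        walk(listener, child)
    leave = getattr(listener, f"exit_{node.rule}", None)
    if leave is not None:
        leave(node)


class IdCollector:
    """Listener that ignores every rule except unqualifiedid."""

    def __init__(self):
        self.names = []

    def enter_unqualifiedid(self, node):
        self.names.append(node.text)


def chain(rules, leaf):
    """Wrap a leaf in a single-child chain of rules, mimicking the deep expression nesting."""
    node = leaf
    for rule in reversed(rules):
        node = Node(rule, children=[node])
    return node


# Rule names copied from the tree dump in this issue; the tree itself is hand-built.
wrappers = ["pmexpression", "castexpression", "unaryexpression",
            "postfixexpression", "primaryexpression", "idexpression"]
tree = chain(wrappers, Node("unqualifiedid", "OpaqueMode"))

collector = IdCollector()
walk(collector, tree)
print(collector.names)  # ['OpaqueMode']
```

With the real ANTLR tooling the generated listener base class and the runtime's tree walker play the roles of `IdCollector`'s callbacks and `walk`. The exact generated method names depend on the grammar and target language, so check the generated code rather than relying on this sketch.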

Marti2203 commented 4 years ago

Hi, I know this is a couple of years late. The grammars for C and C++ are deeply nested and do not look pleasant. As @KvanTTT said, use Listeners to make your job easier. I do not know whether the C++ grammar works properly, and honestly I would not like to touch these grammars.