Destroyerrrocket / rustycpp

A C++ Compiler (in the works)
GNU General Public License v3.0

lalrpop? antlr? Why not nom crate? #1

Closed omac777 closed 1 year ago

omac777 commented 2 years ago

You have a brilliant idea! Congrats!

Is nom insufficient to do your preprocessor? Have you considered the nom crate for this?

Cheers

Destroyerrrocket commented 2 years ago

That's an excellent question! I am aware of nom, a parser combinator library, and I certainly considered it for the project. I'm confident it is capable of parsing C++ (with the same extra help that antlr or lalrpop would need). But the C++ standard already defines the language with grammar rules of the kind a traditional LL/LR parser generator consumes, so with nom I'd constantly be translating those LL(*)-style rules into combinators by hand while hunting for the places where nom would be significantly more practical. So I went with a parsing model I am already familiar with, which (in theory) meant less work.
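To illustrate the kind of translation meant above, here is a minimal sketch in plain Rust (standard library only; it uses neither nom nor any rustycpp code, and every name in it is hypothetical). The standard's grammar writes additive-expression as a left-recursive rule, which a top-down or combinator-style parser cannot follow literally, so the rule has to be restated as "one operand followed by zero or more operator/operand pairs". The operand parser is a stand-in that accepts only unsigned integers.

```rust
// Hypothetical sketch: not rustycpp code and not the nom API.
// Standard-style rule (left-recursive):
//   additive-expression:
//       multiplicative-expression
//       additive-expression + multiplicative-expression
//       additive-expression - multiplicative-expression
// Top-down restatement used below: operand (('+' | '-') operand)*

/// Parses one unsigned integer after skipping leading whitespace.
/// Stands in for multiplicative-expression to keep the sketch short.
fn operand(input: &str) -> Option<(i64, &str)> {
    let input = input.trim_start();
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None; // no digits at this position
    }
    let value = input[..end].parse().ok()?;
    Some((value, &input[end..]))
}

/// Parses and evaluates an additive expression.
/// Returns the value and the unconsumed remainder, or None if an
/// operand is missing (including after a trailing operator).
fn additive_expression(input: &str) -> Option<(i64, &str)> {
    let (mut acc, mut rest) = operand(input)?;
    loop {
        let trimmed = rest.trim_start();
        let (op, after) = if let Some(a) = trimmed.strip_prefix('+') {
            ('+', a)
        } else if let Some(a) = trimmed.strip_prefix('-') {
            ('-', a)
        } else {
            // No further operator: stop and hand back what is left.
            return Some((acc, rest));
        };
        let (rhs, next) = operand(after)?;
        // Folding as we go keeps the left associativity that the
        // left-recursive rule expresses.
        acc = if op == '+' { acc + rhs } else { acc - rhs };
        rest = next;
    }
}
```

Calling `additive_expression("10 - 3 - 2")` yields `Some((5, ""))`, i.e. the input is grouped as (10 - 3) - 2. Every left-recursive rule in the grammar needs this sort of manual restatement in a top-down parser, whichever library hosts it.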

As for why I currently depend on two completely independent parser generators: originally I was going to use lalrpop, as I wanted to keep everything in-language, and in theory you should be able to parse C++ with an LR parser or an LL(*) parser (plus, again, a lot of context-dependent hackery). But in my first experiments with macro parsing, lalrpop already proved hard to use, as it could not easily resolve recursive rules. So I looked for alternatives and, seeing that all the native ones involved some compromise, opted for antlr, which uses Java for code generation but which I already had experience with.

Please note that none of this is set in stone. Both clang and gcc ended up with custom parsers, as C++ does not lend itself to being parsed by external generators. I might end up needing to write a custom parser to handle the edge cases of the language, in which case I'll reevaluate nom, as it might be a better fit in that context.
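A classic instance of the context dependence mentioned above is the statement `a * b;`: it declares a pointer named `b` if `a` names a type, and is a multiplication otherwise. The sketch below is a deliberately tiny illustration in plain Rust, not rustycpp code. The `Statement` enum, the `classify` function, and the use of a `HashSet` as a stand-in symbol table are all assumptions made for the example, and it only handles input of the exact shape `<identifier> * <identifier> ;`.

```rust
use std::collections::HashSet;

// Hypothetical sketch; names are illustrative and not taken from rustycpp.
#[derive(Debug, PartialEq)]
enum Statement {
    /// `T * x;` where `T` names a type: declares `x` as a pointer to `T`.
    PointerDeclaration { pointee: String, name: String },
    /// `a * b;` where `a` is not a type: a multiplication, result discarded.
    Multiplication { lhs: String, rhs: String },
}

/// Classifies a statement of the exact shape `<ident> * <ident> ;`.
/// `type_names` stands in for the symbol table a real front end would
/// consult; the grammar alone cannot make this decision.
fn classify(source: &str, type_names: &HashSet<&str>) -> Option<Statement> {
    let body = source.trim().strip_suffix(';')?;
    let (left, right) = body.split_once('*')?;
    let (left, right) = (left.trim(), right.trim());

    // Simplified identifier check: ASCII letter or '_' first, then
    // ASCII letters, digits, or '_'.
    let is_ident = |s: &str| {
        let mut chars = s.chars();
        matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
            && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
    };
    if !is_ident(left) || !is_ident(right) {
        return None;
    }

    // Same tokens, different parse, depending on declarations seen so far.
    Some(if type_names.contains(left) {
        Statement::PointerDeclaration {
            pointee: left.to_string(),
            name: right.to_string(),
        }
    } else {
        Statement::Multiplication {
            lhs: left.to_string(),
            rhs: right.to_string(),
        }
    })
}
```

With `Widget` registered as a type name, `Widget * w;` is classified as a pointer declaration while `a * b;` is classified as a multiplication. A hand-written parser can thread this lookup through wherever it is needed, whereas a generated parser usually has to bolt it on through semantic predicates or lexer feedback.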

Also note that a GLR parser could parse C++, but GLR parsers are notoriously slower, so I do not intend to go that route for now.

Destroyerrrocket commented 1 year ago

(closing as resolved, as neither nom, antlr, nor lalrpop ended up making C++ any easier to parse than hand-coding the parser :/ )