aurzenligl / prophy

prophy: fast serialization protocol
MIT License

Prophy schema parser needs to parse comments. #21

Closed: kamichal closed this issue 5 years ago

kamichal commented 5 years ago

In order to implement the babel feature, the schema parser needs to collect comments. These have to be written in the same format as the schema generator's output (as seen in test_schema.py). I.e. each Struct, Union and Enum gets a doc-string block, and separately each of its members gets either a single-line comment (inline, after the member definition) or a block comment (before the member definition). Typedefs and constants could also get comments in both forms.
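To illustrate the two comment forms, a schema might look roughly like this (a hypothetical sketch; the exact comment syntax and placement are defined by the schema generator's output in test_schema.py, not by this example):

```prophy
/* Doc-string block attached to the struct as a whole. */
struct X {
    /* Block comment: placed before the member definition. */
    u32 x;
    u64 y;  // single-line comment: inline, after the member definition
};
```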

It's the last objective needed to get bi-directional compilation from schema to prophyc.model and vice versa. Schema is supposed to become babel's vault language. Oh yes, it has to be as epic as the names :D

Unfortunately I failed trying to implement that. 'ply' won this painful battle... I managed the LEX tokens, but YACC was blowing up with each little change I made. It's an ultra-fragile piece of software. Or maybe I'm an elephant in a china shop. Can somebody help?

aurzenligl commented 5 years ago

I have similar feelings about ply. It's not ridiculously bad (yocto's bitbake uses it), but it requires a very good understanding of lex/yacc, which I didn't gain by reading https://www.dabeaz.com/ply/ply.html. It's also ridiculously hard to debug. I wasn't able (or didn't have the intestinal fortitude) to implement more potent diagnostics - the error message doesn't really deliver the goods:

```
aurzenligl@aurzenligl-pc /tmp $ cat x.prophy
struct X {
    u32 x; f
    u64 y;
};
aurzenligl@aurzenligl-pc /tmp $ prophyc --cpp_out . x.prophy
prophyc: error: x.prophy:3:5: error: syntax error at 'u64'
```

Another way to tackle the parsing problem would be to use antlr (if it's not too cumbersome wrt its dependencies; I never looked into this parsing framework). Its popularity puts ply to shame and afaik it can produce Python parsers (though I don't know the details, e.g. performance): https://github.com/antlr/antlr4/blob/master/doc/python-target.md

Or write our own recursive-descent parser: the magnificent fully manual option which gcc, clang, protoc and countless other parsing applications use. It gives you full control and extensibility in every direction. I tried to solve a similar problem some time ago and implemented such a parser: https://github.com/aurzenligl/melina

tokenizer: https://github.com/aurzenligl/melina/blob/3fbd5257f17a15ac9c95bc1a6317b27fa967fc4e/melina.py#L487-L586
parser: https://github.com/aurzenligl/melina/blob/3fbd5257f17a15ac9c95bc1a6317b27fa967fc4e/melina.py#L588-L1013

And it actually worked pretty well! The parser is implemented as if each method were a grammar production, calling other productions. The tokenizer absolutely must use the re module, since re is implemented in C; a tokenizer implemented in pure Python (with if statements and loops) is so unbearably slow that it's not worth it. Btw. ply also uses re. Diagnostics are rather easy to produce:

https://github.com/aurzenligl/melina/blob/3fbd5257f17a15ac9c95bc1a6317b27fa967fc4e/tests/test_metaparser.py#L22-L40
https://github.com/aurzenligl/melina/tree/master/tests/data/meta_errors

and it's ludicrously fun to implement ;)
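A minimal sketch of that approach, for a hypothetical toy subset of the prophy grammar (not the real prophyc or melina code): a regex-based tokenizer feeding a recursive-descent parser in which each method corresponds to one grammar production.

```python
import re

# Tokenizer: a single compiled regex with named groups. The re module is
# implemented in C, which is what makes this fast enough in Python.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>//[^\n]*)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<lbrace>\{)
  | (?P<rbrace>\})
  | (?P<semi>;)
""", re.VERBOSE)

def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError('unexpected character at offset %d' % pos)
        pos = m.end()
        if m.lastgroup not in ('ws', 'comment'):  # skip whitespace/comments
            tokens.append((m.lastgroup, m.group()))
    tokens.append(('eof', ''))
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def expect(self, kind):
        tok_kind, value = self.tokens[self.pos]
        if tok_kind != kind:
            raise SyntaxError('expected %s, got %r' % (kind, value))
        self.pos += 1
        return value

    # production: struct = 'struct' name '{' member* '}' ';'
    def struct(self):
        if self.expect('name') != 'struct':
            raise SyntaxError("expected 'struct'")
        name = self.expect('name')
        self.expect('lbrace')
        members = []
        while self.peek() != 'rbrace':
            members.append(self.member())
        self.expect('rbrace')
        self.expect('semi')
        return (name, members)

    # production: member = type name ';'
    def member(self):
        type_ = self.expect('name')
        member_name = self.expect('name')
        self.expect('semi')
        return (type_, member_name)

schema = '''
struct X {
    u32 x;
    u64 y;
};
'''
name, members = Parser(tokenize(schema)).struct()
```

Because the tokenizer records positions and the parser fails at a specific `expect` call, pointing the error at an exact token (as in melina's diagnostics tests) falls out naturally from this structure.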

kamichal commented 5 years ago

We decided not to add comment parsing to the prophy schema. The issue can be closed.