Help and example project

mingodad commented 4 years ago

Hello!

Looking through the documentation and examples, I didn't get a good picture of how to use "colm/ragel" in a practical project.

I have a project that seems to be a good fit for "colm" (https://github.com/mingodad/ljs/tree/master/lua2ljs). It currently uses re2c and sqlite3/lemon to rewrite Lua code into LJS code (mostly a JavaScript-like syntax), and it does this correctly on most codebases, requiring minimal manual adjustments in a few cases.

Ideally I would like to have it both ways, Lua->LJS and LJS->Lua. If someone with more experience with "colm" could help implement at least Lua->LJS in colm, I would try to do LJS->Lua after that. I think it could be a good, complete, and useful example, and it would attract more users to "colm".

Cheers!

adrian-thurston commented 4 years ago

Yes, it does sound like a great project. Where does sqlite3 come in?

phorward commented 4 years ago

Hi @adriant, I think he means the Lemon parser generator, which is part of the SQLite project. As far as I know, Lemon is a yacc-style LALR(1) parser generator with zero dependencies. I would appreciate such a project, too.

mingodad commented 4 years ago

Hello!

Sorry for not explaining it properly in the first place!

But Jan (phorward) got it.

Cheers!

adrian-thurston commented 4 years ago

Oh haha, I should have made that connection!

The first step is to make a grammar for Lua. You should be able to use the Lua grammar mostly as is; the exception would be the expression part. Colm doesn't have working left/right associativity declarations. That needs some work: making it correct in the context of backtracking has always been a challenge. Possibly I wasn't seeing the solution clearly, or it's actually hard. In any case, it needs more work and you can't currently use it.

mingodad commented 4 years ago

Hello Adrian!

Thank you for your answer; based on it, I can now guess why "colm" doesn't have wider usage.

By the way, are you aware of http://codeworker.free.fr/? It seems to try to do the same as "colm", but it looks abandoned.

Cheers!

phorward commented 4 years ago

> Thank you for your answer; based on it, I can now guess why "colm" doesn't have wider usage.

Please note that Colm's advantage is that it is both generalized and has a very tight and fast implementation, thanks to its backtracking LR algorithm. The disadvantage is that you partly have to rewrite your grammar to fit these constraints.

mingodad commented 4 years ago

Hello Jan! Thank you for your reply!

Given that we know the problematic parts of general programming language grammars, like "expressions", would it be possible to provide those grammar blocks as a reusable library, with the names (nonterminals, terminals) aliased to match the end user's grammar?

The libmarpa project (https://github.com/jeffreykegler/libmarpa) also tries to solve this problem but fails to attract users, I suspect due to the same problems "colm" has.

Both projects are already several years old, but when we try to use them we get the feeling that they are not finished and ready for end users.

Cheers!

phorward commented 4 years ago

Hey @mingodad,

Surely, many languages share the same expression syntax, which could be generalized into some kind of library. However, every language has its own peculiarities and operator precedence behaviors.

Python, for example, has a ** b as its power operator, with a precedence between a primary value (a or a.b, for example) and the * operator, whereas C and Rust use a function such as pow() for that. Therefore, you cannot generalize this to all languages out there, beyond basic mathematical expressions, and you need some way to "override" grammar elements entirely, to "plug in" your custom syntax.
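As a quick illustration of the Python case (a sketch runnable in any Python 3 interpreter, not part of any grammar), the assertions below show how ** interacts with * and with unary minus. Lua's ^ operator has similar quirks: the Lua reference manual makes it right-associative and binds it tighter than unary minus, so a C-style expression grammar would not fit it unchanged.

```python
import math

# ** binds tighter than *: 2 * (3 ** 2), not (2 * 3) ** 2 == 36
assert 2 * 3 ** 2 == 18

# ** is right-associative: 2 ** (3 ** 2), not (2 ** 3) ** 2 == 64
assert 2 ** 3 ** 2 == 512

# ** binds tighter than a unary minus on its left: -(2 ** 2)
assert -2 ** 2 == -4
assert (-2) ** 2 == 4

# A C-like language spells the same operation as an ordinary function call,
# which its grammar treats as a plain primary expression.
assert math.pow(2, 10) == 1024.0
print("all precedence checks passed")
```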

Again, it will be a lot of work. But good examples are always welcome. Why not build this Lua grammar and offer it as a useful sample from which other grammars can be adapted for Colm?

mingodad commented 4 years ago

Hello Jan!

That's why I proposed it in the first place.

But as I stated before, my knowledge of "colm" doesn't seem to be enough. I'm providing an already working example, and I wanted to compare it with a "colm" version to see the difference in terms of lines and complexity of code.

And Lua, because it has a small and simple syntax.

Cheers!

adrian-thurston commented 4 years ago

Hi @mingodad, with regard to a general-purpose library for parsing expressions, I agree with @phorward: there are so many subtle differences between languages that it doesn't make sense to try to create one library to parse them all.

Note that the Lua language can be handled easily by expanding the expression grammar into multiple levels, one per precedence level:

expr -> expr = term
expr -> term

term -> term + factor
term -> factor

factor -> factor * primary
factor -> primary

primary -> id
primary -> ( expr )
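As a sketch of how the layered grammar above behaves, here is a hand-written recursive-descent parser in Python. It is not Colm, and it does not work the way Colm's backtracking LR engine does; the tokenizer, the tuple tree shape, and the names Parser and tokenize are assumptions made for illustration. Each left-recursive rule becomes a loop, which keeps its operator left-associative, and each level produces its own node type.

```python
import re

# Assumed token set: identifiers, the three operators from the grammar, parentheses.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|[=+*()])")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """One method per grammar level. Each left-recursive rule
    'x -> x OP y | y' becomes a loop, which keeps OP left-associative."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"trailing input: {self.peek()!r}")
        return tree

    def _level(self, op, operand, node_type):
        # Build a left-leaning chain: ((y OP y) OP y) ...
        left = operand()
        while self.peek() == op:
            self.pos += 1
            left = (node_type, left, operand())
        return left

    def expr(self):      # expr -> expr = term | term
        return self._level("=", self.term, "expr")

    def term(self):      # term -> term + factor | factor
        return self._level("+", self.factor, "term")

    def factor(self):    # factor -> factor * primary | primary
        return self._level("*", self.primary, "factor")

    def primary(self):   # primary -> id | ( expr )
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            inner = self.expr()
            if self.peek() != ")":
                raise SyntaxError(f"expected ')', got {self.peek()!r}")
            self.pos += 1
            return inner
        if tok is not None and IDENT_RE.fullmatch(tok):
            self.pos += 1
            return ("id", tok)
        raise SyntaxError(f"unexpected token {tok!r}")


print(Parser("a + b * c").parse())
# ('term', ('id', 'a'), ('factor', ('id', 'b'), ('id', 'c')))
```

Note that the resulting tree mixes expr, term, and factor nodes, one type per level of the grammar.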

It's true that using one level combined with precedence declarations is convenient, not only because it reduces the number of productions, but also because it can produce a homogeneous tree type.
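The single-level alternative can be sketched with precedence climbing. This is a different technique from the precedence declarations of yacc/Lemon-style generators (and from Colm), but it shows the same idea: one expr rule plus a precedence/associativity table, yielding a single homogeneous BinOp node type. The table is an assumed Lua-like subset (.., +, -, *, /, ^, with the relative order and associativity the Lua manual gives them); unary operators and the rest of Lua's operators are omitted, and all names here are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Union

# Assumed Lua-like subset: operator -> (precedence, associativity); higher binds tighter.
BINARY_OPS = {
    "..": (1, "right"),
    "+": (2, "left"), "-": (2, "left"),
    "*": (3, "left"), "/": (3, "left"),
    "^": (4, "right"),
}

OP_TOKEN_RE = re.compile(r"\s*(\.\.|[A-Za-z_]\w*|[-+*/^()])")


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[str, BinOp]  # a leaf is just the identifier text


def lex(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = OP_TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_primary(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        inner, pos = parse_expr(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return inner, pos + 1
    if tok in BINARY_OPS or tok == ")":
        raise SyntaxError(f"unexpected {tok!r}")
    return tok, pos + 1


def parse_expr(tokens, pos, min_prec):
    """Precedence climbing over a single 'expr' level driven by BINARY_OPS."""
    left, pos = parse_primary(tokens, pos)
    while pos < len(tokens) and tokens[pos] in BINARY_OPS:
        op = tokens[pos]
        prec, assoc = BINARY_OPS[op]
        if prec < min_prec:
            break
        # Left-associative: the right operand may only contain tighter operators.
        next_min = prec + 1 if assoc == "left" else prec
        right, pos = parse_expr(tokens, pos + 1, next_min)
        left = BinOp(op, left, right)
    return left, pos


def parse_single_level(text):
    tokens = lex(text)
    tree, pos = parse_expr(tokens, 0, 1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree


print(parse_single_level("a .. b + c ^ d ^ e"))
```

Supporting another language's operators would mean editing only the table, which is the convenience the single-level form buys.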

For these reasons I'd like to support it in colm. I have actually started, but it doesn't work yet. The precedence tokens effectively decide which shift/reduce action to take when there are multiple possibilities, whereas the colm engine already has an algorithm covering this: it orders the alternatives and tries them all, via backtracking. So precedence tokens somehow need to override the colm algorithm without disrupting the backtracking. There are some issues around this that I don't understand, and that's what needs to be figured out.
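To make "precedence tokens decide which shift/reduce action to take" concrete, here is a sketch of the conflict-resolution rule documented for Bison (Lemon resolves conflicts in a similar way), applied to the ambiguous grammar E -> E op E | id. It is written in Python, it is not Colm code, and the declaration table and function names are hypothetical. The relevant point for the discussion above is that resolve() commits to exactly one action at each conflict, while colm orders the alternatives and backtracks over them.

```python
from enum import Enum


class Action(Enum):
    SHIFT = "shift"
    REDUCE = "reduce"
    ERROR = "error"


# Hypothetical declarations in the style of %left / %right / %nonassoc:
# later entries bind tighter, as in yacc and Bison.
DECLS = [("left", ["+", "-"]), ("left", ["*", "/"]), ("right", ["^"])]
PREC = {tok: (level, assoc)
        for level, (assoc, toks) in enumerate(DECLS, start=1)
        for tok in toks}


def resolve(rule_op, lookahead):
    """Shift/reduce conflict between reducing 'E -> E rule_op E' (the rule takes
    the precedence of its last terminal, rule_op) and shifting 'lookahead'."""
    if rule_op not in PREC or lookahead not in PREC:
        return Action.SHIFT  # no declaration: Bison defaults to shift and reports it
    rule_level, _ = PREC[rule_op]
    tok_level, tok_assoc = PREC[lookahead]
    if tok_level > rule_level:
        return Action.SHIFT
    if tok_level < rule_level:
        return Action.REDUCE
    # Same level: associativity decides; %nonassoc makes it a syntax error.
    return {"left": Action.REDUCE, "right": Action.SHIFT}.get(tok_assoc, Action.ERROR)


def shift_reduce_parse(tokens):
    """Shift/reduce loop for the ambiguous grammar E -> E op E | id.
    Wherever an LR table would have a conflict, resolve() picks exactly one action."""
    stack = []  # alternates E, op, E, op, ... (an E is an id or a tuple)
    for tok in tokens + ["$"]:  # "$" marks end of input
        if tok != "$" and tok not in PREC:
            stack.append(tok)  # shift an identifier (immediately reduced to E)
            continue
        while len(stack) >= 3:
            action = Action.REDUCE if tok == "$" else resolve(stack[-2], tok)
            if action is Action.ERROR:
                raise SyntaxError(f"{stack[-2]!r} and {tok!r} are non-associative")
            if action is Action.SHIFT:
                break
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append((op, left, right))  # reduce E -> E op E
        if tok != "$":
            stack.append(tok)  # shift the operator
    if len(stack) != 1:
        raise SyntaxError("malformed input")
    return stack[0]


print(shift_reduce_parse("a + b * c".split()))
# ('+', 'a', ('*', 'b', 'c'))
```

A backtracking engine would instead keep both the shift and the reduce alternative available and return to the other one on failure, which is why a single committed choice like this is hard to fit into it.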

adrian-thurston/colm, issue #105