craff / pacomb

A parsing library that compiles grammars to combinators using elimination of left recursion
MIT License
16 stars 2 forks source link

Resolve ambiguities locally? #23

Open mlemerre opened 6 months ago

mlemerre commented 6 months ago

Hi,

I just ported a whole parser using pacomb and it is very nice. One thing that I don't understand well is how to deal with ambiguity. Specifically, the merge function is attached to the %parser declaration, so as I understand it, the whole file is parsed, and then ambiguities are resolved; but this can already be made using parse_all_buffer. What I would prefer to do is to resolve ambiguites locally, i.e. being able to attach merge function to rules, but I could not find how to do that. The only thing I found is partial_parse_buffer, but I don't think using directly would work correctly.

Is there a way to solve this problem? Maybe by adding a and%parser rule suffices?

craff commented 6 months ago

mlemerre @.***> writes:

Hi,

I just ported a whole parser using pacomb and it is very nice.

Thanks!

I will profit of your remarks and this discussion to ameliorate the documentation.

One thing that I don't understand well is how to deal with ambiguity. Specifically, the merge function is attached to the %parser declaration, so as I understand it, the whole file is parsed, and then ambiguities are resolved;

A representation of the grammar is build at runtime, translated to parser combinators that are used for parsing. Ambiguïties are dealt with only at parsing and locally, if you wish.

I cite the toy example of the documentation:

let%parser @.** (+.)] rec bin_seq = () => 1.0 ; (t1::bin_seq) 'a' (t2::bin_seq) => t1 . t2

This grammar rule is highly ambiguous, and the @.*** indication] indicates how to deal with two parsing semantics for the same portion of the parsed string. Here addition is not really something you will do in practice, but gives a nice function computing Catalan numbers ;-)

You could more realistically have a "Or" constructor if the result of parsing is an abstract syntax tree and your grammar is really ambiguous (typically for parsing natural languages and to treat ambiguities locally to avoid an explosion of the size of the parse tree)

let%parser @.*** (fun x y -> Or(x,y))] ....

If you think your grammar in semantically unambiguous you could use

let%parser @.*** (fun x y -> if eq x y then x else assert false)] ...

where eq is the equality you wish.

This is also a good choice to debug ambiguities if you think your grammar should not be ambiguous at all. May be it should be the default for pacomb, but it is hard to chose an suitable equality as OCaml polymorphic equality does not work on closure.

but this can already be made using parse_all_buffer.

The doc says this is not really a good idea, you will have rapidely an exponential number of parse trees.

What I would prefer to do is to resolve ambiguites locally, i.e. being able to attach merge function to rules, but I could not find how to do that.

It was there: https://raffalli.eu/pacomb/pacomb/index.html#ambiguous

The only thing I found is partial_parse_buffer, but I don't think using directly would work correctly.

No this is only usefull if you do not want to parse the whole buffer input. This has little to do with ambiguity.

Thanks again for you interest, Christophe

-- Christophe Raffalli tél: +689 87 23 11 48 web: http://raffalli.eu my mails should pass DKIM/SPF tests/mes messages doivent passer les tests DKIM/SPF

mlemerre commented 6 months ago

Thanks for your answer! I think my previous question was not very clear. If I only have one ambiguous recursive rule, the documentation makes it clear how to resolve the ambiguity:

let%parser [@merge f] rec f = ...

Now, if I have mutually recursive rules, I would normally write:

let%parser rec f = ...

and g = ...

But how can I now attach different merge functions to f and g? I think I can only attach a merge function to f.

mlemerre commented 6 months ago

(By the way, the link in the github readme to the full documentation is broken)

craff commented 6 months ago

Hello

Thanks for your answer! I think my previous question was not very clear. If I only have one ambiguous recursive rule, the documentation makes it clear how to resolve the ambiguity:

let%parser @.*** f] rec f = ... Now, if I have mutually recursive rules, I would normally write:

let%parser rec f = ...

and g = ...

let%parser @.*** merge_f] rec f = ...

and @.*** merge_g] g = ...

should work. I use that with @.***. Did you try that ?

Cheers, Christophe

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.

-- Christophe Raffalli tél: +689 87 23 11 48 web: http://raffalli.eu my mails should pass DKIM/SPF tests/mes messages doivent passer les tests DKIM/SPF

craff commented 6 months ago

mlemerre @.***> writes:

(By the way, the link in the github readme to the full documentation is broken)

Thanks, I fixed it.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.

-- Christophe Raffalli tél: +689 87 23 11 48 web: http://raffalli.eu my mails should pass DKIM/SPF tests/mes messages doivent passer les tests DKIM/SPF

mlemerre commented 6 months ago

let%parser @. merge_f] rec f = ... and @. merge_g] g = ... should work. I use that with @.***. Did you try that ? Cheers, Christophe

I assume that you wrote

and [@merge merge_g] g =

(github removes everything that looks like a mail addresse). Actually I tried:

and [@merge merge_expr] expr [@merge merge_expr] =

which does not work. Only let %parser [@merge merge_f] works. If I underestood correctly, the merge function would then be attached to the rule f? In that case, I could reorder my grammer (but I would still have a problem if I want to attach two merge functions to two different rules).

mlemerre commented 6 months ago

Actually, it is working. I must have been doing something wrong, I am very sorry.

mlemerre commented 6 months ago

It seems that raising Give_up in the merge function discards one of the parsetrees. I wonder if this is expected behaviour it seems hard to predict which parse tree will be discarded.

(I am using merge function to detect and report parsing ambiguities; failwith works here, but Give_up seemed convenient to report the location of the problem).

craff commented 6 months ago

Hello

Yes the semantics of give_up is to reject parse trees. If you do it in merge it is a bit strange. A nice solution would be using effect from 5.0. otherwise decorate you parse tree with positions.

I could actually have the default ambiguity message report positions. I probably only have access to the right position of the ambiguity, but it would be something better. To have the beginning of the ambiguous region your code must record it. Positions are not preserved by default because people that wants ultra fast parser would complain...

Cheers Christophe

Le 1 février 2024 05:15:43 GMT-10:00, mlemerre @.***> a écrit :

It seems that raising Give_up in the merge function discards one of the parsetrees. I wonder if this is expected behaviour it seems hard to predict which parse tree will be discarded.

(I am using merge function to detect and report parsing ambiguities; failwith works here, but Give_up seemed convenient to report the location of the problem).

-- Reply to this email directly or view it on GitHub: https://github.com/craff/pacomb/issues/23#issuecomment-1921563814 You are receiving this because you commented.

Message ID: @.***>

craff commented 6 months ago

mlemerre @.***> writes:

It seems that raising Give_up in the merge function discards one of the parsetrees. I wonder if this is expected behaviour it seems hard to predict which parse tree will be discarded.

Yes. Maybe an error in this case would be nice. Or at least write something in the documentation.

(I am using merge function to detect and report parsing ambiguities; failwith works here, but Give_up seemed convenient to report the location of the problem).

I just added a @.***_with_pos] and much better for you

let _ = Grammar.set_debug_merge Format.std_formatter

You shoud call this before the compilation of the grammar.

This in on github ...

Using merge is expensive, so you do not want to do it when it is unnecessary Therefore I find the above to be the best ?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.

-- Christophe Raffalli tél: +689 87 23 11 48 web: http://raffalli.eu my mails should pass DKIM/SPF tests/mes messages doivent passer les tests DKIM/SPF

mlemerre commented 6 months ago

The merge_with_pos would be perfect from my use case, where I use merge to display the possible parse trees (to ask the user to disambiguate). But you eventually removed it? I understand that computing pos is expensive, but I would have thought that passing around spos would not be...

For context: my use case is to use case to detect ambiguities. The grammar describes complex types meant to be used by various people, and I believe that resolving all ambiguities in the grammar by giving arbitrary priorities to all constructs is bad. I think it is safer to only have the obvious priorities, and to warn the user when there are multiple parse trees for its file, asking him to parenthesize. I have just caught some errors in our tests using this design.

This is why I needed a parser with support for ambiguous parsing, and pacomb worked very nicely. It is a great package which deserves to be more well known!