Marwes / combine

A parser combinator library for Rust
https://docs.rs/combine/*/combine/
MIT License

Some issue with error reporting #332

Open tailhook opened 2 years ago

I've stripped my example down to the following grammar, expressed in English: the source text can contain multiple items separated by a newline or a comment (double slash, `//`); each item is an identifier followed by whitespace-separated numbers.
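For comparison with the combine versions, here is a minimal hand-written sketch of this grammar in plain Rust (no crates). It mirrors the repro's structure (identifier, then zero or more "whitespace + number" pairs, then `\n` or `//`) and reports a 0-based byte offset together with the alternatives that could have continued at that point. It is not how combine works internally. Several choices are assumptions that are not in the original: identifiers are one or more ASCII letters (the repro's `id` uses `many`, which also accepts zero), whitespace is only spaces and tabs, a comment is just the `//` marker, and `digit` is not listed after a number because the sketch treats a number as a maximal digit run. The names `Item`, `ParseError`, `parse` and `run` are hypothetical.

```rust
// Hypothetical names (`Item`, `ParseError`, `parse`, `run`) for illustration only; not combine's API.

/// One item: an identifier followed by zero or more numbers (kept as strings, like `num()`).
#[derive(Debug, PartialEq)]
struct Item {
    id: String,
    nums: Vec<String>,
}

/// A failure: 0-based byte offset plus the alternatives that could have continued there.
#[derive(Debug, PartialEq)]
struct ParseError {
    pos: usize,
    expected: Vec<&'static str>,
}

/// End of the maximal run of bytes satisfying `f`, starting at `p`.
fn run(b: &[u8], mut p: usize, f: impl Fn(&u8) -> bool) -> usize {
    while p < b.len() && f(&b[p]) {
        p += 1;
    }
    p
}

/// items := (ident (ws number)* ("\n" | "//"))* up to the end of the input.
fn parse(src: &str) -> Result<Vec<Item>, ParseError> {
    let b = src.as_bytes();
    let mut pos = 0;
    let mut items = Vec::new();
    while pos < b.len() {
        // Identifier: one or more ASCII letters (assumption; the repro allows zero).
        let id_end = run(b, pos, u8::is_ascii_alphabetic);
        if id_end == pos {
            return Err(ParseError { pos, expected: vec!["letter"] });
        }
        let id = src[pos..id_end].to_string();
        pos = id_end;

        // (ws number)*: stop when no whitespace follows; whitespace must lead to a number.
        let mut nums = Vec::new();
        loop {
            let ws_end = run(b, pos, |c: &u8| *c == b' ' || *c == b'\t');
            if ws_end == pos {
                break;
            }
            let num_end = run(b, ws_end, u8::is_ascii_digit);
            if num_end == ws_end {
                return Err(ParseError { pos: ws_end, expected: vec!["digit"] });
            }
            nums.push(src[ws_end..num_end].to_string());
            pos = num_end;
        }

        // Terminator: a newline or the `//` marker, each matched as a whole token.
        if b[pos..].starts_with(b"\n") {
            pos += 1;
        } else if b[pos..].starts_with(b"//") {
            pos += 2;
        } else {
            return Err(ParseError { pos, expected: vec!["whitespace", "newline", "comment"] });
        }
        items.push(Item { id, nums });
    }
    Ok(items)
}

fn main() {
    println!("{:?}", parse("a 1 2\nb//"));
    println!("{:?}", parse("a 123/2"));
}
```

For the input `a 123/2` used below, this sketch stops at offset 5 (the `/`) and lists whitespace, newline and comment. It leaves `digit` out only because it treats `123` as complete; a parser that lists every token able to continue the input would include it.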

Here are three versions of the grammar:

```rust
use combine::parser::char::{digit, space, letter};
use combine::parser::repeat::{repeat_until};
use combine::{Stream, Parser, EasyParser};
use combine::{eof, token, many1, sep_by, value};
use combine::{many, skip_many1, attempt};

// Identifier: zero or more letters.
fn id<I: Stream<Token=char>>() -> impl Parser<I, Output=String> {
    many(letter())
}

// One or more whitespace characters.
fn ws<I: Stream<Token=char>>() -> impl Parser<I, Output=()> {
    skip_many1(space())
}

// Number: one or more digits.
fn num<I: Stream<Token=char>>() -> impl Parser<I, Output=String> {
    many1(digit())
}

// Comment marker `//`, silenced so it is not listed in "expected".
fn comment<I: Stream<Token=char>>() -> impl Parser<I, Output=()> {
    attempt((token('/'), token('/')).silent()).with(value(()))
}

fn newline<I: Stream<Token=char>>() -> impl Parser<I, Output=()> {
    token('\n').with(value(())).expected("newline")
}

fn main() {
    // Variant 1: numbers collected with `many`.
    let mut parser1 = many::<Vec<_>, _, _>(
        id()
        .and(many::<Vec<_>, _, _>(ws().with(num())))
        .and(comment().or(newline())),
    );

    // Variant 2: numbers collected with `repeat_until` the item terminator.
    let mut parser2 = many::<Vec<_>, _, _>(
        id()
        .and(repeat_until::<Vec<_>, _, _, _>(
            ws().with(num()),
            comment().or(newline()),
        ))
        .and(comment().or(newline())),
    );

    // Variant 3: whitespace-separated numbers collected with `sep_by`.
    let mut parser3 = many::<Vec<_>, _, _>(
        id()
        .skip(ws())
        .and(sep_by::<Vec<_>, _, _, _>(num(), ws()))
        .and(comment().or(newline()))
    );

    let s = r#"a 123/2"#;
    // Translate each error position into an offset into `s`.
    let err1 = parser1.easy_parse(s)
        .map_err(|e| e.map_position(|p| p.translate_position(s)))
        .unwrap_err();
    let err2 = parser2.easy_parse(s)
        .map_err(|e| e.map_position(|p| p.translate_position(s)))
        .unwrap_err();
    let err3 = parser3.easy_parse(s)
        .map_err(|e| e.map_position(|p| p.translate_position(s)))
        .unwrap_err();
    println!("{}\n{}\n{}", err1, err2, err3);
}
```

The output is:

```
Parse error at 6
Unexpected `2`
Unexpected `/`
Expected `whitespace`, `digit` or `newline`

Parse error at 5
Unexpected ` `
Expected `letter`

Parse error at 6
Unexpected `2`
Unexpected `/`
Expected `whitespace` or `newline`
```

Note in variant 1:

  1. There are two "unexpected" entries: `/` is reported at the wrong position, and `2` is not the erroneous character. Looks like a bug?
  2. The reported position is that of the character after the erroneous one (6 instead of 5, the offset of `/`).
  3. Expecting `digit` is wrong: there needs to be whitespace in between (or a newline, or a comment, which is silenced).
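The offsets in these reports can be checked by hand. Assuming `translate_position` turns the error position into a 0-based byte offset into `s` (my reading of combine's `PointerOffset`; treat it as an assumption), a std-only listing of `a 123/2` puts the `/` at 5 and the trailing `2` at 6. The input is all ASCII, so byte offsets and character indices coincide. The helper `offsets` is hypothetical.

```rust
/// Hypothetical helper: the 0-based byte offset of every char in `s`.
fn offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

fn main() {
    let s = "a 123/2";
    // All-ASCII input, so each byte offset equals the character index.
    for (offset, ch) in offsets(s) {
        println!("{offset}: {ch:?}");
    }
}
```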

Note in variant 2:

  1. The unexpected space is actually at a different position (offset 1), not at the reported one.
  2. The error position is (surprisingly) correct.
  3. `letter` can't appear here. Note that even if I remove the outermost `many` (i.e. support only a single item, so no letter is possible after the initial whitespace), this parser still reports `letter`.

Note in variant 3:

  1. Same issues as variant 1 with the position and the "unexpected" entries.
  2. The "expected" set is fine.

Are these bugs, or am I misunderstanding parsers somehow? Also, why is there such a difference between `sep_by`, `repeat_until` and `many`?