Marwes / combine

A parser combinator library for Rust
https://docs.rs/combine/*/combine/
MIT License
1.29k stars 93 forks source link

Add loop_parser and loop_gen #318

Open uzytkownik opened 3 years ago

uzytkownik commented 3 years ago

I run into trouble using combine when writing a parser which used internal state to parse next element and accumulate them together. There are several ways to achieve it but each has some drawbacks:

Use then or otherwise use right-recursion

enum Token { A, B, C }

fn combine_state((tok, mut state): (Token, HashMap<Token, usize>))
{
    *state.entry(token).or_insert(0) += 1;
    state
}

parser!{
    fn tokens_parser_<Input>[Input](last_token: Option<Token>)(Input) -> HashMap<Token, usize>
    where
         Input: Stream<Token = char>
    {
         let mut choices = vec![];
         if last_token != Some(Token::A) {
             choices.push(token('a').map(|_| Token::A)).left());
         }
         if last_token != Some(Token::B) {
             choices.push(token('b').map(|_| Token::B).left().right());
        }
        if last_token != Some(Token::C) {
             choices.push(token('c').map(|_| Token::C).right().right());
        }
        choices((
            choice(choices).then_ref(|tok| token_parser(Some(tok))).map(combine_state),
            value(HashMap::new())
         ))
    }
}

let mut tokens_parser = tokens_parser_(None);

This however has a drawback of potentially being expensive if the combine is expensive (e.g. pushing to the front of vector). It's possible to avoid by temporarily using data structures such as list but it is additional structures you need to convert to/from.

Write custom function using parser and managed the state yourself

enum Token { A, B, C }

fn token_parser<Input>(last_token: Option<Token>)(Input) -> impl Parser<Input, Output = Option<Token>>
where
     Input: Stream<Token = char>
{
    let mut choices = vec![];
    if last_token != Some(Token::A) {
        choices.push(token('a').map(|_| Token::A)).left());
    }
    if last_token != Some(Token::B) {
        choices.push(token('b').map(|_| Token::B).left().right());
    }
    if last_token != Some(Token::C) {
         choices.push(token('c').map(|_| Token::C).right().right());
    }
    optional(choice(choices))
}

let mut tokens_parser = parser(|input| {
    let mut state = HashMap::new();
    let (first_token, commit) = token_parser(None).parse_stream(input).into_result()?;
    let mut last_token = match first_token {
        Some(first_token) => first_token,
        None => return Ok((state, commit))
    };
    loop {
        *state.entry(token).or_insert(0) += 1;
        match token_parser(Some(last_token)).into_result() {
            Ok((current_token, commit2)) => {
                commit = commit.merge(commit2);
                last_token = current_token;
            },
            Err(Commit(e)) =>
                return Err(Commit(e)),
            Err(Peek(e)) =>
                return state
        }
    }
});

This makes code-flow a bit clearer and avoids the ordering issue but it is probably a bit error-prone due to need of managing commitment yourself. It also require full reparse for partial state

Write custom parser

Most of the methods needed to implement the custom parsers are internal.

uzytkownik commented 3 years ago

I will take a look at failure tomorrow. For whatever reason it passed locally.