J-F-Liu / pom

PEG parser combinators using operator overloading without macros.
MIT License

Support back-references with ">>" #49

Open jameskirkwood opened 2 years ago

jameskirkwood commented 2 years ago

Because seq produces a Parser that continues to borrow its tag, it's not possible to use the overloaded right shift operator (>>) with seq to create a back-reference to a previously parsed fragment.

As a basic example, the following will not compile because tag does not live long enough:

fn example<'a>() -> Parser<'a, u8, Vec<u8>> {
    (sym(b'<') * none_of(b">").repeat(0..) - sym(b'>')) >> |tag| {
        (call(example) | none_of(b"<>").repeat(0..)) - seq(b"</") - seq(&tag) - sym(b'>')
    }
}

One solution is to modify seq so that it makes an internal copy of tag and moves that copy into the closure it generates. I tried this, but it wasn't fully satisfactory: I also changed the return type to Parser<'a, I, Vec<I>>, which introduces a copy every time the sequence matches (only for the result to be discarded immediately).

Perhaps there is a way for seq to support both borrowing and owning its tag, or perhaps there is a good case for a new parser factory that matches against an owned tag?

Suggestions for alternatives are welcome.
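
One possible way to own the tag without the per-match copy mentioned above is to return the matched slice of the input instead of a clone of the tag. The following is only a std-only sketch under that assumption: match_owned is a hypothetical helper, and a plain closure stands in for pom's Parser, so none of this is pom's actual API.

```rust
/// Hypothetical stand-in for an owned `seq` (not part of pom).
/// The returned closure owns `tag`, but a successful match yields a slice
/// borrowed from the input, so nothing is cloned per match.
fn match_owned<'a>(tag: Vec<u8>) -> impl Fn(&'a [u8], usize) -> Option<(&'a [u8], usize)> {
    move |input: &'a [u8], start: usize| -> Option<(&'a [u8], usize)> {
        let end = start.checked_add(tag.len())?;
        // `get` returns None when the input is too short for the tag.
        let candidate = input.get(start..end)?;
        if candidate == tag.as_slice() {
            // The output borrows from `input`, not from `tag`.
            Some((candidate, end))
        } else {
            None
        }
    }
}
```

For instance, match_owned(b"div".to_vec()) applied to the input b"</div>" at position 2 matches the three bytes div and reports 5 as the next position. Because the output has the same shape as what seq yields (a borrowed slice), a variant along these lines might keep the existing return type.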

J-F-Liu commented 2 years ago

A workaround is:

fn example<'a>() -> Parser<'a, u8, Vec<u8>> {
    (sym(b'<') * none_of(b">").repeat(0..) - sym(b'>'))
        >> |tag| {
            (call(example) | none_of(b"<>").repeat(0..))
                - seq(b"</") - take(tag.len()).convert(move |t| if t == tag { Ok(()) } else { Err(()) })
                - sym(b'>')
        }
}

Alternatively, you could define a new owned version of seq.

jameskirkwood commented 2 years ago

I prefer your workaround as I don't need to use Parser::new, but for the record here is an owned version of seq:

fn seq_owned<'a, I>(tag: Vec<I>) -> Parser<'a, I, Vec<I>>
where
    I: PartialEq + Debug + Clone,
{
    Parser::new(move |input: &[I], start: usize| {
        let mut index = 0;
        loop {
            let pos = start + index;
            if index == tag.len() {
                return Ok((tag.to_owned(), pos));
            }
            if let Some(s) = input.get(pos) {
                if tag[index] != *s {
                    return Err(Error::Mismatch {
                        message: format!("seq {:?} expect: {:?}, found: {:?}", tag, tag[index], s),
                        position: pos,
                    });
                }
            } else {
                return Err(Error::Incomplete);
            }
            index += 1;
        }
    })
}

jameskirkwood commented 2 years ago

...And here is a much shorter owned version of seq that encapsulates your workaround, which could be a useful recipe:

// 'a is declared explicitly so that the returned parser is not tied to the borrow of tag.
fn seq_owned<'a>(tag: &[u8]) -> Parser<'a, u8, ()> {
    let tag = tag.to_owned();
    take(tag.len()).convert(move |t| if t == tag { Ok(()) } else { Err(()) })
}
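
To illustrate why an owned tag makes the back-reference work, here is a small std-only sketch that does not depend on pom. Everything in it is a hypothetical stand-in: P models a parser as a boxed closure, and_then plays the role of the >> operator, and seq_owned mirrors the recipe above. The point is that seq_owned copies its tag, so the parser it returns can outlive the local value it was built from.

```rust
// A parser is modelled as a boxed closure from (input, position) to an
// optional (output, next position). This is a stand-in, not pom's `Parser`.
type P<'a, O> = Box<dyn Fn(&'a [u8], usize) -> Option<(O, usize)> + 'a>;

/// Stand-in for `>>`: run `p`, then build the next parser from its output.
fn and_then<'a, A: 'a, B: 'a>(p: P<'a, A>, f: impl Fn(A) -> P<'a, B> + 'a) -> P<'a, B> {
    Box::new(move |input: &'a [u8], start: usize| -> Option<(B, usize)> {
        let (out, pos) = p(input, start)?;
        f(out)(input, pos)
    })
}

/// Owned matcher: `tag` is copied once and moved into the closure, so the
/// result's lifetime `'a` is independent of the borrow of `tag`.
fn seq_owned<'a>(tag: &[u8]) -> P<'a, ()> {
    let tag = tag.to_owned();
    Box::new(move |input: &'a [u8], start: usize| -> Option<((), usize)> {
        let end = start.checked_add(tag.len())?;
        if input.get(start..end)? == tag.as_slice() {
            Some(((), end))
        } else {
            None
        }
    })
}

/// Parses `<name>` and returns the name as an owned `Vec<u8>`.
fn open_tag<'a>() -> P<'a, Vec<u8>> {
    Box::new(|input: &'a [u8], start: usize| -> Option<(Vec<u8>, usize)> {
        if input.get(start) != Some(&b'<') {
            return None;
        }
        // Offset of '>' relative to the first byte of the name.
        let len = input[start + 1..].iter().position(|&b| b == b'>')?;
        let name = input[start + 1..start + 1 + len].to_vec();
        Some((name, start + len + 2))
    })
}

/// Matches `<name></name>`; the closing name must repeat the opening one.
fn element<'a>() -> P<'a, ()> {
    and_then(open_tag(), |name: Vec<u8>| {
        // `closing` is local to this closure. Borrowing it in the returned
        // parser would not compile; `seq_owned` copies it instead.
        let mut closing = b"</".to_vec();
        closing.extend_from_slice(&name);
        closing.push(b'>');
        seq_owned(&closing)
    })
}
```

With these definitions, element() accepts b"<a></a>" and rejects b"<a></b>". If seq_owned instead returned a parser that borrowed closing, the closure passed to and_then would be returning a reference to its own local, which is the same lifetime error described at the top of this thread.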