J-F-Liu / pom

PEG parser combinators using operator overloading without macros.
MIT License

Allow chaining parsers similar to `p + q + r` but without getting nested tuple result #25

Open JoshMcguigan opened 5 years ago

JoshMcguigan commented 5 years ago

Following up on the discussion in #24, I created one possible implementation that allows chaining parsers without getting a nested tuple result. You can see the result at the link below. Note that in this example I could have removed the call to map entirely, but I left it in to demonstrate that the (hours, minutes, seconds) tuple is no longer nested.

https://github.com/J-F-Liu/pom/compare/master...JoshMcguigan:experimental-combinator

Unfortunately, at the moment I'm not sure how this could be extended to tuples of any size without creating all4, all5, ..., allN for some reasonable value of N. Another downside of this approach is that when users add a parser to the chain, they have to switch which version of the all function they are using, from allN to allN+1.

The upside is that the result is a flat (not nested) tuple of each parser's result. This means it could nicely replace the +, -, and * combinators and let users write these kinds of combinations in what I'd consider more idiomatic Rust.
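To make the nesting concrete, here is a minimal, self-contained sketch. It does not use pom: the `Parser` type, `sym`, and `all3` below are hypothetical stand-ins, assuming pom-like semantics where `p + q` runs both parsers and pairs their results. Chaining three parsers with `+` yields `((a, b), c)`, and an `all3`-style helper only has to run the same chain and flatten it once with `map`.

```rust
use std::ops::Add;

// Toy parser type for illustration only; not pom's real `Parser`.
// It maps (input, position) to Some((output, next position)) or None.
pub struct Parser<O> {
    run: Box<dyn Fn(&[u8], usize) -> Option<(O, usize)>>,
}

impl<O: 'static> Parser<O> {
    pub fn new(f: impl Fn(&[u8], usize) -> Option<(O, usize)> + 'static) -> Self {
        Parser { run: Box::new(f) }
    }

    // Run from the start of `input`, returning only the output.
    pub fn parse(&self, input: &[u8]) -> Option<O> {
        (self.run)(input, 0).map(|(out, _)| out)
    }

    // Transform the output of a successful parse.
    pub fn map<U: 'static>(self, f: impl Fn(O) -> U + 'static) -> Parser<U> {
        Parser::new(move |input, pos| (self.run)(input, pos).map(|(o, next)| (f(o), next)))
    }
}

// Match exactly one byte.
pub fn sym(b: u8) -> Parser<u8> {
    Parser::new(move |input, pos| match input.get(pos) {
        Some(&c) if c == b => Some((c, pos + 1)),
        _ => None,
    })
}

// `p + q` runs p, then q, and pairs the results, so `p + q + r`
// produces ((a, b), c).
impl<A: 'static, B: 'static> Add<Parser<B>> for Parser<A> {
    type Output = Parser<(A, B)>;
    fn add(self, other: Parser<B>) -> Parser<(A, B)> {
        Parser::new(move |input, pos| {
            let (a, pos) = (self.run)(input, pos)?;
            let (b, pos) = (other.run)(input, pos)?;
            Some(((a, b), pos))
        })
    }
}

// Hypothetical `all3`: the same chain, flattened once into (a, b, c).
pub fn all3<A: 'static, B: 'static, C: 'static>(
    p1: Parser<A>,
    p2: Parser<B>,
    p3: Parser<C>,
) -> Parser<(A, B, C)> {
    (p1 + p2 + p3).map(|((a, b), c)| (a, b, c))
}
```

The cost described above shows up directly: an `all4` would need its own signature and its own `|(((a, b), c), d)|` pattern.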

Thanks again for your work on this crate, and feel free to let me know if this isn't something you are interested in.

J-F-Liu commented 5 years ago

I think it's OK to define and use allN in user's code, but not good to include in pom, for the sake of consistent operator style. I also modified the duration example a bit.

J-F-Liu commented 5 years ago
    (
        two_digits(),
        char(':'),
        two_digits(),
        char(':'),
        two_digits(),
        time_zone(),
    )
        .map(|(hour, _, minute, _, second, time_zone)| {
            // It's ok to just unwrap since we only parsed digits
            Time {
                hour: hour,
                minute: minute,
                second: second,
                time_zone: time_zone,
            }
        })

This approach looks good, though.

JoshMcguigan commented 5 years ago

> I think it's OK to define and use allN in user's code, but not good to include in pom, for the sake of consistent operator style. I also modified the duration example a bit.

From my perspective, the reason I like using pom over the alternatives is its simplicity. The operator style for the combinators (p + q rather than something like all2(p, q) or p.and(q)) is the main reason I have to keep the pom documentation open while developing. I think it would be more idiomatic, and friendlier to newcomers to pom, to use methods/functions rather than operator overloading.
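Operators and methods do not have to be mutually exclusive. In the hypothetical sketch below (a toy `Parser`, not pom's actual API), `+` is a thin `impl Add` that delegates to an `and` method, so a crate could offer both spellings with identical behavior. The names `and` and `sym` here are assumptions for illustration.

```rust
use std::ops::Add;

// Toy parser type for illustration only; not pom's real `Parser`.
pub struct Parser<O> {
    run: Box<dyn Fn(&[u8], usize) -> Option<(O, usize)>>,
}

impl<O: 'static> Parser<O> {
    pub fn new(f: impl Fn(&[u8], usize) -> Option<(O, usize)> + 'static) -> Self {
        Parser { run: Box::new(f) }
    }

    // Run from the start of `input`, returning the output on success.
    pub fn parse(&self, input: &[u8]) -> Option<O> {
        (self.run)(input, 0).map(|(out, _)| out)
    }

    // Hypothetical method form of sequencing: run `self`, then `other`,
    // and pair the two results.
    pub fn and<P: 'static>(self, other: Parser<P>) -> Parser<(O, P)> {
        Parser::new(move |input, pos| {
            let (a, pos) = (self.run)(input, pos)?;
            let (b, pos) = (other.run)(input, pos)?;
            Some(((a, b), pos))
        })
    }
}

// The operator form simply delegates to the method.
impl<A: 'static, B: 'static> Add<Parser<B>> for Parser<A> {
    type Output = Parser<(A, B)>;
    fn add(self, other: Parser<B>) -> Parser<(A, B)> {
        self.and(other)
    }
}

// Match exactly one byte.
pub fn sym(b: u8) -> Parser<u8> {
    Parser::new(move |input, pos| match input.get(pos) {
        Some(&c) if c == b => Some((c, pos + 1)),
        _ => None,
    })
}
```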

That said, I do agree the code in your second post is nice. But that example is from combine, and it's not clear to me how an API like that could be developed within pom.
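One way a combine-style API might be built is to implement a trait for tuples of parsers, generating one impl per arity with a declarative macro. In the combine snippet above, `.map` is called directly on the tuple; the sketch below approximates that with an explicit `.seq()` step. It uses a toy `Parser`, not pom's, and the `Sequence` trait, `seq`, `digit`, and `clock` are all hypothetical names.

```rust
// Toy parser type for illustration only; not pom's real `Parser`.
pub struct Parser<O> {
    run: Box<dyn Fn(&[u8], usize) -> Option<(O, usize)>>,
}

impl<O: 'static> Parser<O> {
    pub fn new(f: impl Fn(&[u8], usize) -> Option<(O, usize)> + 'static) -> Self {
        Parser { run: Box::new(f) }
    }

    pub fn parse(&self, input: &[u8]) -> Option<O> {
        (self.run)(input, 0).map(|(out, _)| out)
    }

    pub fn map<U: 'static>(self, f: impl Fn(O) -> U + 'static) -> Parser<U> {
        Parser::new(move |input, pos| (self.run)(input, pos).map(|(o, next)| (f(o), next)))
    }
}

// Hypothetical trait: sequence a tuple of parsers into one parser whose
// output is a flat tuple of their results.
pub trait Sequence {
    type Output;
    fn seq(self) -> Parser<Self::Output>;
}

// Generate a `Sequence` impl for one tuple arity.
macro_rules! impl_sequence {
    ($($T:ident $p:ident),+) => {
        impl<$($T: 'static),+> Sequence for ($(Parser<$T>,)+) {
            type Output = ($($T,)+);
            fn seq(self) -> Parser<Self::Output> {
                let ($($p,)+) = self;
                Parser::new(move |input, pos| {
                    // Each parser starts where the previous one stopped;
                    // `?` aborts the whole sequence on the first failure.
                    $( let ($p, pos) = ($p.run)(input, pos)?; )+
                    Some((($($p,)+), pos))
                })
            }
        }
    };
}

impl_sequence!(A a, B b);
impl_sequence!(A a, B b, C c);
impl_sequence!(A a, B b, C c, D d);
impl_sequence!(A a, B b, C c, D d, E e);

// Match exactly one byte.
pub fn sym(b: u8) -> Parser<u8> {
    Parser::new(move |input, pos| match input.get(pos) {
        Some(&c) if c == b => Some((c, pos + 1)),
        _ => None,
    })
}

// A single ASCII digit, returned as its numeric value.
pub fn digit() -> Parser<u8> {
    Parser::new(|input, pos| match input.get(pos) {
        Some(&c) if c.is_ascii_digit() => Some((c - b'0', pos + 1)),
        _ => None,
    })
}

// `h:m:s` with single-digit fields, shaped like the combine example.
pub fn clock() -> Parser<(u8, u8, u8)> {
    (digit(), sym(b':'), digit(), sym(b':'), digit())
        .seq()
        .map(|(h, _, m, _, s)| (h, m, s))
}
```

The arity ceiling from `allN` does not disappear; it moves into the list of `impl_sequence!` invocations, which users never have to touch.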

J-F-Liu commented 5 years ago

Yes, it would be fine to rename all3 to terms and all4 to terms4. Alternatively, a terms! macro could handle any number of terms.
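A `terms!` macro along those lines could look roughly like the sketch below. It assumes a toy `Parser` (not pom's), and `terms!` itself is hypothetical. Because `macro_rules!` cannot invent a fresh variable name per repetition, the internal `@munch` rules pair each parser expression with a name from a fixed list, which caps this version at eight terms.

```rust
// Toy parser type for illustration only; not pom's real `Parser`.
pub struct Parser<O> {
    run: Box<dyn Fn(&[u8], usize) -> Option<(O, usize)>>,
}

impl<O: 'static> Parser<O> {
    pub fn new(f: impl Fn(&[u8], usize) -> Option<(O, usize)> + 'static) -> Self {
        Parser { run: Box::new(f) }
    }

    pub fn parse(&self, input: &[u8]) -> Option<O> {
        (self.run)(input, 0).map(|(out, _)| out)
    }
}

// Match exactly one byte.
pub fn sym(b: u8) -> Parser<u8> {
    Parser::new(move |input, pos| match input.get(pos) {
        Some(&c) if c == b => Some((c, pos + 1)),
        _ => None,
    })
}

// Hypothetical `terms!`: sequence up to 8 parsers into one parser
// whose output is a flat tuple.
macro_rules! terms {
    // Every parser has a name: bind them, then run them in order.
    (@munch [$(($name:ident $p:expr))*] [$($unused:ident)*]) => {{
        $( let $name = $p; )*
        Parser::new(move |input, pos| {
            $( let ($name, pos) = ($name.run)(input, pos)?; )*
            Some((($($name,)*), pos))
        })
    }};
    // Pair the next parser expression with the next unused name.
    (@munch [$($acc:tt)*] [$name:ident $($rest:ident)*] $p:expr $(, $tail:expr)*) => {
        terms!(@munch [$($acc)* ($name $p)] [$($rest)*] $($tail),*)
    };
    // Public entry point.
    ($($p:expr),+ $(,)?) => {
        terms!(@munch [] [t0 t1 t2 t3 t4 t5 t6 t7] $($p),+)
    };
}
```

Usage reads like `terms!(sym(b'a'), sym(b'b'), sym(b'c'))`, producing a `Parser<(u8, u8, u8)>` without any nesting.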

glasspangolin commented 4 years ago

Hi guys, not sure whether this is helpful, but I solved this problem in my code using a new 'vector' combinator and an enum. I copied your code for the 'list' combinator and adapted it to run an ordered Vec of parsers.

I have this in parsers.rs:

pub fn vector<'a, I, O>(parsers: Vec<Parser<'a, I, O>>) -> Parser<'a, I, Vec<O>>
where
    O: 'a,
{
    Parser::new(move |input: &'a [I], start: usize| {
        let mut items = Vec::with_capacity(parsers.len());
        let mut pos = start;
        // Run the parsers in order; each one starts where the previous one stopped.
        for parser in &parsers {
            // Propagate the first failure unchanged instead of replacing it
            // with `Error::Incomplete`.
            let (item, next_pos) = (parser.method)(input, pos)?;
            items.push(item);
            pos = next_pos;
        }
        Ok((items, pos))
    })
}

Then a minimal working example:

use pom::parser::*;

#[derive(Copy, Clone, PartialEq, Debug)]
pub struct Object {
    a: Field,
    b: Field,
    c: Field,
}

#[derive(Copy, Clone, PartialEq, Debug)]
enum Field {
    A,
    B,
    C,
}

fn take_space<'a>() -> Parser<'a, u8, u8> {
    one_of(b" \t")
}

// `sym(b'a')` only succeeds when the next byte is `a`, so the mapped value is
// always `Field::A`; any other byte makes the parser fail instead.
fn get_a<'a>() -> Parser<'a, u8, Field> {
    sym(b'a').map(|_| Field::A)
}

fn get_b<'a>() -> Parser<'a, u8, Field> {
    sym(b'b').map(|_| Field::B)
}

fn get_c<'a>() -> Parser<'a, u8, Field> {
    sym(b'c').map(|_| Field::C)
}

pub fn parse_line<'a>() -> Parser<'a, u8, Object> {
    // Each element parses one field and then discards any trailing whitespace.
    vector(vec![
        call(get_a) - call(take_space).repeat(0..),
        call(get_b) - call(take_space).repeat(0..),
        call(get_c) - call(take_space).repeat(0..),
    ])
    .map(|fields| Object {
        a: fields[0],
        b: fields[1],
        c: fields[2],
    })
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn it_works() {
        assert_eq!(
            parse_line().parse(b"a b c").expect("couldn't parse."),
            Object {
                a: Field::A,
                b: Field::B,
                c: Field::C,
            }
        );
    }
}

The limitation is that every parser in the vector has to return the same type, but I think it's quite neat when you combine the vector combinator with an enum :)
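The enum trick also covers parsers that naturally produce different types: wrap each result in a shared enum and unpack it afterwards. Below is a self-contained sketch using a toy `Parser` (not pom's) with a `vector` combinator written in the same spirit as the one above; `Value`, `Record`, `word`, `number`, and `record` are hypothetical. The trade-off is that the compiler no longer knows which variant sits at which index, so the final `map` still has to handle the other shapes, even though `vector` never produces them.

```rust
// Toy parser type for illustration only; not pom's real `Parser`.
pub struct Parser<O> {
    run: Box<dyn Fn(&[u8], usize) -> Option<(O, usize)>>,
}

impl<O: 'static> Parser<O> {
    pub fn new(f: impl Fn(&[u8], usize) -> Option<(O, usize)> + 'static) -> Self {
        Parser { run: Box::new(f) }
    }

    pub fn parse(&self, input: &[u8]) -> Option<O> {
        (self.run)(input, 0).map(|(out, _)| out)
    }

    pub fn map<U: 'static>(self, f: impl Fn(O) -> U + 'static) -> Parser<U> {
        Parser::new(move |input, pos| (self.run)(input, pos).map(|(o, next)| (f(o), next)))
    }
}

// Run same-typed parsers in order, collecting their outputs.
pub fn vector<O: 'static>(parsers: Vec<Parser<O>>) -> Parser<Vec<O>> {
    Parser::new(move |input, start| {
        let mut items = Vec::with_capacity(parsers.len());
        let mut pos = start;
        for parser in &parsers {
            let (item, next_pos) = (parser.run)(input, pos)?;
            items.push(item);
            pos = next_pos;
        }
        Some((items, pos))
    })
}

// One or more ASCII letters.
pub fn word() -> Parser<String> {
    Parser::new(|input, start| {
        let rest = input.get(start..)?;
        let len = rest.iter().take_while(|c| c.is_ascii_alphabetic()).count();
        if len == 0 {
            return None;
        }
        // ASCII letters are always valid UTF-8.
        let text = String::from_utf8(rest[..len].to_vec()).ok()?;
        Some((text, start + len))
    })
}

// One or more ASCII digits as a u32; fails on overflow.
pub fn number() -> Parser<u32> {
    Parser::new(|input, start| {
        let mut pos = start;
        let mut n: u32 = 0;
        while let Some(&c) = input.get(pos) {
            if !c.is_ascii_digit() {
                break;
            }
            n = n.checked_mul(10)?.checked_add(u32::from(c - b'0'))?;
            pos += 1;
        }
        if pos == start { None } else { Some((n, pos)) }
    })
}

// Shared output type so differently typed parsers fit in one Vec.
#[derive(Debug, PartialEq)]
pub enum Value {
    Word(String),
    Num(u32),
}

#[derive(Debug, PartialEq)]
pub struct Record {
    name: String,
    age: u32,
}

pub fn record() -> Parser<Record> {
    vector(vec![word().map(Value::Word), number().map(Value::Num)]).map(|values| {
        let mut it = values.into_iter();
        match (it.next(), it.next()) {
            (Some(Value::Word(name)), Some(Value::Num(age))) => Record { name, age },
            // `vector` yields one value per parser, in order, so this cannot happen.
            _ => unreachable!("unexpected shape"),
        }
    })
}
```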