J-F-Liu / pom

PEG parser combinators using operator overloading without macros.
MIT License

Consider returning a slice from `Parser::collect` #2

Closed zummenix closed 7 years ago

zummenix commented 7 years ago

It would reduce memory allocations in places like:

number.collect().map(|v| String::from_utf8(v).unwrap()).map(|s| f64::from_str(&s).unwrap())
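
To illustrate the saving being asked for: if `collect` handed back a borrowed slice instead of an owned Vec, the number could be parsed without building a String at all. The following is only a sketch of that idea using the standard library; `parse_number` is a hypothetical helper, not part of pom, and the slice is taken by hand rather than from a real parser.

```rust
use std::str::{self, FromStr};

/// Hypothetical helper (not a pom API): parse a float from borrowed bytes.
fn parse_number(bytes: &[u8]) -> Option<f64> {
    // `str::from_utf8` validates the bytes in place and returns a borrowed
    // &str, so no String is allocated here.
    let text = str::from_utf8(bytes).ok()?;
    f64::from_str(text).ok()
}

fn main() {
    let input = b"x = 3.25;";
    // Assume the parser matched bytes 4..8 as the number.
    let matched = &input[4..8];
    println!("{:?}", parse_number(matched));
}
```

Returning `Option` instead of calling `unwrap` is a choice made for the sketch, not something proposed in the thread.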

J-F-Liu commented 7 years ago

I tried this, but when writing impl<'a> Input<char> for TextInput<'a>:

fn segment(&self, start: usize, end: usize) -> &'a [char] {
    let chars = self.text[start..end].chars().collect::<Vec<char>>();
    &chars
}

This gives a compile error: `chars` does not live long enough.
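
For readers following along: the error comes from `chars` being a local Vec that is dropped when `segment` returns, so a reference to it cannot carry the lifetime `'a`. Two ways around that are sketched below, assuming a cut-down `TextInput` with only a `text` field; the method names `segment_owned` and `segment_str` are made up for the illustration and are not pom APIs.

```rust
/// Simplified stand-in for the TextInput discussed in this thread.
struct TextInput<'a> {
    text: &'a str,
}

impl<'a> TextInput<'a> {
    /// Option 1: give the freshly built Vec to the caller (still allocates).
    fn segment_owned(&self, start: usize, end: usize) -> Vec<char> {
        self.text[start..end].chars().collect()
    }

    /// Option 2: borrow from the original text, which does live for 'a
    /// (no allocation). `start` and `end` are byte offsets here.
    fn segment_str(&self, start: usize, end: usize) -> &'a str {
        &self.text[start..end]
    }
}

fn main() {
    let input = TextInput { text: "abc 123" };
    let owned = input.segment_owned(4, 7);
    let borrowed = input.segment_str(4, 7);
    println!("{:?} {:?}", owned, borrowed);
}
```

Option 2 changes the return type from `&[char]` to `&str`, which is why the trait itself would need to let each input type choose its own segment type.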

zummenix commented 7 years ago

Is TextInput that important? As far as I can see, internally you're still using chars.

J-F-Liu commented 7 years ago

With TextInput, chars are read on the fly. Otherwise one would use DataInput::new(char_vec.as_slice()). This can be solved by adding an associated type to the trait.

/// Parser input is generic over terminal type, which is usually u8 or char.
pub trait Input<'a, T> where T: Copy {
    type Segment;
    /// Get current position.
    fn position(&self) -> usize;

    /// Peek current symbol.
    fn current(&self) -> Option<T>;

    /// Advance to next symbol.
    fn advance(&mut self);

    /// Jump to specified position.
    fn jump_to(&mut self, position: usize);

    /// Get a segment from the input.
    fn segment(&self, start: usize, end: usize) -> Self::Segment;
}

/// Wrap &[u8] or &[char] as input to parser.
pub struct DataInput<'a, T: 'a> {
    pub data: &'a [T],
    pub position: usize,
}

impl<'a, T: Copy> DataInput<'a, T> {
    pub fn new(input: &'a [T]) -> DataInput<'a, T> {
        DataInput {
            data: input,
            position: 0,
        }
    }
}

impl<'a, T: Copy> Input<'a, T> for DataInput<'a, T> {
    type Segment = &'a [T];

    fn position(&self) -> usize {
        self.position
    }

    fn current(&self) -> Option<T> {
        if self.position < self.data.len() {
            Some(self.data[self.position])
        } else {
            None
        }
    }

    fn advance(&mut self) {
        self.position += 1;
    }

    fn jump_to(&mut self, position: usize) {
        self.position = position;
    }

    fn segment(&self, start: usize, end: usize) -> &'a [T] {
        &self.data[start..end]
    }
}

/// Wrap &str as input to parser.
pub struct TextInput<'a> {
    pub text: &'a str,
    pub position: usize,
}

impl<'a> TextInput<'a> {
    pub fn new(input: &'a str) -> TextInput<'a> {
        TextInput {
            text: input,
            position: 0,
        }
    }
}

impl<'a> Input<'a, char> for TextInput<'a> {
    type Segment = &'a str;

    fn position(&self) -> usize {
        self.position
    }

    fn current(&self) -> Option<char> {
        self.text[self.position..].chars().next()
    }

    fn advance(&mut self) {
        if let Some(c) = self.text[self.position..].chars().next() {
            self.position += c.len_utf8();
        }
    }

    fn jump_to(&mut self, position: usize) {
        self.position = position;
    }

    fn segment(&self, start: usize, end: usize) -> &'a str {
        &self.text[start..end]
    }
}

But the parser code does not compile:

pub struct Parser<'a, I, O> {
    method: Box<Fn(&mut Input<'a, I>) -> Result<O> + 'a>,
}

method: Box<Fn(&mut Input<'a, I>) -> Result<O> + 'a>,
                    ^^^^^^^^^^^^ missing associated type `Segment` value

Strange error.
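
A note on the error: a trait object type has to name every associated type of its trait, so `Input<'a, I>` on its own is not a complete type once `Segment` exists. One possible way forward is to add a type parameter to the parser and pin `Segment` to it. The sketch below assumes a cut-down `Input` trait, modern `dyn` syntax, `Option` in place of the crate's `Result`, and an extra parameter `S` that is not in the original code; how the 2.x branch actually resolved this is not shown in the thread.

```rust
/// Cut-down version of the trait above, keeping only what the sketch needs.
trait Input<'a, T: Copy> {
    type Segment;
    fn current(&self) -> Option<T>;
    fn segment(&self, start: usize, end: usize) -> Self::Segment;
}

struct DataInput<'a, T> {
    data: &'a [T],
    position: usize,
}

impl<'a, T: Copy> Input<'a, T> for DataInput<'a, T> {
    type Segment = &'a [T];

    fn current(&self) -> Option<T> {
        self.data.get(self.position).copied()
    }

    fn segment(&self, start: usize, end: usize) -> &'a [T] {
        &self.data[start..end]
    }
}

/// Hypothetical parser type: `S` names the `Segment` of the trait object.
struct Parser<'a, I: Copy, S, O> {
    method: Box<dyn Fn(&mut dyn Input<'a, I, Segment = S>) -> Option<O> + 'a>,
}

impl<'a, I: Copy, S, O> Parser<'a, I, S, O> {
    fn new<F>(f: F) -> Self
    where
        F: Fn(&mut dyn Input<'a, I, Segment = S>) -> Option<O> + 'a,
    {
        Parser { method: Box::new(f) }
    }

    fn parse(&self, input: &mut dyn Input<'a, I, Segment = S>) -> Option<O> {
        (self.method)(input)
    }
}

fn main() {
    let bytes = b"12ab";
    // Toy parser: if any input remains, return the first two symbols
    // as a borrowed slice.
    let first_two: Parser<u8, &[u8], &[u8]> =
        Parser::new(|input| input.current().map(|_| input.segment(0, 2)));
    let mut input = DataInput { data: &bytes[..], position: 0 };
    println!("{:?}", first_two.parse(&mut input));
}
```

The cost of this approach is that the segment type becomes part of every parser's signature; making the parser generic over the concrete input type instead of using a trait object would be another option.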

zummenix commented 7 years ago

Yeah, I was moving in that direction at first. Then I decided to drop TextInput and benchmark DataInput with segment returning &'a [T], but lifetime errors stopped me.

J-F-Liu commented 7 years ago

This issue is solved in the 2.x branch.