charlesvdv / nom-bibtex

A feature complete bibtex parser using nom
https://docs.rs/nom-bibtex
MIT License

Replace recursion by loop to parse huge files #27

Closed by bbencina 2 weeks ago

bbencina commented 3 months ago

When parsing huge BibTeX files (e.g. one of the cryptobib files), the recursion in the entries parser at src/parser.rs:368:51 exhausts the stack and causes a stack overflow. You can verify this with e.g. backtrace_on_stack_overflow.

Example code:

extern crate nom_bibtex;

use std::fs;
use nom_bibtex::Bibtex;

fn main() {
    // Print a backtrace if the stack overflows (enable() is an unsafe fn).
    unsafe {
        backtrace_on_stack_overflow::enable()
    };
    let filename = "crypto.bib";
    let buf = fs::read_to_string(filename).expect("Couldn't open file");
    let bib = Bibtex::parse(&buf).unwrap(); // <-- the stack overflows here
    println!("{:?}", bib); // or something
}

Using a loop instead resolves this issue (see diff). This also removes the need for the O(n) insert operation because the entries are parsed in the original order instead of reversed, so push can be used.
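To make the difference concrete, here is a minimal sketch of the two shapes. It is not the crate's actual code: `Entry`, `parse_entry`, `parse_entries_recursive` and `parse_entries_loop` are hypothetical names, and the "entry parser" is a toy that takes one line per entry instead of using nom.

```rust
/// Hypothetical stand-in for a parsed entry; the real crate has richer types.
#[derive(Debug, PartialEq, Clone)]
struct Entry(String);

/// Toy entry parser (assumption: one entry per non-empty line).
/// Returns the remaining input and the entry, or None when nothing is left.
fn parse_entry(input: &str) -> Option<(&str, Entry)> {
    let input = input.trim_start();
    if input.is_empty() {
        return None;
    }
    let (line, rest) = match input.find('\n') {
        Some(i) => (&input[..i], &input[i + 1..]),
        None => (input, ""),
    };
    Some((rest, Entry(line.trim_end().to_string())))
}

/// Recursive shape: one stack frame per entry, and each entry is inserted
/// at the front on the way back up, which costs O(n) per insert.
fn parse_entries_recursive(input: &str) -> Vec<Entry> {
    match parse_entry(input) {
        None => Vec::new(),
        Some((rest, entry)) => {
            let mut entries = parse_entries_recursive(rest);
            entries.insert(0, entry);
            entries
        }
    }
}

/// Iterative shape: constant stack depth, and entries arrive in source
/// order, so a plain push is enough.
fn parse_entries_loop(mut input: &str) -> Vec<Entry> {
    let mut entries = Vec::new();
    while let Some((rest, entry)) = parse_entry(input) {
        entries.push(entry);
        input = rest;
    }
    entries
}
```

Both functions return the same result on small inputs. On inputs with very many entries the recursive version's stack depth grows with the entry count, while the loop version's does not.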

A6GibKm commented 3 months ago

Hello,

Would it be possible to add a test case here?

bbencina commented 3 months ago

Absolutely, I can add a fixed copy of crypto.bib to samples/. Should I just test that the parsing goes through?

bbencina commented 3 months ago

I also added the option to have bracketed strings in concatenation, so the crypto.bib file can be parsed without any fixes to it. The bibtex binary also accepts this syntax as valid.
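For illustration, a field value that mixes braced and quoted strings in a concatenation could look like `{Advances in } # "Cryptology" # crypto`. The sketch below splits such a value on top-level `#`. It is a toy under stated assumptions, not the parser added in this PR: `Part`, `split_concat` and `classify` are hypothetical names, and the example value is made up rather than taken from crypto.bib.

```rust
/// One piece of a concatenated field value (hypothetical type).
#[derive(Debug, PartialEq)]
enum Part {
    Quoted(String),   // "text"
    Braced(String),   // {text}
    Variable(String), // anything else, e.g. an @string name
}

/// Splits a field value on top-level `#` and classifies each piece.
/// A `#` inside braces or inside a quoted string is not a separator.
fn split_concat(value: &str) -> Vec<Part> {
    let mut pieces = Vec::new();
    let mut depth = 0usize;
    let mut in_quotes = false;
    let mut start = 0;
    for (i, c) in value.char_indices() {
        match c {
            '{' => depth += 1,
            '}' => depth = depth.saturating_sub(1),
            // A quote only opens or closes a string outside of braces.
            '"' if depth == 0 => in_quotes = !in_quotes,
            '#' if depth == 0 && !in_quotes => {
                pieces.push(&value[start..i]);
                start = i + 1; // '#' is one byte wide
            }
            _ => {}
        }
    }
    pieces.push(&value[start..]);
    pieces.into_iter().map(|p| classify(p.trim())).collect()
}

/// Strips the outer delimiters and tags the piece accordingly.
fn classify(piece: &str) -> Part {
    if piece.len() >= 2 && piece.starts_with('{') && piece.ends_with('}') {
        Part::Braced(piece[1..piece.len() - 1].to_string())
    } else if piece.len() >= 2 && piece.starts_with('"') && piece.ends_with('"') {
        Part::Quoted(piece[1..piece.len() - 1].to_string())
    } else {
        Part::Variable(piece.to_string())
    }
}
```

Calling `split_concat` on the example value above yields a braced part, a quoted part and a variable, in that order.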

Switched test file to original crypto.bib that can be found here.