UpstandingHackers / hammer

Parser combinators for binary formats, in C. Yes, in C. What? Don't look at me like that.
GNU General Public License v2.0
430 stars 40 forks source link

Stack blown up using `h_indirect` and recursive pattern #169

Closed uucidl closed 7 years ago

uucidl commented 7 years ago

As I was exploring how to search for matches using a hammer parser I came up with the following:

HParser *search_parser(HParser *parser) {
  // NOTE(uucidl): this searcher pattern takes a parser and allows us to search for
  // it within the input stream
  auto searcher = h_indirect();
  h_bind_indirect(
      searcher,
      h_choice(parser, h_right(h_ch_range(0x01, 0xff), searcher), nullptr));
  return searcher;
}

It works for small inputs, however as soon as I threw a real world file at it, I blew up my stack in a series of: (tested on windows)

>   linter.exe!h_read_bits(HInputStream_ * state, int count, char signed_p) Line 29 C
    linter.exe!parse_token(void * env, HParseState_ * state) Line 12    C
    linter.exe!perform_lowlevel_parse(HParseState_ * state, const HParser_ * parser) Line 32    C
    linter.exe!h_do_parse(const HParser_ * parser, HParseState_ * state) Line 200   C
    linter.exe!parse_choice(void * env, HParseState_ * state) Line 30   C
    linter.exe!perform_lowlevel_parse(HParseState_ * state, const HParser_ * parser) Line 32    C
    linter.exe!h_do_parse(const HParser_ * parser, HParseState_ * state) Line 200   C
uucidl commented 7 years ago

It blew on a file of relative small size (13K)

pesco commented 7 years ago

Nicolas on Tue, Jul 26 2016:

  auto searcher = h_indirect();
  h_bind_indirect(
      searcher,
      h_choice(parser, h_right(h_ch_range(0x01, 0xff), searcher), nullptr));
  return searcher

This should work well for a context-free backend.

With packrat, this blows up because that backend is essentially recursive descent and cannot easily "forget about" a combinator it has entered, even if it does nothing more with the result. You would be asking for tail-recursion, basically.

What you can do instead:

    auto pre = h_right(h_and(h_not(parser)), h_ch_range(1,255));
    return h_sequence(h_many(pre), parser, 0);

This should run with constant stack overhead wrt. parser.

uucidl commented 7 years ago

Indeed it works! thanks

uucidl commented 7 years ago

On my dataset, where the match is rare, the parser version works measurably slower than calling h_parse repeatedly manually on every byte.