DDlog parser needs a rewrite using a faster parser combinator library

netj commented 8 years ago

I first thought it was a JVM booting latency issue and optimized that a bit, but didn't help much (1-2 seconds saving after first launch). I just did a dumb profiling, and it shows the parser is unreasonably slow, taking more than 99% of the time (10.70s/10.75s and 11.37s/11.48s), and unfortunately done twice from deepdive's compiler.

2016-01-22 04:45:29.792846 Parsing DeepDive application (/afs/cs.stanford.edu/u/netj/lfs1/dd-genomics) to generate:
2016-01-22 04:45:29.792864  run/compiled/schema.json
2016-01-22 04:45:29.792882   from app.ddlog
2016-01-22 04:45:29.792895 parsing
2016-01-22 04:45:40.497900 parsed
2016-01-22 04:45:40.540621 handled
2016-01-22 04:45:40.541898  run/compiled/deepdive.conf
2016-01-22 04:45:40.542026   from app.ddlog
2016-01-22 04:45:40.560282 parsing
2016-01-22 04:45:51.934837 parsed
2016-01-22 04:45:52.046044 handled

The ddlog code is not ridiculously large, ~1411 lines including comments:

$ wc app.ddlog 
 1411  4710 55034 app.ddlog

According to FastParse, another parser combinator library, maybe that's an expected speed for Scala's parser combinator we're currently using.

Up to 1/5 the speed of a hand-written parser, 100x faster than scala-parser-combinators, comparable (though slightly slower than) Parboiled2

So, it's definitely the right time to rewrite/migrate the DDlog parser with a new library. With 100x speedup, most things should be done under a second, and my first JVM optimization will become worthwhile.

netj commented 8 years ago

@feiranwang Do you have free cycles to take a crack at this asap? @HazyResearch/genomics guys are heavily affected by this.

chrismre commented 8 years ago

yikes... this is insane. We need to fix this asap...

Colossus commented 8 years ago

Sry might this be because of a bloated run/ directory? One thing I noticed is that on "old" dd directories, the run can take ages, but if I clone the thing and compile, it's fast enough for normal operation by far

Colossus commented 8 years ago

Whoever is interesting in profiling should go on raiders7 and copy one of my dd-genomics directories /lfs/raiders7/0/jbirgmei/dd-genomics2 and try around there

feiranwang commented 8 years ago

Looking into this. Parsing this file takes ~8s locally. I'll check out FastParse.

Colossus commented 8 years ago

I'm almost sure this is not an issue with the parser itself. There is no way to write an ANTLR grammar that translates to something that slow

netj commented 8 years ago

The combinators don't produce the more traditional LR or LALR parser but just give a recursive descendent parser, hence the slowness is very plausible.

feiranwang commented 8 years ago

I rewrote the parser using parboiled2, and tried on the genomics. It takes about ~400ms to parse the app.ddlog, compared to ~8s using scala parser combinator.

netj commented 8 years ago

@feiranwang You can probably also plug a bit about what happened when the input was repeated 10x?

feiranwang commented 8 years ago

When the input was repeated for 10 times, the new parser finished in ~1s, but the scala parser combinator took more than 5min...

HazyResearch / ddlog

DDlog parser needs a rewrite using a faster parser combinator library #71