Open netj opened 8 years ago
@feiranwang Do you have free cycles to take a crack at this asap? @HazyResearch/genomics guys are heavily affected by this.
yikes... this is insane. We need to fix this asap...
Sorry, might this be because of a bloated run/ directory? One thing I noticed is that in "old" dd directories the run can take ages, but if I clone it and compile, it's easily fast enough for normal operation.
Whoever is interested in profiling should go on raiders7, copy one of my dd-genomics directories (/lfs/raiders7/0/jbirgmei/dd-genomics2), and experiment there.
Looking into this. Parsing this file takes ~8s locally. I'll check out FastParse.
I'm almost sure this is not an issue with the parser itself. There is no way to write an ANTLR grammar that translates to something that slow
The combinators don't produce a more traditional LR or LALR parser; they just give a recursive descent parser, hence the slowness is very plausible.
I rewrote the parser using parboiled2 and tried it on the genomics app. It takes ~400ms to parse the app.ddlog, compared to ~8s using the Scala parser combinators.
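To make the recursive descent point concrete, here is a minimal sketch of how backtracking on alternatives that share a prefix can blow up. It is written in Python for brevity and uses a made-up toy grammar (`expr := term '+' expr | term`, `term := '(' expr ')' | 'a'`), not the DDlog grammar, so it only illustrates the general mechanism and does not claim to be what happens inside the actual parser. The function name `count_term_calls` is hypothetical.

```python
from functools import lru_cache


def count_term_calls(text, memoize):
    """Parse `text` with a toy backtracking parser.

    Returns (end_position, number_of_times_term_actually_ran).
    """
    calls = {"term": 0}

    def expr(pos):
        # expr := term '+' expr | term
        end = term(pos)
        if end is not None and end < len(text) and text[end] == "+":
            rest = expr(end + 1)
            if rest is not None:
                return rest
        # First alternative failed: backtrack and parse `term` again from `pos`.
        return term(pos)

    def term(pos):
        # term := '(' expr ')' | 'a'
        calls["term"] += 1
        if pos < len(text) and text[pos] == "(":
            inner = expr(pos + 1)
            if inner is not None and inner < len(text) and text[inner] == ")":
                return inner + 1
            return None
        if pos < len(text) and text[pos] == "a":
            return pos + 1
        return None

    if memoize:
        # Packrat-style: remember the result of `term` at each position.
        # `expr` looks up `term` at call time, so it sees the cached version.
        term = lru_cache(maxsize=None)(term)

    return expr(0), calls["term"]


if __name__ == "__main__":
    for depth in (5, 10, 15):
        text = "(" * depth + "a" + ")" * depth
        _, naive = count_term_calls(text, memoize=False)
        _, cached = count_term_calls(text, memoize=True)
        print(f"depth={depth} naive={naive} memoized={cached}")
```

In this toy case every extra level of nesting roughly doubles the work of the naive version, while the memoized version does one `term` evaluation per input position. Whether the DDlog grammar hits a pattern like this would need to be confirmed by profiling.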
@feiranwang You can probably also plug a bit about what happened when the input was repeated 10x?
When the input was repeated 10 times, the new parser finished in ~1s, but the Scala parser combinators took more than 5min...
I first thought it was a JVM boot latency issue and optimized that a bit, but it didn't help much (1-2 seconds saved after the first launch). I just did some dumb profiling, and it shows the parser is unreasonably slow, taking more than 99% of the time (10.70s/10.75s and 11.37s/11.48s), and unfortunately parsing is done twice by deepdive's compiler.
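For anyone who wants to repeat the "input repeated 10x" experiment, a minimal timing harness could look like the sketch below. It is in Python and uses a stand-in `toy_parse` function, which is purely hypothetical; the idea is to swap in whatever call actually invokes the parser under test. If the time grows much faster than the input size, the parser is behaving super-linearly.

```python
import time


def time_parse(parse, source, repeats=(1, 2, 5, 10)):
    """Time `parse` on `source` repeated k times, for each k in `repeats`.

    Returns a list of (k, input_length, elapsed_seconds) tuples.
    """
    results = []
    for k in repeats:
        text = source * k
        start = time.perf_counter()
        parse(text)
        elapsed = time.perf_counter() - start
        results.append((k, len(text), elapsed))
    return results


def toy_parse(text):
    # Hypothetical stand-in for the real parser: split into non-empty lines.
    return [line for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = "rule_a(x) :- rule_b(x).\n" * 100  # made-up input, not real DDlog
    for k, size, seconds in time_parse(toy_parse, sample):
        print(f"x{k}: {size} chars, {seconds:.6f}s")
```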
The ddlog code is not ridiculously large, ~1411 lines including comments:
According to FastParse, another parser combinator library, this may be the expected speed for the Scala parser combinators we're currently using.
So, it's definitely the right time to rewrite the DDlog parser or migrate it to a new library. With a 100x speedup, most things should be done in under a second, and my first JVM optimization will become worthwhile.