Open schoettl opened 3 years ago
Cool that you're doing benchmarks! And sorry to hear that org-parser
is that slow.
The parser that @branch14 was referring to is Ragel: https://en.wikipedia.org/wiki/Ragel
As discussed, Ragel is pretty nice, so the following is nothing against Ragel.
Having said that, the bottleneck for the long running test very likely isn't Java. Depending on the program, Java is within an order of magnitude of C or even faster, because it can optimize longer running processes on the spot. The TL;DR for speed is: Java programs are often slower to start (not so good for quick commandline tools, for example), but fast or faster for longer running apps. Therefore, there's likely an algorithmic issue going on (like we recently had).
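As a side note, the JIT effect is easy to see in the REPL. A minimal sketch, assuming `org-parser.parser/parse` is on the classpath and a local `sample.org` exists (the file name is an assumption):

```clojure
;; Crude warm-up benchmark: the first iterations include JIT warm-up,
;; the later ones approximate steady-state speed.
(require '[org-parser.parser :as parser])

(let [sample (slurp "sample.org")]  ; file name is an assumption
  (dotimes [_ 5]
    (time (parser/parse sample))))
```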
Ah, you've updated and are saying there's an `OutOfMemoryError`. That further confirms the suspicion of an algorithmic error (;
If it's not obvious what the problem is (I wouldn't know atm, for example), my next step would be to debug this with https://visualvm.github.io/ to see what's happening while it's running. It'll show what functions are incurring time and memory.
Another benchmark:
@alexpdp7 wrote instaparse-cli which compiles to a binary. I just did a benchmark:
```
$ wc -l ~/org/notes.org
1883
$ time ./instaparse-cli ~/projects/org-parser/resources/org.ebnf ~/org/notes.org > /dev/null

real    0m4.173s
user    0m3.838s
sys     0m0.312s

$ time ./target/org-parser.js ~/org/notes.org >/dev/null

real    0m11.243s
user    0m17.132s
sys     0m0.670s
```
I wanted to see if a GraalVM binary makes the difference. But it's not a fair comparison so far because a) the JS runtime's startup time is unknown and b) our parser also performs the transformation. I haven't tried a Clojure script that parses without transformation. It would be something like:
```clojure
(ns hello-world.core
  (:require [org-parser.parser :refer [org]]
            [org-parser.core :refer [read-str write-str]]))

;; Note: slurp does not expand `~`, so the path may need to be absolute.
(org (slurp "~/org/notes.org"))
```
(I don't know how to run that.)
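For the record, one way to run such a one-off form without writing a whole namespace is from a project REPL. A sketch, assuming a checkout of org-parser and a resolvable file path:

```clojure
;; In a REPL started with `lein repl` inside the org-parser checkout:
(require '[org-parser.parser :refer [org]])

;; slurp does not expand `~`, so use an absolute or relative path.
(org (slurp "notes.org"))
```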
I wanted to try and reproduce the issue described by taking a real world 15k Org file. I took my personal gtd and reference files and the 200ok gtd file - together they are 16k LOC:
```
munen@lambda:~/src/200ok/org-parser% cat ~/Dropbox/org/things.org ~/Dropbox/org/reference.org ~/src/200ok/200ok-admin/THINGS.org > real_world.org
munen@lambda:~/src/200ok/org-parser% wc -l real_world.org
16802 real_world.org
```
Running this with Java:
```
munen@lambda:~/src/200ok/org-parser% time java -jar target/uberjar/org-parser-0.1.24-SNAPSHOT-standalone.jar real_world.org >& /dev/null
java -jar target/uberjar/org-parser-0.1.24-SNAPSHOT-standalone.jar >&  85.84s user 10.65s system 533% cpu 18.086 total
```
Granted, 18s is not fast by any stretch of the imagination - and it used more than 4 cores in parallel, so on a single core it would have taken 85s. 85s is super slow and unacceptable for something like organice, of course. However, it doesn't crash and burn.
Running it with JS shows a similar result:
```
munen@lambda:~/src/200ok/org-parser% time ./target/org-parser.js real_world.org >& /dev/null
./target/org-parser.js real_world.org >& /dev/null  129.84s user 1.67s system 167% cpu 1:18.64 total
```
JS does not run as much code in parallel, hence the longer real-world compute time. Also unacceptable for organice, but it doesn't crash and burn either.
So, my first take-away is: can you produce an anonymized version of your Org file so that I can try to reproduce the issue with that?
Some more benchmarks in the REPL:
```clojure
(time
 (->
  (slurp "test/org_parser/fixtures/minimal.org")
  parser/parse))
```
Running this multiple times yields:
Or in other words, the actual time required to parse the minimal example is just fine.
Taking my 16k LOC real world file:
```clojure
(time
 (def a
   (-> (slurp "real_world.org")
       (parser/parse))))
```
NB: Using a tmp `def` statement so that Emacs doesn't try to print the result, because it's huge(;
In multiple runs, this yields an elapsed time of about 20s. That is certainly too long just for parsing.
Now, compare that with just the transformation step for the 16k LOC file:
```clojure
(time (def b (transform/transform a)))
```
This yields a much shorter elapsed time, making it clear where the bottleneck is for my example: the parsing step.
@schoettl The instaparse-cli benchmark you did in https://github.com/200ok-ch/org-parser/issues/56#issuecomment-869036553, does this use the same Org file as the benchmark that does OOM with org-parser? If I understand it right, instaparse-cli requires 'just' 4s and org-parser runs into OOM?
If so, then this behaviour is again different from my test, because it would indicate that just parsing is relatively cheap on your file, but transforming it is too costly.
To nail this down further, it would be good to have a common test file. And maybe you could run the parser and transformer separately in the REPL like I did in https://github.com/200ok-ch/org-parser/issues/56#issuecomment-880551895. I'm happy to help with this step if it's unclear how to do it, of course!
How about this sample org file:
```
sample.org: README.org ~/projects/organice/README.org ~/projects/organice/sample.org
	cat $^ > $@
```
It's only

```
$ wc -l sample.org
1638 sample.org
```

lines, but it uses a lot of Org features we want to test.
I tried to anonymize an org file with `sed -r -e 's/[a-zA-Z]/x/g' -e 's/[0-9]/9/g'`, but this doesn't work with drawers, begin/end blocks, and timestamps...
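A line-based variant of that idea could whitelist structural lines. Here is a sketch in Clojure; the whitelist regex is a rough assumption, and day names inside timestamps would still need special handling:

```clojure
(require '[clojure.string :as str])

;; Leave syntax-bearing lines (#+BEGIN_SRC, :PROPERTIES:, :END:, ...) alone,
;; and x/9 out everything else, as in the sed one-liner above.
(defn anonymize-line [line]
  (if (re-find #"^\s*(#\+|:[A-Za-z])" line)
    line
    (-> line
        (str/replace #"[a-zA-Z]" "x")
        (str/replace #"[0-9]" "9"))))

(defn anonymize [text]
  (->> (str/split-lines text)
       (map anonymize-line)
       (str/join "\n")))
```

Since digits map to digits, timestamps at least stay timestamp-shaped, even if the values change.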
@munen Would you compare these results to yours:
```
org-parser.core> (time (def a (-> (slurp "sample.org") (parser/parse))))
"Elapsed time: 3034.876874 msecs"
#'org-parser.core/a
org-parser.core> (time (def b (transform/transform a)))
"Elapsed time: 26.040887 msecs"
#'org-parser.core/b
```
EDIT: using latest master of both organice and org-parser.
```clojure
(time (def a (-> (slurp "../organice/sample.org") (parser/parse))))
```

yields a result one order of magnitude faster for me:
```
"Elapsed time: 342.571364 msecs"
"Elapsed time: 351.378139 msecs"
"Elapsed time: 373.980784 msecs"
"Elapsed time: 341.999213 msecs"
"Elapsed time: 348.235206 msecs"
"Elapsed time: 341.401392 msecs"
"Elapsed time: 330.863657 msecs"
"Elapsed time: 346.982437 msecs"
"Elapsed time: 349.778513 msecs"
```
```clojure
(time (def b (transform/transform a)))
```

yields results very similar to yours. Here are multiple consecutive runs:
```
"Elapsed time: 10.988978 msecs"
"Elapsed time: 20.594534 msecs"
"Elapsed time: 22.839173 msecs"
"Elapsed time: 7.661142 msecs"
"Elapsed time: 21.075701 msecs"
"Elapsed time: 14.261132 msecs"
"Elapsed time: 11.514961 msecs"
```
340ms for parsing isn't spectacular, but it wouldn't be the end of the world. However, for you it's 10x that amount of time, which is already a dealbreaker. I can't imagine my machine being 10x faster than yours, but just FYI, I ran this on an Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz.
If I `cat` sample.org 10 times over into the same file, I also get a run of about 4s for the parsing step:

```
munen@lambda:~/src/200ok/organice% for i in `seq 10`; do cat sample.org >> sample2.org; done
```
Now we have a sample file on which we can do performance analysis.
A first rough sampling on what's going on yields this:
I'm afraid that some more digging didn't yield anything nicely actionable yet. It looks like finding the bottleneck requires digging deeper into instaparse.
@branch14 IMO it would be best to continue in a pairing session here.
One option to tackle this is to do binary search in the EBNF grammar.
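The idea could look roughly like this in the REPL; the paths and names are assumptions, and instaparse accepts a grammar string directly:

```clojure
(require '[instaparse.core :as insta])

(def grammar (slurp "resources/org.ebnf"))  ; path is an assumption

;; Build a parser from a (possibly reduced) grammar and time one parse.
(defn timed-parse-ms [g text]
  (let [p     (insta/parser g)
        start (System/nanoTime)]
    (p text)
    (/ (- (System/nanoTime) start) 1e6)))

;; Compare the full grammar against one with half the alternatives of a
;; suspect rule removed, then recurse into whichever half is slow, e.g.:
;; (timed-parse-ms grammar (slurp "sample.org"))
;; (timed-parse-ms reduced-grammar (slurp "sample.org"))
```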
Preliminary finding: parsing of the `text` rule is costly. Reducing it from all options down to just `text-normal` speeds up parsing of sample.org by a factor of 6.
Some notes on that: the `/` syntax in instaparse means that all options will be tried serially. Hence, the fastest options should be tested first.

Hm, almost anything is `text`. Is that rule really costly, or does it just take most of the processing time?
```
text = ( timestamp / link-format / footnote-link / text-link / text-target / text-radio-target / text-entity / text-macro / text-styled / text-sub / text-sup / text-linebreak / text-normal )*
```
The order doesn't seem bad. E.g. `timestamp` should return very quickly if the first character doesn't match the regex `[[<]`.
If `text-styled` hurts, it can maybe be removed, because its functionality is incomplete anyway (#12).
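To measure that in isolation, one could strip the alternative from the grammar string before building the parser. A sketch, assuming the rule text matches the `text` rule quoted above and that the grammar path is correct:

```clojure
(require '[clojure.string :as str]
         '[instaparse.core :as insta])

(def grammar (slurp "resources/org.ebnf"))  ; path is an assumption

;; Remove the `text-styled` alternative from the ordered choice.
(def without-styled
  (str/replace grammar "text-styled / " ""))

;; Then compare:
;; (time ((insta/parser grammar) (slurp "sample.org")))
;; (time ((insta/parser without-styled) (slurp "sample.org")))
```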
On the two-line sample file, the `lein run` time is probably mainly compilation and runtime initialization. On a 31,000-line real-world org file, it maxes out the CPU, takes 2GB of memory, and runs for at least 15 minutes (not yet finished).
(`lein run` crashed after 15 minutes with an `OutOfMemoryError`.) I haven't tried the JS version yet, but I guess on a smartphone it will only work well for files with a few hundred lines.
I'm a bit surprised because our parse trees should all be pretty flat.
@branch14 what was that language-independent state-machine parser generator called again? I guess a compiled parser binary would be significantly faster.