Open schoettl opened 3 years ago
Cool that you're doing benchmarks! And sorry to hear that org-parser
is that slow.
The parser that @branch14 was referring to is Ragel: https://en.wikipedia.org/wiki/Ragel
As discussed, Ragel is pretty nice, so the following is nothing against Ragel.
Having said that, the bottleneck for the long running test very likely isn't Java. Depending on the program, Java is within an order of magnitude of C or even faster, because it can optimize longer running processes on the spot. The TL;DR for speed is: Java programs are often slower to start (not so good for quick commandline tools, for example), but fast or faster for longer running apps. Therefore, there's likely an algorithmic issue going on (like we recently had).
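As a side note, the JIT effect is easy to see in the REPL. A minimal sketch, assuming `org-parser.parser/parse` is on the classpath and a local `sample.org` exists (the file name is an assumption):

```clojure
;; Crude warm-up benchmark: the first iterations include JIT warm-up,
;; the later ones approximate steady-state speed.
(require '[org-parser.parser :as parser])

(let [sample (slurp "sample.org")]  ; file name is an assumption
  (dotimes [_ 5]
    (time (parser/parse sample))))
```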
Ah, you've updated and are saying there's an `OutOfMemoryError`. That further confirms the suspicion of an algorithmic error (;
If it's not obvious what the problem is (I wouldn't know atm, for example), my next step would be to debug this with https://visualvm.github.io/ to see what's happening while it's running. It'll show what functions are incurring time and memory.
Another benchmark:
@alexpdp7 wrote instaparse-cli which compiles to a binary. I just did a benchmark:
```
$ wc -l ~/org/notes.org
1883
$ time ./instaparse-cli ~/projects/org-parser/resources/org.ebnf ~/org/notes.org > /dev/null

real    0m4.173s
user    0m3.838s
sys     0m0.312s

$ time ./target/org-parser.js ~/org/notes.org >/dev/null

real    0m11.243s
user    0m17.132s
sys     0m0.670s
```
I wanted to see if a GraalVM binary makes the difference. But it's not a fair comparison so far because a) the JS runtime's startup time is unknown and b) our parser also performs the transformation. I haven't tried a Clojure script that parses without transformation. It would be something like:
```clojure
(ns hello-world.core
  (:require [org-parser.parser :refer [org]]
            [org-parser.core :refer [read-str write-str]]))

;; Note: slurp does not expand `~`, so the path may need to be absolute.
(org (slurp "~/org/notes.org"))
```
(I don't know how to run that.)
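For the record, one way to run such a one-off form without writing a whole namespace is from a project REPL. A sketch, assuming a checkout of org-parser and a resolvable file path:

```clojure
;; In a REPL started with `lein repl` inside the org-parser checkout:
(require '[org-parser.parser :refer [org]])

;; slurp does not expand `~`, so use an absolute or relative path.
(org (slurp "notes.org"))
```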
I wanted to try and reproduce the issue described by taking a real world 15k Org file. I took my personal gtd and reference files and the 200ok gtd file - together they are 16k LOC:
```
munen@lambda:~/src/200ok/org-parser% cat ~/Dropbox/org/things.org ~/Dropbox/org/reference.org ~/src/200ok/200ok-admin/THINGS.org > real_world.org
munen@lambda:~/src/200ok/org-parser% wc -l real_world.org
16802 real_world.org
```
Running this with Java:
```
munen@lambda:~/src/200ok/org-parser% time java -jar target/uberjar/org-parser-0.1.24-SNAPSHOT-standalone.jar real_world.org >& /dev/null
java -jar target/uberjar/org-parser-0.1.24-SNAPSHOT-standalone.jar >&  85.84s user 10.65s system 533% cpu 18.086 total
```
Granted, 18s is not fast by any stretch of the imagination - and it used more than 4 cores in parallel, so on a single core it would have taken 85s. 85s is super slow and unacceptable for something like organice, of course. However, it doesn't crash and burn.
Running it with JS shows a similar result:
```
munen@lambda:~/src/200ok/org-parser% time ./target/org-parser.js real_world.org >& /dev/null
./target/org-parser.js real_world.org >& /dev/null  129.84s user 1.67s system 167% cpu 1:18.64 total
```
JS does not run as much code in parallel, hence the longer real-world compute time. Also unacceptable for organice, but it doesn't crash and burn either.
So, my first take-away is: can you produce an anonymized version of your Org file so that I can try to reproduce the issue with that?
Some more benchmarks in the REPL:
```clojure
(time
 (->
  (slurp "test/org_parser/fixtures/minimal.org")
  parser/parse))
```
Running this multiple times yields:
Or in other words, the actual time required to parse the minimal example is just fine.
Taking my 16k LOC real world file:
```clojure
(time
 (def a
   (-> (slurp "real_world.org")
       (parser/parse))))
```
NB: Using a tmp `def` statement so that Emacs doesn't try to print the result, because it's huge(;
In multiple runs, this yields an elapsed time of about 20s. That is certainly too long just for parsing.
Now, compare that with just the transformation step for the 16k LOC file:
```clojure
(time (def b (transform/transform a)))
```
This yields a much shorter elapsed time, making it clear where the bottleneck is for my example: the parsing step.
@schoettl The instaparse-cli benchmark you did in https://github.com/200ok-ch/org-parser/issues/56#issuecomment-869036553, does this use the same Org file as the benchmark that does OOM with org-parser? If I understand it right, instaparse-cli requires 'just' 4s and org-parser runs into OOM?
If so, then this behaviour is again different from my test, because it would indicate that just parsing is relatively cheap on your file, but transforming it is too costly.
To nail this down further, it would be good to have a common test file. And maybe you could run the parser and transformer separately in the REPL like I did in https://github.com/200ok-ch/org-parser/issues/56#issuecomment-880551895. I'm happy to help with this step if it's unclear how to do it, of course!
How about this sample org file:
```
sample.org: README.org ~/projects/organice/README.org ~/projects/organice/sample.org
	cat $^ > $@
```
It's only

```
$ wc -l sample.org
1638 sample.org
```

lines, but it uses a lot of Org features we want to test.
I tried to anonymize an org file with `sed -r -e 's/[a-zA-Z]/x/g' -e 's/[0-9]/9/g'`, but this doesn't work with drawers, begin/end blocks, and timestamps...
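A line-based variant of that idea could whitelist structural lines. Here is a sketch in Clojure; the whitelist regex is a rough assumption, and day names inside timestamps would still need special handling:

```clojure
(require '[clojure.string :as str])

;; Leave syntax-bearing lines (#+BEGIN_SRC, :PROPERTIES:, :END:, ...) alone,
;; and x/9 out everything else, as in the sed one-liner above.
(defn anonymize-line [line]
  (if (re-find #"^\s*(#\+|:[A-Za-z])" line)
    line
    (-> line
        (str/replace #"[a-zA-Z]" "x")
        (str/replace #"[0-9]" "9"))))

(defn anonymize [text]
  (->> (str/split-lines text)
       (map anonymize-line)
       (str/join "\n")))
```

Since digits map to digits, timestamps at least stay timestamp-shaped, even if the values change.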
@munen Would you compare these results to yours:
```
org-parser.core> (time (def a (-> (slurp "sample.org") (parser/parse))))
"Elapsed time: 3034.876874 msecs"
#'org-parser.core/a
org-parser.core> (time (def b (transform/transform a)))
"Elapsed time: 26.040887 msecs"
#'org-parser.core/b
```
EDIT: using latest master of both organice and org-parser.
```clojure
(time (def a (-> (slurp "../organice/sample.org") (parser/parse))))
```

yields a result one order of magnitude faster for me:
```
"Elapsed time: 342.571364 msecs"
"Elapsed time: 351.378139 msecs"
"Elapsed time: 373.980784 msecs"
"Elapsed time: 341.999213 msecs"
"Elapsed time: 348.235206 msecs"
"Elapsed time: 341.401392 msecs"
"Elapsed time: 330.863657 msecs"
"Elapsed time: 346.982437 msecs"
"Elapsed time: 349.778513 msecs"
```
```clojure
(time (def b (transform/transform a)))
```

yields results very similar to yours. Here are multiple consecutive runs:
```
"Elapsed time: 10.988978 msecs"
"Elapsed time: 20.594534 msecs"
"Elapsed time: 22.839173 msecs"
"Elapsed time: 7.661142 msecs"
"Elapsed time: 21.075701 msecs"
"Elapsed time: 14.261132 msecs"
"Elapsed time: 11.514961 msecs"
```
340ms for parsing isn't spectacular, but it wouldn't be the end of the world. However, for you it's 10x that amount of time, which is already a dealbreaker. I can't imagine my machine being 10x faster than yours, but just FYI, I ran this on an Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz.
If I `cat` sample.org 10 times over into the same file, I also get a run of about 4s for the parsing step:

```
munen@lambda:~/src/200ok/organice% for i in `seq 10`; do cat sample.org >> sample2.org; done
```
Now we have a sample file on which we can do performance analysis.
A first rough sampling on what's going on yields this:
I'm afraid that some more digging didn't yield anything nicely actionable yet. It looks like finding the bottleneck requires digging deeper into instaparse.
@branch14 IMO it would be best to continue in a pairing session here.
One option to tackle this is to do binary search in the EBNF grammar.
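The idea could look roughly like this in the REPL; the paths and names are assumptions, and instaparse accepts a grammar string directly:

```clojure
(require '[instaparse.core :as insta])

(def grammar (slurp "resources/org.ebnf"))  ; path is an assumption

;; Build a parser from a (possibly reduced) grammar and time one parse.
(defn timed-parse-ms [g text]
  (let [p     (insta/parser g)
        start (System/nanoTime)]
    (p text)
    (/ (- (System/nanoTime) start) 1e6)))

;; Compare the full grammar against one with half the alternatives of a
;; suspect rule removed, then recurse into whichever half is slow, e.g.:
;; (timed-parse-ms grammar (slurp "sample.org"))
;; (timed-parse-ms reduced-grammar (slurp "sample.org"))
```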
Preliminary finding: parsing of the `text` rule is costly. Reducing it from all options down to just `text-normal` speeds up parsing of sample.org by a factor of 6.
Some notes on that: the `/` syntax in instaparse means that all options will be tried serially. Hence, the fastest options should be tested first.

Hm, almost anything is `text`. Is that rule really costly, or does it just take most of the processing time?
```
text = ( timestamp / link-format / footnote-link / text-link / text-target / text-radio-target / text-entity / text-macro / text-styled / text-sub / text-sup / text-linebreak / text-normal )*
```
The order doesn't seem bad. E.g. `timestamp` should return very quickly if the first character doesn't match the regex `[[<]`.
If `text-styled` hurts, it can maybe be removed, because its functionality is incomplete anyway (#12).
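To measure that in isolation, one could strip the alternative from the grammar string before building the parser. A sketch, assuming the rule text matches the `text` rule quoted above and that the grammar path is correct:

```clojure
(require '[clojure.string :as str]
         '[instaparse.core :as insta])

(def grammar (slurp "resources/org.ebnf"))  ; path is an assumption

;; Remove the `text-styled` alternative from the ordered choice.
(def without-styled
  (str/replace grammar "text-styled / " ""))

;; Then compare:
;; (time ((insta/parser grammar) (slurp "sample.org")))
;; (time ((insta/parser without-styled) (slurp "sample.org")))
```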
On the two-line sample file, the `lein run` time is probably mainly compilation and runtime initialization. On a 31,000-line real-world org file, it maxes out the CPU, takes 2GB of memory, and runs for at least 15 minutes (not yet finished).
(`lein run` crashed after 15 minutes with an `OutOfMemoryError`.) I haven't tried the JS version yet, but I guess on a smartphone it will only work well for files with a few hundred lines.
I'm a bit surprised because our parse trees should all be pretty flat.
@branch14 what was that language-independent state-machine parser generator called again? I guess a compiled parser binary would be significantly faster.