eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1
421 stars 179 forks source link

Mysterious performance bottleneck #3406

Open joewiz opened 4 years ago

joewiz commented 4 years ago

Describe the issue

As reported on Slack, I developed a query that eXist runs 10x slower than BaseX. In order to aid the core developers in isolating the bottleneck, I reduced the more complex query to the essential part that demonstrates the same performance characteristics, as follows:

xquery version "3.1";

declare function local:compare($i1, $i2) {
    $i1 eq $i2
};

let $length := 100
let $seq := (1 to $length) ! map { "n": ., "values": ("a" || ., "b" || ., "c" || .) }
for $n1 in $seq, $n2 in $seq
let $alignments := for-each-pair($n1?values, $n2?values, local:compare#2)
return
    if (every $alignment in $alignments satisfies $alignment) then
        "yay"
    else
        ()

Increasing the $length variable causes eXist's execution time to increase sharply. Here are the performance times I recorded (in seconds):

$length eXist 4.7.1 eXist 5.2.0 eXist 5.3.0-SNAPSHOT PR #3363) BaseX 9.3.2
100 0.04 0.05 0.05 0.009
200 0.14 0.14 0.14 0.02
500 0.9 0.9 0.8 0.1
750 1.9 2 1.9 0.2
1000 3.4 3.7 3.3 0.4
1250 5.4 5.8 5.3 0.6
1500 7.9 8.3 7.6 0.8
1750 11.2 11.1 10.5 1.2
2000 14.5 15.3 13.5 1.4
3000 33.1 36 30.9 3.5

Graphing this, it became clear that eXist's performance degrades along a far sharper curve, while BaseX's performance declines more slowly:

Screen Shot 2020-05-08 at 6 18 14 PM

Note that, of the eXist builds here, the one that performed the best was 5.3.0-SNAPSHOT with PR #3363.

What factors are contributing to the performance bottleneck in eXist? Is there a way to overcome them?

Context (please always complete the following information):

Additional context

duncdrum commented 4 years ago

hmm i see the same performance curve as you. I tried to refactor the dual for clause to actually use two for keywords, since i remember that made a huge difference, in an older version, but this is no longer the case.

sooo ouch: 🐢

line-o commented 4 years ago

One possible optimisation baseX could use here is memoizing the function calls to local:compare or even for-each-pair.

The code below mimics this by first creating a map of all possible values. That way only a map-lookup is needed and comparing 100.000 items takes under two seconds in exist 4.7.1

xquery version "3.1";

declare function local:key-function ($items) { string-join($items, '') };

let $length := 100000
let $sequence-of-maps := (1 to $length) ! map { "n": ., "values": ("a" || ., "b" || ., "c" || .) }

let $compare-map :=
    for-each($sequence-of-maps, function ($item) {
        map { local:key-function($item?values) : true() }
    })
    => map:merge()

for $item in $sequence-of-maps
let $key := local:key-function($item?values)
let $alignments := map:contains($compare-map, $key)
return
    if ($alignments) 
    then ($item)
    else ()

I am confident your original problem could be rewritten in this way. But also we should investigate if baseX is in fact memoizing function calls or if this would, independently from what they do, help your and similar use cases.

line-o commented 4 years ago

The runtimes for an isolated test case of for-each-pair reveals no big performance gaps between exist 4.7.1 (1.9s) and baseX 9.3.1 (1.8s)

xquery version "3.1";

let $length := 200000
let $seq := (1 to $length)
return for-each-pair($seq, $seq, function ($a, $b) { $a = $b })
joewiz commented 4 years ago

@line-o Thanks very much for your analysis. Your insight about using a key to workaround the performance limitations reported here helped me find a faster solution for finding the candidates: grouping by the key.

xquery version "3.1";

let $length := 1000

let $sequence-of-maps := (1 to $length) ! map { "n": ., "values": ("a" || util:random(100), "b" || util:random(100), "c" || util:random(100)) }

let $keyed-sequence := 
    for-each($sequence-of-maps, function ($item) {
        map:put($item, "key", string-join($item?values))
    })

for $items in $keyed-sequence
group by $key := $items?key
where count($items) gt 1
return
    array { $items }

(As you might notice, the revised query uses util:random() to generate random values to populate the sequence of maps. This clarifies an unstated assumption in my original query: that the purpose of the query is to discover sets of 2+ items in the sequence with identical values.)

The resulting query performs this task much faster, and I think it is adequate for my purposes.

That said, I should note that the performance is correspondingly faster in BaseX too. As a result, we still see a similar shape when comparing the performance of the two processors (this shows the average of 5 runs at different values of the $length parameter):

Screen Shot 2020-05-12 at 1 24 29 AM

To get the query to run in BaseX, instead of substituting random:integer() for util:random(), I adapted the query to use fn:random-number-generator(), which is cross-platform, performed much better in eXist (but requires 5.3.0-SNAPSHOT which includes @adamretter's great fix from https://github.com/eXist-db/exist/pull/3072), and avoided heap space errors that appeared in eXist with $length set to 100,000. That said, with $length set to 100,000, eXist finished in 306s, whereas BaseX finished in .96s.

Here are the queries I used to profile eXist and BaseX's performance: profile-xquery-exist-basex.zip.

In sum, the difference in performance of both the original query reported in the first post and the modified ones here suggest that there eXist still has some performance and scalability potential with this type of query.

line-o commented 4 years ago

@joewiz That is very insightful research! I think it is evident that our implementation of fn:for-each-pair has nothing to do with the performance issue. I would suggest to rename the issue to reflect the new data you delivered.

... and we need to take a look at the FLWOR implementation. Would you have time and energy to compare execution times with a "gluten-free" version? One that uses HOFs to iterate over the dataset instead.

joewiz commented 4 years ago

@line-o Good point. I've renamed the issue to the best of my understanding. And going "gluten-free" is an interesting suggestion. To clarify, you're suggesting that I replace any FLWOR-based iterations with HOF, right? Is that because you suspect the problem is with eXist's FLWOR implementation?

line-o commented 4 years ago

Well, I just remembered I did some testing on this. It could be even slower. And then I tested it... Oh my, try it yourself @joewiz

xquery version "3.1";

import module namespace xbow="http://line-o.de/xq/xbow";

let $length := 10000

let $sequence-of-maps := (1 to $length) ! map { "n": ., "values": ("a" || util:random(100), "b" || util:random(100), "c" || util:random(100)) }

let $keyed-sequence := 
    for-each($sequence-of-maps, function ($item) {
        map:put($item, "key", string-join($item?values))
    })

return xbow:groupBy($keyed-sequence, xbow:pluck("key"))
    => map:for-each(function ($k, $v) {
        if (count($v) > 1) then array{$v} else ()
    })
joewiz commented 4 years ago

@line-o Compared to the equivalent query in my post from yesterday, this one is considerably faster:

length FLWOR + group by xbow:groupBy + map:for-each
1000 .062 .052
2000 .236 .08
10000 6.1 .4
20000 * .8
30000 ** 1.1
50000 ** 1.3
100000 ** 4.3

* CPU pegged, never finishes, monex won't load, have to force-quit eXist

** Didn't even try

This comparison suggests a clear win for xbow's HOF approach, whereas eXist's standard group by completely crumbles past length=10000.

(This "crumbling" phenomenon alone deserves our attention - can we develop some method to kill runaway queries? Can we agree that we should never have to force-quit eXist?)

As I probed the results of my "test runner" that performs 5 runs at many different "lengths" and produces the data used in the charts above, I realized that the step of generating the items itself was taking up a considerable portion of the time for each test run. Since generating dummy data is not the core concern in this issue, I decided to adjust my test to read from pre-generated array of 100,000 items in a static file, and perform the grouping operation on subsets of this. This way, we can be confident that the results let us see how each processor and grouping algorithm performs. Here are the results:

Screen Shot 2020-05-13 at 11 54 56 AM

Questions:

  1. For the eXist results, why does xbow perform so much faster in the "simple" scenario reported at the top of this post than in the "test runner" scenario shown in the chart here? For example, in the "simple" scenario, xbow is The only difference is that the test runner breaks up the functionality into several functions and averages the duration of each run.
  2. (That's all I can think of!)

Here are the queries, including both the "simple" scenario and the "test runner" scenario: profile-xquery-exist-basex-v2.zip

p.s. If it helps, here are the results from monex's profiling of the "test runner" scenarios:

group by

Screen Shot 2020-05-13 at 12 18 41 PM

xbow

Screen Shot 2020-05-13 at 12 42 05 PM

line-o commented 4 years ago

@joewiz I did some optimisations that only times the operation itself. Both approaches are part of testbed.xq . Set up code (loading data and keying it) are not measured as part of the run. The results better resemble, what I concluded in my earlier tests a year ago: runtime of HOFs compared to FLWOR are roughly 2:1

duncdrum commented 4 years ago

@line-o That gist is not the query you are looking for ⚡

line-o commented 4 years ago

@duncdrum the link should be fixed now.

line-o commented 4 years ago

What I find very interesting is that the speed a xquery script finishes can change quite dramatically, even when doing only minor changes to the overall setup. You really have to aim for the sweet spot of the optimiser.

joewiz commented 4 years ago

Before anyone reads this, you might want to take a seat! The difference between the results I reported in my last post and what's below blew me away.

Here is the updated chart using @line-o's "testbed" approach:

Screen Shot 2020-05-13 at 11 59 13 PM

Observations about the test runner:

  1. Both eXist and BaseX benefited tremendously from the "testbed" approach. eXist's performance improved by an average of ~60x and BaseX by ~6x. In other words, the test runner itself was responsible for a considerable portion of the apparent difference between the two processors. They are both clearly capable at handling the operation initially chosen to demonstrate the unusually slow phenomenon described in the original post.
  2. The performance difference between eXist and BaseX shrank dramatically. It is now more a 3-4x difference than a 10-40x+ difference. In other words, the two processors perform this test comfortably within the same order of magnitude.
  3. The shape of this curve suggests eXist is now advancing linearly, not exponentially, with the size of the dataset.

Observations about "flwor" vs. "hof":

  1. On one hand, the "FLWOR"-based query performs, on average, 4.5x faster than the "HOF"-based query for eXist. (3.5x for BaseX.)
  2. On the other hand, HOF-based insights were responsible for the spectacular improvements in the performance of the test runner.
  3. These are contradictory points, so I must be misunderstanding something. The first point suggests that plain old FLWOR is highly optimized and better than HOF, but the second point suggests that HOF can be dramatically faster than FLWOR. Thus, I'm still not sure what lesson to draw is.

Implications:

  1. The "testbed" test runner is clearly now the best tool we have to carry out XQuery performance tests. I imagine we'll try to make it easier to "plug in" new queries and generate useful results for comparison.
  2. We identified queries that triggered severe bottlenecks/limitations in eXist. While we found workarounds, speaking for myself, I still do not understand the nature of the bottlenecks.
    • Clearly the "keyed" ("group by") approach was better than the "for-each-pair" approach, and this can probably be generalized. Does it come down to this advice? Or was there some other factor at play?
    • What lessons do we take from the "FLWOR" vs. "HOF" discussion above? Is there an inherent reason why the "FLWOR" version will always beat out "HOF" (if that's the takeaway), or does this point to an opportunity to unify eXist's iteration classes (as @duncdrum suggested) and thereby lift all boats with the rising tide of optimizations to these classes?
    • Why was the original test runner so slow in eXist? It was using really standard XQuery—nothing fancy, just simple FLWOR loops and user-defined functions.
    • Since BaseX also benefits from the new test runner by ~6x, there are clearly insights that XQuery developers from both implementations can learn from. Such lessons could be written up as conference papers or blog posts and hopefully crystallized into a few new items in the tuning article). On the other hand, eXist improved by 60x. Simple arithmetic would suggest, therefore, that still-mysterious bottlenecks in eXist are responsible for the remaining 10x of poor performance of the initial test runner. What are these bottlenecks, and can we root them out from eXist so that everyone's existing code can run better?
  3. A side issue, but important to highlight again: Can we give dba users a swift, safe, and reliable mechanism to kill queries, so users are never put in the situation of having to force-quit eXist (and thus trigger a recovery run, etc.)? Are certain expressions or functions not working with the watchdog? Queries in this issue were all in-memory and not writing to the database. They should be eminently haltable. (If the topic of "killing queries" in general raises worries about database inconsistency, could we learn from the XQuery Update spec and provide rollback protections for updating functions and functions with the %updating annotation?
  4. Has this issue completed its mission to shine a light on the "bottlenecks" in eXist, or do we need to keep developing other tests to shine more light on various theories? If the latter, what educated guesses do the core developers have about areas that need to be probed with rigorous performance tests?
duncdrum commented 4 years ago

test-runner

Flowr vs hof

Implications

line-o commented 4 years ago

@joewiz Since I have a very light laptop, where I performed my tests I would now focus on threading. How many (real) cores your existdb is able to use? Background: you report 6:1 whereas I see more 2(2,5):1 runtime ratios.

What is really unclear to me: Why your average runtime for 100K items with HOFs is over 600ms but it is ~490ms on my machine.

joewiz commented 4 years ago

@duncdrum Thanks for your feedback on the big picture questions!

@line-o I'll send you my machine info via DM.

adamretter commented 4 years ago

Can we give dba users a swift, safe, and reliable mechanism to kill queries, so users are never put in the situation of having to force-quit eXist (and thus trigger a recovery run, etc.)?

Not easily... The way that queries are implemented in eXist-db means that they themselves have to check whether a request has been made to terminate them. They only do this between operations, they can't easily do it when they are in a tight loop of processing. The way that queries are executed and managed would need some serious re-design to get it working like you have described.

Are certain expressions or functions not working with the watchdog?

Likely, the developer has to remember to insert calls to check whether the query should be terminated. It's likely there are not as many of these checks as you would like.

Queries in this issue were all in-memory and not writing to the database. They should be eminently haltable. (If the topic of "killing queries" in general raises worries about database inconsistency, could we learn from the XQuery Update spec and provide rollback protections for updating functions and functions with the %updating annotation?

Stopping in-memory queries should be fine. Stopping queries that are writing to the db is very dangerous in eXist-db as there is no real transaction abort/roll-back protection.

adamretter commented 4 years ago

I think this issue is very interesting, but it probably needs to be broken into lots of separate issues. The problem is that re-writing queries and changing approaches is fine as a work around, but it just moves the target when trying to find the performance issue. I think this issue likely shows lots of different performance issues.