Mysterious performance bottleneck

joewiz commented 4 years ago

Describe the issue

As reported on Slack, I developed a query that eXist runs 10x slower than BaseX. In order to aid the core developers in isolating the bottleneck, I reduced the more complex query to the essential part that demonstrates the same performance characteristics, as follows:

xquery version "3.1";

declare function local:compare($i1, $i2) {
    $i1 eq $i2
};

let $length := 100
let $seq := (1 to $length) ! map { "n": ., "values": ("a" || ., "b" || ., "c" || .) }
for $n1 in $seq, $n2 in $seq
let $alignments := for-each-pair($n1?values, $n2?values, local:compare#2)
return
    if (every $alignment in $alignments satisfies $alignment) then
        "yay"
    else
        ()

Increasing the $length variable causes eXist's execution time to increase sharply. Here are the performance times I recorded (in seconds):

`$length`	eXist 4.7.1	eXist 5.2.0	eXist 5.3.0-SNAPSHOT PR #3363)	BaseX 9.3.2
100	0.04	0.05	0.05	0.009
200	0.14	0.14	0.14	0.02
500	0.9	0.9	0.8	0.1
750	1.9	2	1.9	0.2
1000	3.4	3.7	3.3	0.4
1250	5.4	5.8	5.3	0.6
1500	7.9	8.3	7.6	0.8
1750	11.2	11.1	10.5	1.2
2000	14.5	15.3	13.5	1.4
3000	33.1	36	30.9	3.5

Graphing this, it became clear that eXist's performance degrades along a far sharper curve, while BaseX's performance declines more slowly:

Note that, of the eXist builds here, the one that performed the best was 5.3.0-SNAPSHOT with PR #3363.

What factors are contributing to the performance bottleneck in eXist? Is there a way to overcome them?

Context (please always complete the following information):

OS: macOS 10.13.6
eXist-db version: 4.7.1, 5.2.0, and 5.3.0-SNAPSHOT
BaseX version: 9.3.2
Java Version: 1.8.0_242 (Zulu 8.44.0.11-CA-macosx) (build 1.8.0_242-b20)

Additional context

How is eXist-db installed? DMG
Any custom changes in e.g. conf.xml? None

duncdrum commented 4 years ago

hmm i see the same performance curve as you. I tried to refactor the dual for clause to actually use two for keywords, since i remember that made a huge difference, in an older version, but this is no longer the case.

sooo ouch: 🐢

line-o commented 4 years ago

One possible optimisation baseX could use here is memoizing the function calls to local:compare or even for-each-pair.

The code below mimics this by first creating a map of all possible values. That way only a map-lookup is needed and comparing 100.000 items takes under two seconds in exist 4.7.1

xquery version "3.1";

declare function local:key-function ($items) { string-join($items, '') };

let $length := 100000
let $sequence-of-maps := (1 to $length) ! map { "n": ., "values": ("a" || ., "b" || ., "c" || .) }

let $compare-map :=
    for-each($sequence-of-maps, function ($item) {
        map { local:key-function($item?values) : true() }
    })
    => map:merge()

for $item in $sequence-of-maps
let $key := local:key-function($item?values)
let $alignments := map:contains($compare-map, $key)
return
    if ($alignments) 
    then ($item)
    else ()

I am confident your original problem could be rewritten in this way. But also we should investigate if baseX is in fact memoizing function calls or if this would, independently from what they do, help your and similar use cases.

line-o commented 4 years ago

The runtimes for an isolated test case of for-each-pair reveals no big performance gaps between exist 4.7.1 (1.9s) and baseX 9.3.1 (1.8s)

xquery version "3.1";

let $length := 200000
let $seq := (1 to $length)
return for-each-pair($seq, $seq, function ($a, $b) { $a = $b })

joewiz commented 4 years ago

@line-o Thanks very much for your analysis. Your insight about using a key to workaround the performance limitations reported here helped me find a faster solution for finding the candidates: grouping by the key.

xquery version "3.1";

let $length := 1000

let $sequence-of-maps := (1 to $length) ! map { "n": ., "values": ("a" || util:random(100), "b" || util:random(100), "c" || util:random(100)) }

let $keyed-sequence := 
    for-each($sequence-of-maps, function ($item) {
        map:put($item, "key", string-join($item?values))
    })

for $items in $keyed-sequence
group by $key := $items?key
where count($items) gt 1
return
    array { $items }

(As you might notice, the revised query uses util:random() to generate random values to populate the sequence of maps. This clarifies an unstated assumption in my original query: that the purpose of the query is to discover sets of 2+ items in the sequence with identical values.)

The resulting query performs this task much faster, and I think it is adequate for my purposes.

That said, I should note that the performance is correspondingly faster in BaseX too. As a result, we still see a similar shape when comparing the performance of the two processors (this shows the average of 5 runs at different values of the $length parameter):

To get the query to run in BaseX, instead of substituting random:integer() for util:random(), I adapted the query to use fn:random-number-generator(), which is cross-platform, performed much better in eXist (but requires 5.3.0-SNAPSHOT which includes @adamretter's great fix from https://github.com/eXist-db/exist/pull/3072), and avoided heap space errors that appeared in eXist with $length set to 100,000. That said, with $length set to 100,000, eXist finished in 306s, whereas BaseX finished in .96s.

Here are the queries I used to profile eXist and BaseX's performance: profile-xquery-exist-basex.zip.

In sum, the difference in performance of both the original query reported in the first post and the modified ones here suggest that there eXist still has some performance and scalability potential with this type of query.

line-o commented 4 years ago

@joewiz That is very insightful research! I think it is evident that our implementation of fn:for-each-pair has nothing to do with the performance issue. I would suggest to rename the issue to reflect the new data you delivered.

... and we need to take a look at the FLWOR implementation. Would you have time and energy to compare execution times with a "gluten-free" version? One that uses HOFs to iterate over the dataset instead.

joewiz commented 4 years ago

@line-o Good point. I've renamed the issue to the best of my understanding. And going "gluten-free" is an interesting suggestion. To clarify, you're suggesting that I replace any FLWOR-based iterations with HOF, right? Is that because you suspect the problem is with eXist's FLWOR implementation?

line-o commented 4 years ago

Well, I just remembered I did some testing on this. It could be even slower. And then I tested it... Oh my, try it yourself @joewiz

xquery version "3.1";

import module namespace xbow="http://line-o.de/xq/xbow";

let $length := 10000

let $sequence-of-maps := (1 to $length) ! map { "n": ., "values": ("a" || util:random(100), "b" || util:random(100), "c" || util:random(100)) }

let $keyed-sequence := 
    for-each($sequence-of-maps, function ($item) {
        map:put($item, "key", string-join($item?values))
    })

return xbow:groupBy($keyed-sequence, xbow:pluck("key"))
    => map:for-each(function ($k, $v) {
        if (count($v) > 1) then array{$v} else ()
    })

joewiz commented 4 years ago

@line-o Compared to the equivalent query in my post from yesterday, this one is considerably faster:

length	FLWOR + group by	xbow:groupBy + map:for-each
1000	.062	.052
2000	.236	.08
10000	6.1	.4
20000	*	.8
30000	**	1.1
50000	**	1.3
100000	**	4.3

* CPU pegged, never finishes, monex won't load, have to force-quit eXist

** Didn't even try

This comparison suggests a clear win for xbow's HOF approach, whereas eXist's standard group by completely crumbles past length=10000.

(This "crumbling" phenomenon alone deserves our attention - can we develop some method to kill runaway queries? Can we agree that we should never have to force-quit eXist?)

As I probed the results of my "test runner" that performs 5 runs at many different "lengths" and produces the data used in the charts above, I realized that the step of generating the items itself was taking up a considerable portion of the time for each test run. Since generating dummy data is not the core concern in this issue, I decided to adjust my test to read from pre-generated array of 100,000 items in a static file, and perform the grouping operation on subsets of this. This way, we can be confident that the results let us see how each processor and grouping algorithm performs. Here are the results:

Questions:

For the eXist results, why does xbow perform so much faster in the "simple" scenario reported at the top of this post than in the "test runner" scenario shown in the chart here? For example, in the "simple" scenario, xbow is The only difference is that the test runner breaks up the functionality into several functions and averages the duration of each run.
(That's all I can think of!)

Here are the queries, including both the "simple" scenario and the "test runner" scenario: profile-xquery-exist-basex-v2.zip

p.s. If it helps, here are the results from monex's profiling of the "test runner" scenarios:

group by

xbow

line-o commented 4 years ago

@joewiz I did some optimisations that only times the operation itself. Both approaches are part of testbed.xq . Set up code (loading data and keying it) are not measured as part of the run. The results better resemble, what I concluded in my earlier tests a year ago: runtime of HOFs compared to FLWOR are roughly 2:1

duncdrum commented 4 years ago

@line-o That gist is not the query you are looking for ⚡

line-o commented 4 years ago

@duncdrum the link should be fixed now.

line-o commented 4 years ago

What I find very interesting is that the speed a xquery script finishes can change quite dramatically, even when doing only minor changes to the overall setup. You really have to aim for the sweet spot of the optimiser.

joewiz commented 4 years ago

Before anyone reads this, you might want to take a seat! The difference between the results I reported in my last post and what's below blew me away.

Here is the updated chart using @line-o's "testbed" approach:

Observations about the test runner:

Both eXist and BaseX benefited tremendously from the "testbed" approach. eXist's performance improved by an average of ~60x and BaseX by ~6x. In other words, the test runner itself was responsible for a considerable portion of the apparent difference between the two processors. They are both clearly capable at handling the operation initially chosen to demonstrate the unusually slow phenomenon described in the original post.
The performance difference between eXist and BaseX shrank dramatically. It is now more a 3-4x difference than a 10-40x+ difference. In other words, the two processors perform this test comfortably within the same order of magnitude.
The shape of this curve suggests eXist is now advancing linearly, not exponentially, with the size of the dataset.

Observations about "flwor" vs. "hof":

On one hand, the "FLWOR"-based query performs, on average, 4.5x faster than the "HOF"-based query for eXist. (3.5x for BaseX.)
On the other hand, HOF-based insights were responsible for the spectacular improvements in the performance of the test runner.
These are contradictory points, so I must be misunderstanding something. The first point suggests that plain old FLWOR is highly optimized and better than HOF, but the second point suggests that HOF can be dramatically faster than FLWOR. Thus, I'm still not sure what lesson to draw is.

Implications:

The "testbed" test runner is clearly now the best tool we have to carry out XQuery performance tests. I imagine we'll try to make it easier to "plug in" new queries and generate useful results for comparison.
We identified queries that triggered severe bottlenecks/limitations in eXist. While we found workarounds, speaking for myself, I still do not understand the nature of the bottlenecks.
- Clearly the "keyed" ("group by") approach was better than the "for-each-pair" approach, and this can probably be generalized. Does it come down to this advice? Or was there some other factor at play?
- What lessons do we take from the "FLWOR" vs. "HOF" discussion above? Is there an inherent reason why the "FLWOR" version will always beat out "HOF" (if that's the takeaway), or does this point to an opportunity to unify eXist's iteration classes (as @duncdrum suggested) and thereby lift all boats with the rising tide of optimizations to these classes?
- Why was the original test runner so slow in eXist? It was using really standard XQuery—nothing fancy, just simple FLWOR loops and user-defined functions.
- Since BaseX also benefits from the new test runner by ~6x, there are clearly insights that XQuery developers from both implementations can learn from. Such lessons could be written up as conference papers or blog posts and hopefully crystallized into a few new items in the tuning article). On the other hand, eXist improved by 60x. Simple arithmetic would suggest, therefore, that still-mysterious bottlenecks in eXist are responsible for the remaining 10x of poor performance of the initial test runner. What are these bottlenecks, and can we root them out from eXist so that everyone's existing code can run better?
A side issue, but important to highlight again: Can we give dba users a swift, safe, and reliable mechanism to kill queries, so users are never put in the situation of having to force-quit eXist (and thus trigger a recovery run, etc.)? Are certain expressions or functions not working with the watchdog? Queries in this issue were all in-memory and not writing to the database. They should be eminently haltable. (If the topic of "killing queries" in general raises worries about database inconsistency, could we learn from the XQuery Update spec and provide rollback protections for updating functions and functions with the %updating annotation?
Has this issue completed its mission to shine a light on the "bottlenecks" in eXist, or do we need to keep developing other tests to shine more light on various theories? If the latter, what educated guesses do the core developers have about areas that need to be probed with rigorous performance tests?

duncdrum commented 4 years ago

test-runner

this is worth looking into, tests should not come with this much of an overhead, its possible that some unoptimized iteration in the test runner is again responsible here.

Flowr vs hof

this is my observation as well and why i use HOF sparingly, often reverting back to FLOWR as things crawl to a halt.
always in favour of documentation updates, tuning and HOF is a big todo.

Implications

I suggest to revive the https://github.com/eXist-db/e2e-exist repo and discuss external parameters for testing, such as disabling (or not) the lock table during performance test runs. loading the data from JSON sits kind of wrong with me, as there might be indexes involved in exist that should make loading things from xml into memory faster, don't think they apply to json. I would still suggest to incorporate tooling and graphing the output in a docker-compose.yml while docker itself has a performance overhead at least we have a controlled environment that is quick and easy to use. Maybe create an expath app wrapper with some instructions and a template that is preinstalled in the performance test instances. Finally we want something that can crash and be restarted/reset hence docker makes it worthwhile the overhead imv.
setting break points in xquery, or stopping a running query is certainly a feature worth looking into. I don't see how corruptions can occur on in-memory operations. If they can we have a bug, worth investigating on its own. Updating queries can be caught by analyzing the query, but in the end if users kill force quite an instance with a running write query they are likely corrupting their db's one way or the other, soo 🤷
I don't see why Gluten free queries should come with an inherent performance penalty, maybe i am missing something. So the query optimize should be able to refactor them for optimal performance just like FLOWRs. This is not a simple task, but if we can find a dev to put time into it it would certainly be welcome.
I think that we have seen that when iterations are unavoidable there is room for tuning exist's performance. Because we don't have a centralized iteration class, this is quite a bit of work, lots of things might break. But again the gains would justify to make this a priority. So mission completed.
the query optimizer seems in need of optimizations with respect to iterations and HoF, again not something to dish out during a lunch break.

line-o commented 4 years ago

@joewiz Since I have a very light laptop, where I performed my tests I would now focus on threading. How many (real) cores your existdb is able to use? Background: you report 6:1 whereas I see more 2(2,5):1 runtime ratios.

What is really unclear to me: Why your average runtime for 100K items with HOFs is over 600ms but it is ~490ms on my machine.

joewiz commented 4 years ago

@duncdrum Thanks for your feedback on the big picture questions!

@line-o I'll send you my machine info via DM.

adamretter commented 4 years ago

Can we give dba users a swift, safe, and reliable mechanism to kill queries, so users are never put in the situation of having to force-quit eXist (and thus trigger a recovery run, etc.)?

Not easily... The way that queries are implemented in eXist-db means that they themselves have to check whether a request has been made to terminate them. They only do this between operations, they can't easily do it when they are in a tight loop of processing. The way that queries are executed and managed would need some serious re-design to get it working like you have described.

Are certain expressions or functions not working with the watchdog?

Likely, the developer has to remember to insert calls to check whether the query should be terminated. It's likely there are not as many of these checks as you would like.

Queries in this issue were all in-memory and not writing to the database. They should be eminently haltable. (If the topic of "killing queries" in general raises worries about database inconsistency, could we learn from the XQuery Update spec and provide rollback protections for updating functions and functions with the %updating annotation?

Stopping in-memory queries should be fine. Stopping queries that are writing to the db is very dangerous in eXist-db as there is no real transaction abort/roll-back protection.

adamretter commented 4 years ago

I think this issue is very interesting, but it probably needs to be broken into lots of separate issues. The problem is that re-writing queries and changing approaches is fine as a work around, but it just moves the target when trying to find the performance issue. I think this issue likely shows lots of different performance issues.

eXist-db / exist