BaseXdb / basex

BaseX Main Repository.
http://basex.org
BSD 3-Clause "New" or "Revised" License

There is a significant performance decrease when executing XQueries in parallel rather than in sequence. #2242

Closed. simonam2 closed this issue 10 months ago.

simonam2 commented 10 months ago

Description of the Problem

We are using BaseX 10.3. The client code is taken from: https://github.com/BaseXdb/basex/blob/9/basex-examples/src/main/java/org/basex/examples/api/BaseXClient.java

There is a significant performance decrease when executing XQueries in parallel rather than in sequence.

When I execute a read XQuery against the same database 5 times in a single thread, each query takes about 2 seconds, so the total is 5 × 2 s = 10 s. When I execute the same read XQuery against the same database 5 times in 5 different threads, each query takes about 35 seconds; since they run in parallel, the total execution time is about 35 seconds. Why does sequential execution take 10 seconds in total, while parallel execution takes 35 seconds?

The BaseX documentation states that read transactions are executed in parallel (https://docs.basex.org/wiki/Transaction_Management), so why are read XQueries so much slower when run in parallel than when run in series, one after the other?
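The parallelism described on the Transaction Management page is about locking: read transactions are allowed to run at the same time instead of waiting for each other. The sketch below illustrates that idea with Java's standard `ReentrantReadWriteLock`. It is an analogy only and not how BaseX implements its transaction manager; the class `SharedReadDemo` and its method are hypothetical. It checks that several readers can hold the read lock simultaneously, which says nothing about whether the work they do inside finishes any faster.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Analogy only (not BaseX's lock manager): a read/write lock lets several
// readers hold the lock at once, so reads may overlap in time.
class SharedReadDemo {

    /** Returns true if all n readers were inside the read lock at the same time. */
    static boolean readersOverlap(int n) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch allInside = new CountDownLatch(n);
        boolean[] sawOthers = new boolean[n];
        Thread[] readers = new Thread[n];

        for (int i = 0; i < n; i++) {
            final int id = i;
            readers[i] = new Thread(() -> {
                lock.readLock().lock();
                try {
                    allInside.countDown();
                    // Succeeds only if every other reader also acquired the read lock.
                    sawOthers[id] = allInside.await(5, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    lock.readLock().unlock();
                }
            });
            readers[i].start();
        }

        for (Thread t : readers) {
            t.join(); // join() also makes the writes to sawOthers visible here
        }
        for (boolean b : sawOthers) {
            if (!b) {
                return false;
            }
        }
        return true;
    }
}
```

If the read lock were exclusive, the first reader would wait in `await` while holding the lock, the others could not enter, and the method would return `false` once the timeouts expire.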

Expected Behavior

Parallel execution of read operations on the same database should be faster than sequential execution.

Steps to Reproduce the Behavior

Execute a read XQuery against the same database 5 times in a single thread (about 2 seconds per query, 10 seconds in total). Then execute the same query 5 times concurrently in 5 separate threads (about 35 seconds per query, about 35 seconds in total).
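To separate per-query latency from total wall-clock time, a small harness can time both modes. This is a sketch under assumptions: the class `ParallelVsSequential` and the sleep-based placeholder task are hypothetical. In a real test, the task would open its own BaseXClient session and run the read query; the sketch assumes one session per thread rather than one shared session.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical timing harness: runs the same task n times one after another,
// then n times concurrently, and reports the total wall-clock time of each mode.
class ParallelVsSequential {

    static long sequentialMillis(Callable<?> task, int n) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            task.call();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    static long parallelMillis(Callable<?> task, int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(n); // one thread per run
        try {
            long start = System.nanoTime();
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                futures.add(pool.submit(task));
            }
            for (Future<?> f : futures) {
                f.get(); // block until this run finishes; rethrows task failures
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder standing in for "open a session and run the read query".
        Callable<Void> placeholderQuery = () -> {
            Thread.sleep(200);
            return null;
        };
        System.out.println("sequential: " + sequentialMillis(placeholderQuery, 5) + " ms");
        System.out.println("parallel:   " + parallelMillis(placeholderQuery, 5) + " ms");
    }
}
```

With the sleep-based placeholder, the parallel total should come out close to a single run rather than 5 × 200 ms, because sleeping threads do not compete for a shared resource. Queries that compete for the same disk or CPU cores will not necessarily show that speed-up.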

Do you have an idea how to solve the issue?

No response

What is your configuration?

We use the default configuration.

ChristianGruen commented 10 months ago

Please note that parallel queries with concurrent reads are often much slower than serial operations because they can lead to random I/O disk access patterns. I remember there have been some threads on this topic on our mailing list; see e.g.:

https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg04223.html
https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg03946.html
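The effect of interleaving on the access pattern can be illustrated with a toy model. The sketch below uses hypothetical class and method names and does not model BaseX's storage layer, caching, or read-ahead. It assumes each of five readers scans its own contiguous run of 1000 pages. When the readers run one after another, the pages are read in order. When they take turns page by page, almost every access lands somewhere other than the page after the previous one, which on a spinning disk means a seek.

```java
import java.util.ArrayList;
import java.util.List;

// Toy disk model: counts "seeks", i.e. accesses that do not continue from the
// previously read page. Not a model of BaseX's actual storage or caching.
class SeekModel {

    /** Reader r scans pages [r*pagesPerReader, (r+1)*pagesPerReader), readers one after another. */
    static List<Integer> sequentialOrder(int readers, int pagesPerReader) {
        List<Integer> order = new ArrayList<>();
        for (int r = 0; r < readers; r++) {
            for (int p = 0; p < pagesPerReader; p++) {
                order.add(r * pagesPerReader + p);
            }
        }
        return order;
    }

    /** Same scans, but the readers take turns page by page (round-robin interleaving). */
    static List<Integer> interleavedOrder(int readers, int pagesPerReader) {
        List<Integer> order = new ArrayList<>();
        for (int p = 0; p < pagesPerReader; p++) {
            for (int r = 0; r < readers; r++) {
                order.add(r * pagesPerReader + p);
            }
        }
        return order;
    }

    /** A seek is any access that is not the page directly after the previous one. */
    static int countSeeks(List<Integer> order) {
        int seeks = 0;
        for (int i = 1; i < order.size(); i++) {
            if (order.get(i) != order.get(i - 1) + 1) {
                seeks++;
            }
        }
        return seeks;
    }

    public static void main(String[] args) {
        System.out.println(countSeeks(sequentialOrder(5, 1000)));  // 0
        System.out.println(countSeeks(interleavedOrder(5, 1000))); // 4999
    }
}
```

In practice the operating system's page cache and read-ahead can absorb part of this cost, so the model only shows the direction of the effect, not its size.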

In practice, parallel access is primarily useful for allowing multiple users to access resources at the same time.