eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1
422 stars 179 forks

Queries take too long in eXist-db 4.7.0 #2812

Open rpompaideasoft opened 5 years ago

rpompaideasoft commented 5 years ago

What is the problem

Using eXist-db 2.2 and 4.5, the queries take 10 minutes. Using eXist-db 4.7, the same queries take 20 minutes.

What did you expect

The queries should take less time.

Questions

We would like to know whether the change to BLOB storage has something to do with this, or whether we need to change something in the configuration. This is a critical issue for us. The configuration file is attached. Our server has an HDD. We tested on a local server with an SSD and the queries took less time there: using eXist-db 4.7 on the SSD, the same queries take 8 minutes. Is an SSD required for this version?

Context information

dizzzz commented 5 years ago

Regarding query performance:

(please share some examples when possible)

Did you tune any memory settings? Please verify the index usage with Monex (though I don't expect a large performance change compared to 4.5).

rpompaideasoft commented 5 years ago

Regarding query performance:

* What kind of queries?

  Queries that access multiple files in multiple collections many times.

* Did you define indexes?

  Yes, on many collections.

* What kind of XML structures?

  Nested XML with many nodes inside.

* How many XML documents, and what is the typical size?

  This query reads 32,000 XML files, each between 200 bytes and 3 KB in size. The generated file exceeds one million lines and 30 MB.

Index usage:

* eXist-db 4.5.0: index_usage_9450
* eXist-db 4.7.0: index_usage_9470

Did you tune any memory settings?

This is our configuration file: conf.zip

duncdrum commented 5 years ago

@rpompaideasoft we still don't know what your query does and how it is written. In order to help you we need a minimal reproducible example.

This includes

The fastest way to get us debugging is to combine all of the above in an XQSuite test, but a link to a small xar package would also work.

rpompaideasoft commented 5 years ago

@rpompaideasoft we still don't know what your query does and how it is written. In order to help you we need a minimal reproducible example.

This includes

* an xquery that shows us how you `access the files` minus all unnecessary parts

Query with Lucene: `let $idPhraseQuery := {$id}, $result := fn:collection($col-path)//custom-ns:id[ft:query(., $idPhraseQuery)]/..`

Index usage:

* eXist-db 4.5.0: 9371 calls, timing 44.417
* eXist-db 4.7.0: 9371 calls, timing 176.425
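Both versions make the same number of index calls (9371), so the extra time comes from slower individual lookups rather than from more lookups. Working through the reported figures (the timing unit is not stated in the thread):

$$
\frac{176.425}{44.417} \approx 3.97, \qquad \frac{44.417}{9371} \approx 0.00474, \qquad \frac{176.425}{9371} \approx 0.01883
$$

In other words, for this test set each Lucene lookup took roughly four times as long on 4.7.0 as on 4.5.0.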

Files: 1300 xml files

collection.xconf:
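The attached collection.xconf is not reproduced in this thread. For readers unfamiliar with the format, a minimal sketch of a configuration that defines a Lucene full-text index on `custom-ns:id` (the element the query above passes to `ft:query`) might look like the following. This is a hypothetical example, not the reporter's file, and the namespace URI is a placeholder. In eXist-db such a file is stored under `/db/system/config` at the path mirroring the indexed collection, and the collection must be reindexed after changes.

```xml
<!-- Hypothetical example, not the reporter's actual configuration.
     Stored as /db/system/config/db/<your-collection>/collection.xconf -->
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <!-- Placeholder URI: replace with the real custom-ns namespace -->
    <index xmlns:custom-ns="http://example.com/custom-ns">
        <lucene>
            <!-- Full-text index on the element that ft:query() is applied to -->
            <text qname="custom-ns:id"/>
        </lucene>
    </index>
</collection>
```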

duncdrum commented 5 years ago

@rpompaideasoft there have been quite a few changes with respect to performance. Can you test with 4.7.1 and the HEAD of develop? Chances are the performance improvements will be a 5.x feature. See https://github.com/eXist-db/exist/pull/2962