eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1

[BUG] Possible memory leak in loop over array containing maps #5375

Open nverwer opened 2 months ago

nverwer commented 2 months ago

Description

When extracting data from a large array that contains maps, Java heap memory usage keeps growing, which might indicate a memory leak. When the same script is run repeatedly with the same data, garbage collection appears to reclaim the used memory. However, each time the code or the data changes, heap usage increases and does not return to its previous level.

The following graph was generated using real data (https://zenodo.org/records/10482057) and comes from jconsole: image

Using generated data (see below), the graph is similar: image

Expected behaviour

The heap memory usage should return to a lower level after garbage collection. It should not increase permanently after a change in code or data.

To reproduce

The following script is a much simplified version of a script that gets data out of a (large) array containing maps. In the original script, the data comes from a JSON file, but I get the same results when generating the data in the script:

let $doc as item() := array
  { for $i in 1 to 500000
    return map
      { 'id' : 'id'||$i
      , 'status' : if ($i mod 100 = 0) then 'inactive' else if ($i mod 80 = 1) then 'withdrawn' else 'active'
      , 'relationships' : array{ map
        { 'type' : 'Related'
        , 'id' : 'id'||($i+1)
        }}
      }
  }
let $doc-size as xs:integer := array:size($doc)

let $ids :=
  for $doc-index in 1 to $doc-size
    let $item as map(*) := $doc($doc-index)
    (:let $status := $item?status:)
    (:let $relationships as array(*)? := $item?relationships:)
    where $item?status = ('withdrawn','inactive') and exists($item?relationships)
  return $item?id

return count($ids)

At first, I thought that the memory leak (if that is what this is) was in the loop variables $status and $relationships, but that seems not to be the case, so I commented them out.

The second graph above was generated by running this script a few times, then changing 500000 in for $i in 1 to 500000 to 500001, running it a few times, changing it to 500002, running it a few times, and so on.

Context

eXist-db: eXist-6.2.0
JVM: OpenJDK 64-Bit Server VM version 11.0.14.1+1
OS: Windows 10
eXist is run with the launcher (not as a service, although that appears to have the same problem), with memory.max=8192.

More details

I used VisualVM to analyze a heap dump, to get an idea of what takes up all the space in the heap. The dump suggests that a large share of the heap is held by a cache. However, calling cache:clear() does not change the used heap space.

image

image

I am not sure if this gives an indication of what is going on.

adamretter commented 2 months ago

@nverwer The cache that your traces are showing is that of compiled XQuery Modules (and not the Cache XQuery Extension Module that is available via the cache:* functions). Because compilation is time-intensive, after eXist-db compiles and executes a Module, it resets (clears) the state of the Module and stores it in a Caffeine Cache. The next time the same query is executed, it is borrowed from the cache instead of being recompiled.

It looks like the reset of the module is perhaps not resetting some expressions that accumulated state. We have seen this several times in the past for complex expressions. I did fix a number of issues previously with Maps and Arrays in this area. Could you check if I already fixed this in main by building a 7.0.0-SNAPSHOT? If not, it is possibly another bug in this area that needs to be addressed.
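
To make the suspected mechanism concrete, here is a minimal Java sketch of a compiled-query cache in which an incomplete reset leaves state behind. It is not eXist-db code: the class and method names are hypothetical, a plain HashMap stands in for the Caffeine Cache, and the "incomplete reset" is an assumption about the suspected bug, not a confirmed diagnosis. It only shows why re-running the same source would not grow the retained data, while each changed source (500000, 500001, ...) would add a new cache entry that keeps its own copy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, NOT eXist-db code: a HashMap stands in for the Caffeine Cache.
final class CompiledQueryCacheSketch {

    // Stand-in for a compiled query whose expressions accumulate state while running.
    static final class CompiledQuery {
        final List<Integer> state = new ArrayList<>();

        void execute(int items) {
            state.clear();                      // a re-run replaces the previous state
            for (int i = 0; i < items; i++) {
                state.add(i);
            }
        }

        void reset(boolean complete) {
            if (complete) {
                state.clear();                  // a complete reset releases the state
            }                                   // assumed bug: otherwise the state survives
        }
    }

    private final Map<String, CompiledQuery> cache = new HashMap<>();
    private final boolean completeReset;

    CompiledQueryCacheSketch(boolean completeReset) {
        this.completeReset = completeReset;
    }

    // Borrow the compiled query for this source text (or "compile" a new one),
    // execute it, reset it, and return it to the cache.
    void run(String source, int items) {
        CompiledQuery query = cache.remove(source);
        if (query == null) {
            query = new CompiledQuery();
        }
        query.execute(items);
        query.reset(completeReset);
        cache.put(source, query);
    }

    int cachedQueries() {
        return cache.size();
    }

    // Total number of items still reachable through cached queries.
    long retainedItems() {
        long total = 0;
        for (CompiledQuery query : cache.values()) {
            total += query.state.size();
        }
        return total;
    }

    public static void main(String[] args) {
        CompiledQueryCacheSketch leaky = new CompiledQueryCacheSketch(false);
        leaky.run("for $i in 1 to 500000 ...", 1000);
        leaky.run("for $i in 1 to 500000 ...", 1000);   // same source: entry is reused
        leaky.run("for $i in 1 to 500001 ...", 1000);   // changed source: new entry
        System.out.println("cached queries: " + leaky.cachedQueries());
        System.out.println("retained items: " + leaky.retainedItems());
    }
}
```

In this model, constructing the sketch with completeReset set to true leaves the same number of cache entries but no retained items, which corresponds to the expected behaviour described in the report.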

nverwer commented 2 months ago

@adamretter Thank you for your response. I compiled the latest 7.0.0-SNAPSHOT and ran the script as shown above. Unfortunately, heap space usage keeps increasing as I change 500000 into 500001, 500002, and so on.

image

It looks like this problem is still there. Although I am beginning to understand some of the Java code for eXist, I am afraid I cannot be of much help here.