eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1

[BUG] Server deadlock when updating XSL file #4233

Open paulmer opened 2 years ago

paulmer commented 2 years ago

Describe the bug

eXist can become unresponsive if an XSL file is updated while that file is in use by a REST request performing a transformation.

The general scenario is this: the eXist database contains an XQuery script and an XSL document. The script obtains an XML document from an external source (fn:doc($url)) and produces a result using transform:transform, supplying the XSL document as the stylesheet. This script responds to web requests forwarded to the REST interface.
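For concreteness, the read side can be driven by any HTTP client calling the stored query through the REST servlet. A minimal Java sketch, where the port and script path are hypothetical:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class TriggerTransform {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();

            // Hypothetical location of the stored XQuery under eXist's REST interface.
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("http://localhost:8080/exist/rest/db/apps/myapp/transform.xq"))
                    .GET()
                    .build();

            // Each request executes the stored query, which calls transform:transform
            // and therefore needs to read the XSL document (cf. RESTServer.doGet in the trace below).
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }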

A second application, using a remote XMLDB connection, updates the XSL document. If the timing is right, a deadlock is created that causes the server to stop responding to all requests.

The only recovery possible is to forcibly kill the server.
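The write side of the race is an ordinary remote XML:DB store. A minimal Java sketch, assuming the standard XML:DB API, with hypothetical server URI, credentials, and resource names:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import org.xmldb.api.DatabaseManager;
    import org.xmldb.api.base.Collection;
    import org.xmldb.api.base.Database;
    import org.xmldb.api.modules.XMLResource;

    public class UpdateStylesheet {
        public static void main(String[] args) throws Exception {
            // Register eXist's remote XML:DB driver so xmldb:exist:// URIs resolve.
            Database driver = (Database) Class.forName("org.exist.xmldb.DatabaseImpl")
                    .getDeclaredConstructor().newInstance();
            DatabaseManager.registerDatabase(driver);

            // Hypothetical collection, credentials, and resource name.
            Collection col = DatabaseManager.getCollection(
                    "xmldb:exist://localhost:8080/exist/xmlrpc/db/apps/myapp",
                    "admin", "");
            try {
                XMLResource res = (XMLResource) col.createResource(
                        "stylesheet.xsl", XMLResource.RESOURCE_TYPE);
                res.setContent(Files.readString(Path.of("stylesheet.xsl")));
                // The store goes over XML-RPC and ends up in RpcConnection.withDb(),
                // which is where the document WRITE_LOCK in the trace below is requested.
                col.storeResource(res);
            } finally {
                col.close();
            }
        }
    }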

Expected behavior

Both the transformation and the update should succeed.

To Reproduce

This scenario is difficult to reproduce because it depends on the timing of the query and update requests. I do not have a way to reproduce it outside of the combination of tools described above, or I would provide an example.

I will attach a jstack listing of the server in the deadlocked state.


paulmer commented 2 years ago

The jstack trace of the deadlocked server:
trace.txt

adamretter commented 2 years ago

Without seeing the Lock Table, I cannot be certain. However...

  1. The thread executing the RPC connection (likely XML:DB Remote) is waiting to acquire a WRITE_LOCK on a document, so it needs to acquire the lock with ID 0x0000000707025300. If Path Locks for documents are not in use (they are not by default; this can be changed in conf.xml), then to do so it will first have acquired a READ or WRITE lock (depending on the operation) on the parent collection, and INTENTION locks on the ancestor collections. We can see that the RPC thread, in the process of trying to acquire the WRITE_LOCK, has already acquired several locks, one of which has ID 0x000000073418aa78.

  2. All of the threads attempting to read a document are waiting to acquire a READ lock on a Collection; however, that operation requires the lock with ID 0x000000073418aa78, which is currently held by the thread executing the RPC.

  3. To know whether this is a deadlock, we would need to know which thread currently holds the lock with ID 0x0000000707025300, which we cannot see in the jstack. This is where a Lock Table dump would help us (a generic illustration of the kind of lock cycle suspected here follows this list).

  4. The threads showing "- waiting to re-lock in wait() <0x00000007015d28d0> (a org.exist.storage.BrokerPool)" are all waiting for new brokers to become available: the broker pool is completely saturated, mostly by threads that are blocked on the lock with ID 0x000000073418aa78 before they can read the document.
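To make the suspected cycle concrete, here is a generic Java illustration. This is not eXist's MultiLock (which also has intention and upgrade modes); it simply shows two threads, each holding one of the two locks from the dump and waiting forever for the other:

    import java.util.concurrent.locks.ReentrantLock;

    public class LockCycleDemo {
        // Stand-ins for the two locks in the jstack dump; purely illustrative.
        static final ReentrantLock collectionLock = new ReentrantLock(); // cf. 0x000000073418aa78
        static final ReentrantLock documentLock   = new ReentrantLock(); // cf. 0x0000000707025300

        public static void main(String[] args) throws InterruptedException {
            Thread rpcWriter = new Thread(() -> {
                collectionLock.lock();  // like the RPC thread, which already holds the collection lock
                sleep(100);
                documentLock.lock();    // ...and blocks here waiting for the document WRITE_LOCK
            }, "rpc-writer");

            Thread unknownHolder = new Thread(() -> {
                documentLock.lock();    // the as-yet-unidentified holder of 0x0000000707025300
                sleep(100);
                collectionLock.lock();  // ...blocks here; the wait cycle is closed and neither thread progresses
            }, "unknown-holder");

            rpcWriter.start();
            unknownHolder.start();
            rpcWriter.join();           // never returns once the cycle forms
        }

        static void sleep(long ms) {
            try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }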


Approximately 18 threads doing this:

        - parking to wait for  <0x000000073418aa78>
        ...
        at uk.ac.ic.doc.slurp.multilock.MultiLock.intentionReadLock(MultiLock.java:501)
        ...
        at org.exist.storage.lock.LockManager.acquireCollectionReadLock(LockManager.java:219)
        ...
    at org.exist.storage.NativeBroker.getXMLResource(NativeBroker.java:2217)
    at org.exist.http.RESTServer.doGet(RESTServer.java:401)

1 thread doing this:

        - parking to wait for  <0x000000073418aa78> (a uk.ac.ic.doc.slurp.multilock.MultiLock$Sync)
        ...
    at uk.ac.ic.doc.slurp.multilock.MultiLock.intentionReadLock(MultiLock.java:501)
        ...
        at org.exist.storage.lock.LockManager.acquireCollectionReadLock(LockManager.java:219)
        ...
        at org.exist.xslt.EXistURIResolver.databaseSource(EXistURIResolver.java:183)
    at org.exist.xslt.EXistURIResolver.resolve(EXistURIResolver.java:161)
    at org.exist.util.EXistURISchemeURIResolver.resolve(EXistURISchemeURIResolver.java:47)
    at org.exist.util.URIResolverHierarchy.resolve(URIResolverHierarchy.java:65)
    at net.sf.saxon.style.UseWhenFilter.processIncludeImport(UseWhenFilter.java:274)
    at net.sf.saxon.style.UseWhenFilter.startElement(UseWhenFilter.java:223)
    at net.sf.saxon.event.StartTagBuffer.startContent(StartTagBuffer.java:246)
    at net.sf.saxon.event.ProxyReceiver.startContent(ProxyReceiver.java:177)
    at net.sf.saxon.event.ProxyReceiver.startContent(ProxyReceiver.java:177)
    at net.sf.saxon.event.ProxyReceiver.startContent(ProxyReceiver.java:177)
    at net.sf.saxon.tree.linked.ElementImpl.copy(ElementImpl.java:333)
    at net.sf.saxon.tree.linked.ElementImpl.copy(ElementImpl.java:339)
    at net.sf.saxon.tree.linked.DocumentImpl.copy(DocumentImpl.java:655)
    at net.sf.saxon.event.Sender.sendDocumentInfo(Sender.java:249)
        ...
    at org.exist.xquery.functions.transform.Transform.eval(Transform.java:197)

1 thread doing this:

        - parking to wait for  <0x0000000707025300> (a uk.ac.ic.doc.slurp.multilock.MultiLock$Sync)
        ...
        at uk.ac.ic.doc.slurp.multilock.MultiLock.writeLock(MultiLock.java:485)
        ...
        at org.exist.storage.lock.LockManager.acquireDocumentWriteLock(LockManager.java:557)
        ...
    at org.exist.xmlrpc.RpcConnection.withDb(RpcConnection.java:4059)

    Locked ownable synchronizers:
    - <0x00000007315a73c8> (a uk.ac.ic.doc.slurp.multilock.MultiLock$Sync)
    - <0x000000073418aa78> (a uk.ac.ic.doc.slurp.multilock.MultiLock$Sync)
    - <0x000000073418abc0> (a uk.ac.ic.doc.slurp.multilock.MultiLock$Sync)
    - <0x000000073418ad10> (a uk.ac.ic.doc.slurp.multilock.MultiLock$Sync)
    - <0x000000073418ae60> (a uk.ac.ic.doc.slurp.multilock.MultiLock$Sync)

Approximately 37 threads doing this:

  java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(java.base@11.0.13/Native Method)
    - waiting on <no object reference available>
    at java.lang.Object.wait(java.base@11.0.13/Object.java:328)
    at org.exist.storage.BrokerPool.get(BrokerPool.java:1257)
    - waiting to re-lock in wait() <0x00000007015d28d0> (a org.exist.storage.BrokerPool)
    at org.exist.http.servlets.EXistServlet.doGet(EXistServlet.java:320)