eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java
https://rdf4j.org/
BSD 3-Clause "New" or "Revised" License
352 stars 160 forks source link

ShaclSail used over REST API causes ERRORs `Interrupted thread Thread[rdf4j-transaction-0,5,main] but thread is still alive after 1000 ms!` #4937

Open ElenaKurilina opened 3 months ago

ElenaKurilina commented 3 months ago

Current Behavior

When we wrapped our base sail into ShaclSail we started getting errors when transaction commit REST API requests are being processed:

ERROR o.e.r.s.s.ShaclSailConnection.close(760) -  Interrupted thread Thread[rdf4j-transaction-0,5,main] but thread is still alive after 1000 ms!
ERROR o.e.r.s.s.ShaclSailConnection.close(752) -  Closing connection from a different thread than the one that opened it. Connections should not be shared between threads. Opened by Thread[rdf4j-transaction-0,5,main] closed by Thread[repositories/smoketest940_pg,5,main]

My understanding is that transaction commit calls ShaclSailConnection close() which checks transaction owner: If it's not current thread it will try to close the owner thread. In case of REST API, the owner thread will be the tomcat thread that processed transaction begin request.

Expected Behavior

ShaclSailConnection not causing any errors when transactions are used via REST API.

Steps To Reproduce

  1. client send a REST API request to begin a transaction.
  2. Transaction is created in a tomcat thread and is assigned as transaction owner.
  3. client send REST API requests to write data within the transaction (processed with in a diff Tomcat thread).
  4. client send REST API requests to commit the transaction (processed with in a diff Tomcat thread).
  5. Transaction commit calls ShaclSailConnection close which checks transaction owner and if it's not current thread will try to close the owner producing following errors:
    ERROR o.e.r.s.s.ShaclSailConnection.close(760) -  Interrupted thread Thread[rdf4j-transaction-0,5,main] but thread is still alive after 1000 ms!

    and

ERROR o.e.r.s.s.ShaclSailConnection.close(752)  -  Closing connection from a different thread than the one that opened it. Connections should not be shared between threads. Opened by Thread[rdf4j-transaction-0,5,main] closed by Thread[repositories/smoketest940_pg,5,main]
java.lang.Throwable: Throwable used for stacktrace
    at org.eclipse.rdf4j.sail.shacl.ShaclSailConnection.close(ShaclSailConnection.java:754)
    at org.eclipse.rdf4j.repository.sail.SailRepositoryConnection.close(SailRepositoryConnection.java:248)
    at org.eclipse.rdf4j.http.server.repository.transaction.Transaction.close(Transaction.java:392)
    at org.eclipse.rdf4j.http.server.repository.transaction.ActiveTransactionRegistry.lambda$new$0(ActiveTransactionRegistry.java:102)
    at com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1814)
    at com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3467)
    at com.google.common.cache.LocalCache$Segment.postWriteCleanup(LocalCache.java:3443)
    at com.google.common.cache.LocalCache$Segment.remove(LocalCache.java:3077)
    at com.google.common.cache.LocalCache.remove(LocalCache.java:4251)
    at com.google.common.cache.LocalCache$LocalManualCache.invalidate(LocalCache.java:4891)

Version

3.7.9

Are you interested in contributing a solution yourself?

Yes

Anything else?

I have not noticed this "stopping owner thread on close" logic in other Sails. What was the reason to add it to ShaclSailConnection?

hmottestad commented 3 months ago

Thanks @ElenaKurilina

This is due to a deadlock issue that was observed with the MemoryStore caused by the use of a ReentrantLock. It's pretty rare for this issue to crop up, but the solution was to always interrupt any thread that could potentially be holding the lock. I'm not sure why you are only seeing this with the ShaclSail but I'll try to investigate.

It might also be wrong to indiscriminately interrupt the other thread, especially since we use interrupts to signal query timeouts. I'll think this over a bit.

DanchoIvanov commented 1 month ago

@hmottestad Here is my idea https://github.com/DanchoIvanov/rdf4j/tree/GH-4937-edits-deadlock-prevention-thread-logging I know it does not solve the issue however I think it makes the logs more useful. It moves the logging after the thead sleep, so it only logs if it will try to interrupt the owner thread. It also logs the owner stack trace if there is one to help detecting deadlocks. Removes second call of the deadlock prevention thread in made in the method forceCloseActiveOperations since there is one already started from the close method before that.

hmottestad commented 1 week ago

A fix for this will be available in 5.0.1. I'll see if I can backport it to 4.3.x.