eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java
https://rdf4j.org/
BSD 3-Clause "New" or "Revised" License

Improved handling of thread closure in Background query results [RDF4J Server] #684

Closed rune66 closed 7 years ago

rune66 commented 7 years ago

I'm running rdf4j in a Tomcat 8.5 container. From another web app I use the rdf4j-api to access the repositories hosted in rdf4j. I don't know whether it's a problem or not, but when stopping the web app I get a ton of "failed to stop it" exceptions that seem to originate from rdf4j.

Does anyone recognize this, and how can I avoid it?

WARNING: The web application [REST] appears to have started a thread named [pool-28-thread-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:

    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(Unknown Source)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
    java.util.concurrent.ArrayBlockingQueue.put(Unknown Source)
    org.eclipse.rdf4j.http.client.QueueCursor.put(QueueCursor.java:80)
    org.eclipse.rdf4j.http.client.BackgroundTupleResult.handleSolution(BackgroundTupleResult.java:139)
    org.eclipse.rdf4j.query.resultio.sparqlxml.SPARQLResultsSAXParser.endTag(SPARQLResultsSAXParser.java:187)
    org.eclipse.rdf4j.common.xml.SimpleSAXParser$SimpleSAXDefaultHandler.endElement(SimpleSAXParser.java:337)
    com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(Unknown Source)
    com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    org.eclipse.rdf4j.common.xml.SimpleSAXParser.parse(SimpleSAXParser.java:221)
    org.eclipse.rdf4j.common.xml.SimpleSAXParser.parse(SimpleSAXParser.java:196)
    org.eclipse.rdf4j.query.resultio.sparqlxml.AbstractSPARQLXMLParser.parseQueryResultInternal(AbstractSPARQLXMLParser.java:211)
    org.eclipse.rdf4j.query.resultio.sparqlxml.SPARQLResultsXMLParser.parseQueryResult(SPARQLResultsXMLParser.java:74)
    org.eclipse.rdf4j.http.client.BackgroundTupleResult.run(BackgroundTupleResult.java:109)
    java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    java.lang.Thread.run(Unknown Source)

rune66 commented 7 years ago

I wonder if this is connected to issue #293. I'm running 2.1.1 on the server though and 2.1.2 on the API-side.

ansell commented 7 years ago

Which web application is generating the exception? If the stacktrace is generated from inside of rdf4j-server, then we need to look further into why the fixes in #293 are not handling interruption of the background parser thread. If it is generated from your code, then you may not be closing resources when you encounter an InterruptedException.

The stacktrace may or may not be normal for the way we parse results, as we use an ArrayBlockingQueue to limit memory usage, and only a thread interruption or fetching another binding would make it move past that point.
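[Editor's note] Not part of the original thread: a minimal standalone sketch of why a producer thread shows up parked inside `ArrayBlockingQueue.put`, as in the stack trace above. The class name and latch are illustrative, not RDF4J code; `put` blocks until a slot frees up or the thread is interrupted, which is exactly what the background result parser relies on.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;

// Illustrative only: a producer parked in put(), mirroring the stack trace above.
public class BlockedProducerDemo {
    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.put("first"); // fills the single slot

        CountDownLatch released = new CountDownLatch(1);
        Thread producer = new Thread(() -> {
            try {
                queue.put("second"); // parks: no consumer ever takes "first"
            } catch (InterruptedException e) {
                released.countDown(); // only an interrupt (or a take) frees it
            }
        });
        producer.start();

        Thread.sleep(200); // let the producer park
        System.out.println(producer.getState()); // WAITING, parked in put()
        producer.interrupt(); // what a clean shutdown must do
        released.await();
        producer.join();
        System.out.println("producer released");
    }
}
```

If nothing ever interrupts such a thread (or drains the queue), it stays parked forever, which is what Tomcat's leak detector reports on undeploy.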

rune66 commented 7 years ago

The exception is generated outside the rdf4j-server. This is the code connecting to rdf4j:

        long startTime = System.currentTimeMillis();
        try (ByteArrayOutputStream stream = new ByteArrayOutputStream()) {
            SPARQLResultsXMLWriter sparqlWriter = new SPARQLResultsXMLWriter(stream);
            try (RepositoryConnection con = repository.getConnection()) {
                TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, query);
                tupleQuery.setIncludeInferred(false);
                tupleQuery.evaluate(sparqlWriter);
            }

            ArgURIResolver resolver = new ArgURIResolver();
            for (int i = 0; i < relQueries.size(); i++) {
                ByteArrayOutputStream streamRelated = new ByteArrayOutputStream();
                SPARQLResultsXMLWriter sparqlWriterRelated = new SPARQLResultsXMLWriter(streamRelated);
                try (RepositoryConnection conRelated = repository.getConnection()) {
                    TupleQuery tupleQueryRelated = conRelated.prepareTupleQuery(QueryLanguage.SPARQL, relQueries.elementAt(i).sparqlQuery);
                    tupleQueryRelated.setIncludeInferred(false);
                    tupleQueryRelated.evaluate(sparqlWriterRelated);
                }
                java.io.StringReader xmlInReaderRelated = new java.io.StringReader(streamRelated.toString("UTF-8"));
                resolver.setArgument(relQueries.elementAt(i).docName, new javax.xml.transform.stream.StreamSource(xmlInReaderRelated));
            }
            transformer.setURIResolver(resolver);
            try (java.io.StringWriter xmlOutWriter = new java.io.StringWriter()) {
                try (java.io.StringReader xmlInReader = new java.io.StringReader(stream.toString("UTF-8"))) {
                    transformer.transform(new javax.xml.transform.stream.StreamSource(xmlInReader), new javax.xml.transform.stream.StreamResult(xmlOutWriter));
                }
                result = xmlOutWriter.toString();
            }
        } catch (IOException e) {
            logger.catching(e);
        }

Everything should be closed.

/Rune


ansell commented 7 years ago

Thanks for the code sample; it does look like you are closing instances correctly. Our ".evaluate(TupleQueryResultHandler)" implementations may need some checking, as there are some resources used by them, so I will look into those.

Another place to check may be the way you are checking out repositories (outside of the code sample above) and whether you are calling RepositoryManager.shutDown, if you are using a RepositoryManager. If you are directly opening Repository instances without a RepositoryManager, it could also be a missing call to Repository.shutDown in the Tomcat shutdown hooks.

In both of those cases, however, I have also identified some cases where a lack of closure may still occur even if you call Repository.shutDown in the Tomcat shutdown hooks (ServletContextListener.contextDestroyed), particularly if a Thread is interrupted, so I am proposing some changes to make that code more robust.
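[Editor's note] Not part of the original thread: a standalone sketch (hypothetical names, not RDF4J code) of the interrupt-robust closure being discussed. The point is that cleanup must live in a finally block so it runs even when the worker is interrupted mid-operation, and that the interrupt status should be restored rather than swallowed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: even if a worker is interrupted while blocked,
// the finally block still runs, so resources are released before it dies.
public class InterruptSafeCloseDemo {
    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<Object> queue = new ArrayBlockingQueue<>(1);
        AtomicBoolean closed = new AtomicBoolean(false); // stands in for a real resource

        Thread worker = new Thread(() -> {
            try {
                queue.put(new Object());
                queue.put(new Object()); // parks forever: queue is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            } finally {
                closed.set(true); // stands in for result.close() / con.close()
            }
        });
        worker.start();
        Thread.sleep(200); // let the worker park
        worker.interrupt(); // shutdown hook's job
        worker.join();
        System.out.println("resources closed: " + closed.get());
    }
}
```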

rune66 commented 7 years ago

Hi Peter

Thank you. FYI here’s the code handling the Repository:

    Repository repository = null;
    try {
        repository = new HTTPRepository(getRepositoryUri(inclDeprecated, useStageDatabase));
        try {
            repository.initialize();
        } catch (RepositoryException ex) {
            logger.catching(ex);
        }

….

    } finally {
        if(repository != null) {
            repository.shutDown();
        }
    }

I’m using JConsole to monitor the number of live threads in my Tomcat instance. Every time this piece of code is called, the live thread count goes up by one and it doesn’t go down again, not even after half an hour of idle time.
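[Editor's note] Not part of the original thread: the JConsole observation can also be reproduced programmatically with the standard ThreadMXBean. This standalone sketch (no RDF4J involved) shows the live-thread count rising after a cached thread pool creates a worker that nobody shuts down.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Programmatic version of the JConsole check: live thread count
// before and after a cached pool spawns a worker.
public class ThreadCountDemo {
    public static void main(String[] args) throws Exception {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int before = threads.getThreadCount();

        ExecutorService pool = Executors.newCachedThreadPool();
        pool.submit(() -> {}).get(); // forces one worker thread to be created

        int after = threads.getThreadCount();
        // Cached workers idle for 60 seconds before dying, so the extra
        // thread is still alive here unless the pool is shut down.
        System.out.println("extra thread alive: " + (after > before));

        pool.shutdown(); // cleanup so this demo itself exits promptly
    }
}
```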

Regards, Rune


ansell commented 7 years ago

Thanks, that code looks completely fine. I will work further on the pull request #687 code when I get a chance to clean up the resource closures.

ansell commented 7 years ago

Can you try reusing a single RepositoryConnection object for the duration of the method you posted above, instead of creating a new one for each query? I am trying to replicate this to validate the changes I have made, but I am not finding any issues when replicating it with either a single RepositoryConnection or a new RepositoryConnection each time, out of an HTTPRepository.

Out of interest, how are you redeploying your application? That is, are you deleting the WAR file from under the web server, or shutting the context down first?

I can see how my changes are improvements, but I still can't see how to replicate your stacktrace in a regression test.

ansell commented 7 years ago

The reason the thread count isn't dropping, by the way, is the use of Executors.newCachedThreadPool internally in SesameClientImpl. By caching threads, it attempts to avoid creating a new thread for future use. One SesameClientImpl object is created for each HTTPRepository and should, after the pull request, be reliably cleaned up when the Repository is shut down. The existing code has a few quirks, any of which may be causing the client not to be released consistently when the repository is shut down, although I still haven't been able to replicate your issue locally.

One of the quirks that may have been flagged by Tomcat's very strict memory-reclaiming code is that shutdown was being called on some of the ExecutorService instances, but awaitTermination wasn't being called afterwards, as the ExecutorService pattern requires. Since shutdown would not have released a thread immediately if it was blocked, Tomcat may have come straight in and noticed the lack of release.

https://github.com/eclipse/rdf4j/blob/master/core/http/client/src/main/java/org/eclipse/rdf4j/http/client/SesameClientImpl.java#L127
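[Editor's note] Not part of the original thread: a minimal standalone sketch of the shutdown-then-awaitTermination sequence described above, following the two-phase pattern from the ExecutorService javadoc. The helper name and timeouts are illustrative, not RDF4J code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the two-phase shutdown the ExecutorService javadoc recommends:
// shutdown() alone neither waits for nor interrupts running tasks.
public class ExecutorShutdownDemo {
    static void shutdownAndAwait(ExecutorService pool) {
        pool.shutdown(); // stop accepting new tasks
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt tasks blocked in e.g. queue.put()
                if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                    System.err.println("pool did not terminate");
                }
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        pool.submit(() -> System.out.println("task ran"));
        shutdownAndAwait(pool);
        System.out.println("terminated: " + pool.isTerminated());
    }
}
```

Without the awaitTermination (and shutdownNow for blocked tasks), a worker parked in a blocking call can outlive the web application, which is exactly what Tomcat's leak detector complains about.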

rune66 commented 7 years ago

I'll try reusing the RepositoryConnection.

I'm using the Tomcat manager app to undeploy my application; I believe contextDestroyed is called in that process.

Also I guess I should get the project from git so I'll be able to test your corrections.

rune66 commented 7 years ago

I have now tried with a simple piece of code making only one query like this:

    try (ByteArrayOutputStream stream = new ByteArrayOutputStream()) {
        SPARQLResultsXMLWriter sparqlWriter = new SPARQLResultsXMLWriter(stream);
        try (RepositoryConnection con = repository.getConnection()) {
            TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, query);
            tupleQuery.setIncludeInferred(false);
            tupleQuery.evaluate(sparqlWriter);
        }
    }

I'm now reusing the repository object itself between calls to this function, but creating a new connection each time it's called. I'm also calling Repository.shutDown in my contextDestroyed method.

The result is the same: for each TupleQuery I have run in the session, Tomcat sees a hanging thread.

ansell commented 7 years ago

@rune66 a Pull Request has been merged for this. Can you test whether using the current git master code changes the results or stacktraces that you are seeing?

abrokenjester commented 7 years ago

Marking as resolved pending information to the contrary.

rune66 commented 7 years ago

@ansell Thanks - I'll check it out.

rune66 commented 7 years ago

I can confirm that the warnings are gone.