apache / uima-ruta

Apache UIMA Ruta
https://uima.apache.org
Apache License 2.0

YourKit thread analysis shows executor threads blocked in multithreading scenarios #175

Closed: raghu298 closed this issue 1 week ago

raghu298 commented 2 weeks ago

Describe the bug: During load testing, thread profiling in YourKit shows that many executor threads are blocked.

Below are the stack traces of the blocked threads, captured in the engine's process call and in RutaCommand.execute.

Engine stacktrace:

"io-executor-thread-11" Blocked
    java.util.Collections$SynchronizedMap.values(Collections.java:2709)
    org.apache.uima.analysis_engine.impl.AnalysisEngineManagementImpl.mark(AnalysisEngineManagementImpl.java:156)
    org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.buildProcessTraceFromMBeanStats(AnalysisEngineImplBase.java:624)
    org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.buildProcessTraceFromMBeanStats(AnalysisEngineImplBase.java:576)
    org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.buildProcessTraceFromMBeanStats(AggregateAnalysisEngine_impl.java:635)
    org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.buildProcessTraceFromMBeanStats(AnalysisEngineImplBase.java:576)
    org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:302)
    org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:312)
    com.pega.nlp.textanalytics.engines.pool.AnalysisEnginePoolHolder.analyze(AnalysisEnginePoolHolder.java:209)
    com.pega.nlp.textanalytics.accessor.TextAnalyticsAccessor.runTextAnalytics(TextAnalyticsAccessor.java:113)
    com.pega.fnx.textservice.prediction.PredictionExecutionManager.executePrediction(PredictionExecutionManager.java:127)
    com.pega.fnx.textservice.prediction.$PredictionExecutionManager$Definition$Intercepted.$$access$$executePrediction()
    com.pega.fnx.textservice.prediction.$PredictionExecutionManager$Definition$Exec.dispatch()

RutaCommand initialization:

"io-executor-thread-3" Blocked
    org.apache.uima.internal.util.Misc.shareExisting(Misc.java:907)
    org.apache.uima.cas.impl.FsIndex_singletype.<init>(FsIndex_singletype.java:166)
    org.apache.uima.cas.impl.FsIndex_flat.<init>(FsIndex_flat.java:55)
    org.apache.uima.cas.impl.FsIndex_snapshot.iterator(FsIndex_snapshot.java:101)
    org.apache.uima.cas.impl.FsIndex_snapshot.iterator(FsIndex_snapshot.java:89)
    org.apache.uima.cas.impl.FsIndex_snapshot.iterator(FsIndex_snapshot.java:35)
    org.apache.uima.ruta.RutaStream.getSortedUniqueAnchors(RutaStream.java:365)
    org.apache.uima.ruta.RutaStream.createBasics(RutaStream.java:326)
    org.apache.uima.ruta.RutaStream.createBasics(RutaStream.java:260)
    org.apache.uima.ruta.RutaStream.initalizeBasics(RutaStream.java:239)
    com.pega.nlp.ner.command.RutaCommand.initializeStream(RutaCommand.java:319)
    com.pega.nlp.ner.command.RutaCommand.execute(RutaCommand.java:380)
    com.pega.nlp.ner.command.EntityCommandExecutor.execute(EntityCommandExecutor.java:32)
    com.pega.nlp.ner.annotators.EntityAnnotator.doProcessing(EntityAnnotator.java:93)

raghu298 commented 2 weeks ago

Detailed Thread Analysis

1. Thread: io-executor-thread-3


2. Thread: io-executor-thread-11

raghu298 commented 2 weeks ago

@pkluegl @reckart Could you please take a look and let me know whether you see any possible improvements or have any suggestions?

reckart commented 2 weeks ago

Are you accessing the same CAS instance from multiple threads? The CAS is not thread-safe - in particular not for write access.

Multi-threaded processing in UIMA must be implemented such that a CAS is always only used by a single thread exclusively. Typically, multiple instances of UIMA components/pipelines are instantiated (one per thread) and then multiple CASes are created and each CAS is passed through only one of these components/pipelines at a time. There must never be two processing threads that simultaneously access the same CAS instance.
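A minimal, dependency-free Java sketch of the one-pipeline-per-thread pattern described above follows. `FakeEngine` and `FakeCas` are hypothetical stand-ins for a UIMA analysis engine and a CAS (they loosely mirror `AnalysisEngine.newCAS()`, `AnalysisEngine.process(CAS)` and `CAS.reset()` in the real API) and are deliberately not thread-safe. A `ThreadLocal` gives each worker thread its own engine and its own CAS, so no instance is ever touched by two threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadPipelineSketch {

    // Hypothetical stand-in for a CAS: mutable and NOT thread-safe.
    static final class FakeCas {
        private final StringBuilder text = new StringBuilder();
        private int annotationCount;

        void setDocumentText(String s) { text.setLength(0); text.append(s); }
        String getDocumentText() { return text.toString(); }
        void addAnnotation() { annotationCount++; }
        int getAnnotationCount() { return annotationCount; }
        void reset() { text.setLength(0); annotationCount = 0; }
    }

    // Hypothetical stand-in for an analysis engine: NOT thread-safe.
    static final class FakeEngine {
        // Scratch buffer reused across calls; per-instance mutable state like this
        // is what makes sharing one engine between threads unsafe.
        private final StringBuilder scratch = new StringBuilder();

        FakeCas newCas() { return new FakeCas(); }

        void process(FakeCas cas) {
            scratch.setLength(0);
            for (String token : cas.getDocumentText().split("\\s+")) {
                if (!token.isEmpty()) {
                    scratch.append(token);
                    cas.addAnnotation(); // one "annotation" per token
                }
            }
        }
    }

    // Each worker thread lazily creates its own engine and its own CAS;
    // neither object is ever visible to another thread.
    private static final ThreadLocal<FakeEngine> ENGINE = ThreadLocal.withInitial(FakeEngine::new);
    private static final ThreadLocal<FakeCas> CAS = ThreadLocal.withInitial(() -> ENGINE.get().newCas());

    static int analyze(String document) {
        FakeEngine engine = ENGINE.get();
        FakeCas cas = CAS.get();
        cas.reset();                      // reuse this thread's CAS for the next document
        cas.setDocumentText(document);
        engine.process(cas);
        return cas.getAnnotationCount();  // read results before the CAS is reused
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = List.of("a b c", "one two", "x y z w", "single");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String doc : docs) {
                results.add(pool.submit(() -> analyze(doc)));
            }
            for (Future<Integer> f : results) {
                System.out.println(f.get()); // prints 3, 2, 4, 1 (one per line)
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

One caveat of the `ThreadLocal` approach: the per-thread instances live as long as the pool threads do, so with real UIMA engines you would also need some way to release them (for example by calling `destroy()` on each engine) when the application shuts down.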

raghu298 commented 2 weeks ago

We are ensuring that the same JCas instance is never accessed from multiple threads simultaneously, but what about the analysis engine?

reckart commented 2 weeks ago

Each thread should have its own analysis engine instance and its own CAS instance.

Cf: https://uima.apache.org/d/uimaj-current/ref.html#ugr.ref.xml.cpe_descriptor.overview

raghu298 commented 2 weeks ago

Okay, so you mean we can keep a pool of JCas instances (jcasPool) and a pool of engines (enginePool), borrow a JCas and an engine for each request, and release both afterwards in a multithreaded environment? Would that be thread-safe?

reckart commented 2 weeks ago

A threadsafe class is one that can safely be used from multiple threads at the same time. The CAS is typically not thread-safe. We also assume that analysis engines are not thread-safe by default.

So you need at least one CAS per thread.

If your analysis engines are not thread-safe, you also need one per thread.

Having one-per-thread does not make either the CAS or the analysis engine thread-safe. However, it makes the whole setup multi-threaded/parallel despite the CAS/AE not being thread-safe.
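To illustrate the pooled variant asked about earlier in the thread, here is a dependency-free Java sketch, assuming each pool slot holds one engine together with one CAS (`EngineCasPair` is a hypothetical stand-in for both). The queue is thread-safe; the pairs are not. A thread that takes a pair from the queue owns it exclusively until it returns it, which is the one-per-thread confinement described above. For the CAS side, UIMA itself ships `org.apache.uima.util.CasPool` (`getCas` / `releaseCas`); check its Javadoc before relying on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

public class EngineCasPoolSketch {

    // Hypothetical stand-in for one non-thread-safe engine plus its own CAS.
    static final class EngineCasPair {
        private final List<String> cas = new ArrayList<>(); // models the CAS contents

        int process(String document) {
            cas.clear();                              // like CAS.reset()
            for (String token : document.split("\\s+")) {
                if (!token.isEmpty()) cas.add(token); // like adding annotations
            }
            return cas.size();
        }
    }

    // Thread-safe queue of idle pairs; the pairs themselves are not thread-safe.
    private final BlockingQueue<EngineCasPair> idle;

    EngineCasPoolSketch(int size, Supplier<EngineCasPair> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(factory.get());
    }

    int analyze(String document) {
        EngineCasPair pair;
        try {
            pair = idle.take(); // blocks until a pair is free; now owned by this thread only
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for an engine/CAS pair", e);
        }
        try {
            return pair.process(document);
        } finally {
            idle.add(pair); // return the pair only after we are done; never exceeds capacity
        }
    }

    public static void main(String[] args) throws Exception {
        EngineCasPoolSketch pool = new EngineCasPoolSketch(2, EngineCasPair::new);
        ExecutorService threads = Executors.newFixedThreadPool(8);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                results.add(threads.submit(() -> pool.analyze("one two three")));
            }
            int total = 0;
            for (Future<Integer> f : results) total += f.get();
            System.out.println(total); // prints 300
        } finally {
            threads.shutdown();
        }
    }
}
```

Sizing the pool smaller than the executor (2 pairs for 8 threads in `main`) is intentional here: surplus threads simply wait in `take()` instead of sharing an instance. That waiting shows up as blocked threads in a profiler, but it is expected contention rather than unsafe sharing.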