apache / uima-uimaj

Apache UIMA Java SDK
https://uima.apache.org
Apache License 2.0

NullPointerException while creating engine instance and execution #314

Closed azazali30 closed 1 year ago

azazali30 commented 1 year ago

Describe the bug: We are running analysis using a JCasPool that can hold up to 60 JCas objects at a time. After upgrading to UIMA 3.4.1, we started seeing this NullPointerException in ResultSpecification_impl.intersect:

error: org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.    
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:415)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:299)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:590)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:422)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:352)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:276)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:295)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:312)
    at com.pega.nlp.textanalytics.engines.pool.AnalysisEnginePoolHolder.analyze(AnalysisEnginePoolHolder.java:214)
    at com.pega.nlp.textanalytics.accessor.TextAnalyticsAccessor.runTextAnalytics(TextAnalyticsAccessor.java:117)
    at com.pega.nlp.textanalytics.accessor.TextAnalyticsAccessor.runTextAnalytics(TextAnalyticsAccessor.java:62)
    at com.pega.nlp.textanalytics.accessor.TextAnalyticsAccessorTest.lambda$testConcurrency$0(TextAnalyticsAccessorTest.java:225)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NullPointerException
    at org.apache.uima.analysis_engine.impl.ResultSpecification_impl.intersect(ResultSpecification_impl.java:743)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
    ... 15 more

reckart commented 1 year ago

Are you actually using result specifications in your setup?

azazali30 commented 1 year ago

No, we are not using result specifications ourselves. I'm not sure how they are being used internally.

reckart commented 1 year ago

Are you declaring any output capabilities in your XML descriptors or using uimaFIT annotations?

reckart commented 1 year ago

Are you calling the process method or a similar method of a UIMA component from multiple concurrent threads?

azazali30 commented 1 year ago

Below is the gist of the main code, extracted from my code base:

    org.apache.uima.util.JCasPool jCasPool = new JCasPool(poolSize, aae);

    List<AnalysisEngineDescription> extractors = ...; // list of annotators
    final AnalysisEngineDescription aaeDesc = org.apache.uima.fit.factory.AnalysisEngineFactory
            .createEngineDescription(extractors.toArray(new AnalysisEngineDescription[extractors.size()]));
    AnalysisEngine engine = org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine(aaeDesc);

    for (String text : textsArray) {
        // Acquire the JCas outside the try block so that it is in scope in finally.
        JCas jcas = jCasPool.getJCas();
        try {
            // update jcas with text
            engine.process(jcas);
        } finally {
            jCasPool.releaseJCas(jcas);
        }
    }

reckart commented 1 year ago

Ok, but is this code called from multiple threads? Note that UIMA components are not expected to be thread-safe. When UIMA parallelizes, it creates multiple instances of a component - one for each of the parallel threads. A component may declare that it is not parallelizable (e.g. writers or components with static fields), then UIMA would not parallelize the component at all and only use a single single-threaded instance of this component.

Are you trying to share a component across multiple concurrent threads?

azazali30 commented 1 year ago

> Ok, but is this code called from multiple threads? Note that UIMA components are not expected to be thread-safe. When UIMA parallelizes, it creates multiple instances of a component - one for each of the parallel threads. A component may declare that it is not parallelizable (e.g. writers or components with static fields), then UIMA would not parallelize the component at all and only use a single single-threaded instance of this component.
>
> Are you trying to share a component across multiple concurrent threads?

We are caching the engine created by AnalysisEngine engine = org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine(aaeDesc); so every thread uses the same AnalysisEngine instance. Is that fine?

reckart commented 1 year ago

If every thread is using the same instance of the analysis engine, then you are sharing that instance across threads. This is not supported. Every thread must have its own instance.
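As a rough illustration of "one instance per thread", here is a minimal sketch that uses only the JDK. StubEngine is a hypothetical stand-in for a component that is not thread-safe; it is not a UIMA class. In real code, the ThreadLocal initializer would presumably create the engine the same way the snippet above does, via AnalysisEngineFactory.createEngine(aaeDesc), but that part is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerThreadEngineDemo {
    // Each worker thread lazily gets its own engine on first use; no instance is shared.
    private static final ThreadLocal<StubEngine> ENGINE = ThreadLocal.withInitial(StubEngine::new);

    static List<String> processAll(List<String> texts, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String text : texts) {
                // ENGINE.get() is evaluated on the worker thread, so it returns that thread's engine.
                futures.add(executor.submit(() -> ENGINE.get().process(text)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("abc", "uima"), 2));
    }
}

/** Hypothetical stand-in for a component that keeps mutable state between calls. */
class StubEngine {
    private final StringBuilder scratch = new StringBuilder();

    String process(String text) {
        // Reusing the scratch buffer is safe only because a single thread owns this instance.
        scratch.setLength(0);
        scratch.append(text).reverse();
        return scratch.toString();
    }
}
```

If the stub were shared across threads instead, concurrent calls could interleave on the scratch buffer, which is the same kind of failure as the intermittent NullPointerException reported above.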

azazali30 commented 1 year ago

@reckart I wonder why the JCasPool documentation says we can use this pool when multiple CASes need to be processed simultaneously. Also, JCasPool has a constructor that accepts an AnalysisEngine as a parameter, which suggests it creates the JCas instances using that same AE. Can you help me understand why this does not contradict your statement? Thanks.

reckart commented 1 year ago

The creation of a new CAS can be an expensive process. Thus, instead of creating a new CAS object for every document, it can be sensible to maintain a pool of CAS objects which are reused while processing a batch of documents.

The CAS pool needs to know information like the type system, index definitions, etc. which can be obtained from an analysis engine - it does not need the engine itself. The constructor that takes an engine is a convenience constructor. The relevant one is org.apache.uima.util.JCasPool.JCasPool(int, ProcessingResourceMetaData) which only considers the configuration, not the actual engine.
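To make the distinction concrete, here is a minimal JDK-only sketch of a pool that is built from configuration alone. All class names (StubMetaData, StubEngineWithMetaData, StubCas, StubCasPool) are hypothetical stand-ins, not UIMA classes; the sketch only mirrors the idea that the convenience constructor reads the metadata and does not keep the engine.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class CasPoolSketchDemo {
    public static void main(String[] args) {
        StubEngineWithMetaData engine = new StubEngineWithMetaData(new StubMetaData("my.TypeSystem"));
        StubCasPool pool = new StubCasPool(2, engine); // only the metadata is read
        StubCas cas = pool.borrow();
        cas.text = "some document";
        pool.release(cas); // reset and made available for reuse
        System.out.println(pool.available());
    }
}

/** Hypothetical stand-in for configuration such as type system and index definitions. */
class StubMetaData {
    final String typeSystemName;
    StubMetaData(String typeSystemName) { this.typeSystemName = typeSystemName; }
}

/** Hypothetical stand-in for an engine that exposes its configuration. */
class StubEngineWithMetaData {
    private final StubMetaData metaData;
    StubEngineWithMetaData(StubMetaData metaData) { this.metaData = metaData; }
    StubMetaData getMetaData() { return metaData; }
}

/** Hypothetical stand-in for a CAS: built once from configuration, reset between documents. */
class StubCas {
    final String typeSystemName;
    String text;
    StubCas(StubMetaData metaData) { this.typeSystemName = metaData.typeSystemName; }
    void reset() { text = null; }
}

class StubCasPool {
    private final BlockingQueue<StubCas> free;

    /** The relevant constructor: configuration is all the pool needs. */
    StubCasPool(int size, StubMetaData metaData) {
        free = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            free.add(new StubCas(metaData));
        }
    }

    /** Convenience constructor: extracts the configuration and does not hold on to the engine. */
    StubCasPool(int size, StubEngineWithMetaData engine) {
        this(size, engine.getMetaData());
    }

    /** Returns a free instance, or null if none is currently available. */
    StubCas borrow() { return free.poll(); }

    /** Resets the instance and returns it to the pool. */
    void release(StubCas cas) {
        cas.reset();
        free.add(cas);
    }

    int available() { return free.size(); }
}
```

In this sketch the pooled objects can be borrowed by several threads at once, while each thread would still need its own engine to process them.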