Status: Closed (exported via GoogleCodeExporter; closed 9 years ago)
Can you provide a stack trace please?
Original comment by richard.eckart
on 17 Jul 2014 at 10:47
Exception in thread "main"
de.tudarmstadt.ukp.dkpro.lab.engine.ExecutionException:
de.tudarmstadt.ukp.dkpro.lab.engine.ExecutionException:
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator
processing failed.
at de.tudarmstadt.ukp.dkpro.lab.engine.impl.ExecutableTaskEngine.run(ExecutableTaskEngine.java:68)
at de.tudarmstadt.ukp.dkpro.lab.engine.impl.DefaultTaskExecutionService.run(DefaultTaskExecutionService.java:48)
at de.tudarmstadt.ukp.dkpro.lab.Lab.run(Lab.java:97)
at de.tudarmstadt.ukp.experiments.AA.VSDtoTC.main.VSD_Runner2.runTrainTest(VSD_Runner2.java:153)
at de.tudarmstadt.ukp.experiments.AA.VSDtoTC.main.VSD_Runner2.main(VSD_Runner2.java:84)
Caused by: de.tudarmstadt.ukp.dkpro.lab.engine.ExecutionException:
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator
processing failed.
at de.tudarmstadt.ukp.dkpro.lab.uima.engine.simple.SimpleExecutionEngine.run(SimpleExecutionEngine.java:178)
at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.runNewExecution(BatchTask.java:350)
at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.executeConfiguration(BatchTask.java:255)
at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.execute(BatchTask.java:185)
at de.tudarmstadt.ukp.dkpro.tc.weka.task.BatchTaskTrainTest.execute(BatchTaskTrainTest.java:86)
at de.tudarmstadt.ukp.dkpro.lab.engine.impl.ExecutableTaskEngine.run(ExecutableTaskEngine.java:55)
... 4 more
Caused by: org.apache.uima.analysis_engine.AnalysisEngineProcessException:
Annotator processing failed.
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:394)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:298)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:568)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:410)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:343)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at de.tudarmstadt.ukp.dkpro.lab.uima.engine.simple.SimpleExecutionEngine.run(SimpleExecutionEngine.java:141)
... 9 more
Caused by: java.lang.IllegalArgumentException: value cannot be null
at org.apache.lucene.document.Field.<init>(Field.java:239)
at org.apache.lucene.document.StringField.<init>(StringField.java:60)
at de.tudarmstadt.ukp.dkpro.tc.features.ngram.meta.LuceneBasedMetaCollector.initializeDocument(LuceneBasedMetaCollector.java:99)
at de.tudarmstadt.ukp.dkpro.tc.features.ngram.meta.LuceneBasedMetaCollector.process(LuceneBasedMetaCollector.java:112)
at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:378)
... 16 more
Original comment by alot...@gmail.com
on 17 Jul 2014 at 10:49
Looks like the documentTitle is not set in the DocumentMetaData annotation of
the CAS:
DocumentMetaData.get(jcas).setDocumentTitle(...);
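For illustration, the title would typically be set in the reader's getNext() method when the DocumentMetaData annotation is created. A sketch assuming a DKPro Core collection reader; the `file` variable is hypothetical:

```java
// Sketch: inside a custom CollectionReader's getNext(JCas aJCas) method.
// `file` is a hypothetical java.io.File pointing at the current document.
DocumentMetaData meta = DocumentMetaData.create(aJCas);
meta.setDocumentTitle(file.getName());
meta.setDocumentId(file.getName());
meta.setDocumentUri(file.toURI().toString());
```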
Original comment by richard.eckart
on 17 Jul 2014 at 11:51
What kind of Preprocessing did you run?
Original comment by daxenber...@gmail.com
on 17 Jul 2014 at 11:51
Nothing, actually (NoOpAnnotator.class).
Also, I do set the document title, as in:
DocumentMetaData.get(aJCas).setDocumentTitle(document.getName()); // added
Original comment by alot...@gmail.com
on 18 Jul 2014 at 8:45
To run an NGram feature extractor, you need to have at least sentence and token
annotations in your CAS. If you did not supply them via the reader, you need to
add preprocessing components that will do the job.
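A minimal sketch of such a preprocessing component, assuming DKPro Core's BreakIteratorSegmenter is on the classpath (the component choice is an assumption; any segmenter that produces Sentence and Token annotations would do):

```java
import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription;

import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import de.tudarmstadt.ukp.dkpro.core.tokit.BreakIteratorSegmenter;

// Adds Sentence and Token annotations to each CAS before feature extraction.
AnalysisEngineDescription preprocessing =
        createEngineDescription(BreakIteratorSegmenter.class);
```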
Original comment by daxenber...@gmail.com
on 18 Jul 2014 at 2:38
The input data is already structured and has annotations such as POS, Lemma,
and Constituent. What has been taking long is the creation of files and the
iteration over those files. It seems like UnitClassification creates a file for
every classification unit in the data (WSDitem) and then iterates over them in
the FE part. Such iteration says:
"MetaInfoTask ... Progress
de.tudarmstadt.ukp.dkpro.core.io.bincas.BinaryCasReader 513/998 file"
However, with more data it complains about memory size. Is there some way of
preventing UnitClassification from creating those files, and of doing the meta
collection in some other way?
Original comment by alot...@gmail.com
on 22 Jul 2014 at 9:30
The meta-extraction task is run in this way for a reason, as this is the only
way to ensure that there is no information leak between train/test.
In order to determine why exactly you run into memory problems, it would be
necessary to better understand what is going on. Please profile the memory
usage and give some more pointers on where the memory is consumed.
Original comment by torsten....@gmail.com
on 22 Jul 2014 at 9:37
What is the version of uimaj-core on your classpath when you get the memory
problems?
Original comment by richard.eckart
on 22 Jul 2014 at 9:46
@torsten: stack trace screen is attached. Crashes at Meta Extraction Task.
@richard.eckart: .classpath file is attached.
Original comment by alot...@gmail.com
on 23 Jul 2014 at 8:12
Attachments:
How much memory did you assign to that run?
In unit classification mode, *each* classification unit will get its own CAS in
the remaining pipeline. If you want to prevent that, you have to limit the
classification unit annotations in the reader.
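A hedged sketch of what limiting the units in the reader could look like; the `maxUnits` cutoff, the `candidates` list, and the loop are purely illustrative and not part of any DKPro TC API:

```java
// Sketch: only index a bounded number of classification units per document,
// so the pipeline creates fewer per-unit CASes downstream.
int unitsAdded = 0;
for (WSDItem item : candidates) {   // `candidates` is a hypothetical list
    if (unitsAdded >= maxUnits) {
        break;                      // skip the rest to bound the CAS count
    }
    TextClassificationUnit unit =
            new TextClassificationUnit(aJCas, item.getBegin(), item.getEnd());
    unit.addToIndexes();
    unitsAdded++;
}
```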
Original comment by daxenber...@gmail.com
on 23 Jul 2014 at 9:11
This looks like an error in UIMA that was fixed some time ago.
Unfortunately, your answer does not provide the information about the
uimaj-core version that Richard was asking about.
Could you please provide that?
Original comment by torsten....@gmail.com
on 23 Jul 2014 at 9:26
The .classpath file only contains a reference to Maven but does not state which
dependencies Maven injects. You need to look that up either in the pom.xml file
(dependency hierarchy) or, for a really definitive answer, run your project in
debug mode and look at the classpath set on the running debug instance. Then
please just tell us the version that you believe is being used.
Please do not send screenshots unless we ask for them. Instead, please
copy/paste the error message text here - this will also make it easier for
other people with a similar problem to find the issue via a web search.
Original comment by richard.eckart
on 23 Jul 2014 at 10:21
@daxenber:
In "eclipse.ini" I have:
-Xms512m
-Xmx2048m
-XX:PermSize=512M
-XX:MaxPermSize=2048M
Also, during reading in getNext(JCas aJCas), I add unit.addToIndexes(); and
outcome.addToIndexes(); — I'll need the existing annotations during feature
extraction.
Mostly, no preprocessing is needed. Are you suggesting to discard all
annotations (except for unit/outcome) during reading?
@torsten & @richard.eckart:
Effective POM has:
<dependency>
<groupId>org.apache.uima</groupId>
<artifactId>uimaj-core</artifactId>
<version>2.4.2</version>
</dependency>
Original comment by alot...@gmail.com
on 23 Jul 2014 at 10:29
Check what other indirect uimaj dependencies you have and add all of them, at
version 2.6.0, to your POM - that should fix the memory leak with the binary CAS.
For reference the related bug in UIMA:
https://issues.apache.org/jira/browse/UIMA-3747
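For example, pinning the UIMA core version in the POM might look like this (2.6.0 per the suggestion above; the exact set of artifacts to pin depends on what your dependency tree actually pulls in):

```xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.uima</groupId>
      <artifactId>uimaj-core</artifactId>
      <version>2.6.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```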
The eclipse.ini settings do not affect programs you start within Eclipse - the
memory settings for those are in their respective "Run configurations"
accessible via the "Run..." menu in Eclipse.
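For example, in Eclipse under Run > Run Configurations... > Arguments > VM arguments, you could set something like the following (the values are illustrative, not a recommendation):

```
-Xmx4g
-XX:MaxPermSize=512m
```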
Original comment by richard.eckart
on 23 Jul 2014 at 10:32
Only the TextClassificationUnit annotations will be used to split existing
documents into several CASes (the other annotations don't matter here). If you
need all of them for feature extraction and classification, there's no way
around that.
Original comment by daxenber...@gmail.com
on 23 Jul 2014 at 10:40
Any updates on this issue?
Original comment by daxenber...@gmail.com
on 29 Jul 2014 at 9:52
Not from my side. Converted to FrequencyDistribution FEs instead.
Original comment by alot...@gmail.com
on 29 Jul 2014 at 12:18
Original comment by daxenber...@gmail.com
on 29 Jul 2014 at 12:44
Original comment by daxenber...@gmail.com
on 14 Aug 2014 at 2:41
Original comment by daxenber...@gmail.com
on 1 Apr 2015 at 5:10
Original issue reported on code.google.com by
alot...@gmail.com
on 17 Jul 2014 at 10:46