runPipeline() never calls destroy() on the analysis engines

Anton87 / uimafit

Automatically exported from code.google.com/p/uimafit

0 stars 0 forks source link

runPipeline() never calls destroy() on the analysis engines #88

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago

uimaFIT runPipeline() never calls destroy() on the analysis engines. This is a 
problem with the UIMA Sandbox Lucas component which closes the index during 
destroy() -- which is in itself kind of questionable.

runPipeline() should at least invoke destroy() when called with descriptors.

Original issue reported on code.google.com by richard.eckart on 26 Apr 2011 at 3:17

GoogleCodeExporter commented 8 years ago

Looks like we're already calling .close() on the reader for 
runPipeline(CollectionReader, AnalysisEngine...), so I guess we could add the 
.destroy() calls too. Alternatively, we could say calling .close() is a bug 
since we didn't create the CollectionReader, and we could remove that.

For the runPipeline with descriptors, I agree we should be calling .destroy().

Original comment by steven.b...@gmail.com on 26 Apr 2011 at 3:39

GoogleCodeExporter commented 8 years ago

So, we will call destroy on AEs for the version of runPipeline in which we 
create the AEs from descriptors but will not call it for the version of 
runPipeline that is passed AEs.  Is that how I read this?  I can see this going 
either way (i.e. with the other way being that we call destroy() in both 
versions of runPipeline).  It may be confusing to do it in one and not the 
other - so if that is what we do then we need to make sure it is clearly 
documented in the javadocs.

Original comment by phi...@ogren.info on 27 Apr 2011 at 3:47

GoogleCodeExporter commented 8 years ago

Locally I have made the change so far to call destroy on anything created from 
a descriptor, but not when runPipeline gets something that has been created 
from a descriptor.

Regarding the non-descriptor versions, I think probably we should not call 
destroy() and document that. In any case, I don't like the non-descriptor 
versions very much, but I'll leave that for a different issue.

Original comment by richard.eckart on 27 Apr 2011 at 4:31

GoogleCodeExporter commented 8 years ago

Yeah, I think we should treat AnalysisEngines (and CollectionReaders) like we 
would an InputStream - whoever creates it is responsible for destroying/closing 
it. So that means no .close() or .destroy() in the non-descriptor versions, but 
both .close() and .destroy() in the descriptor versions since we're doing the 
creation.

Original comment by steven.b...@gmail.com on 27 Apr 2011 at 7:34

GoogleCodeExporter commented 8 years ago

I am making the discussed changes including removing the call to reader.close() 
when the reader was not produced by runPipeline().

Before committing the change, I would like to know if I could also remove the 
calls to ae.collectionProcessComplete() in runPipeline(JCas, AEs...). I guess 
collectionProcessComplete() should be triggered when the reader has produced 
the last CAS. Since we do process individual CASes here, I guess it may be 
prudent to leave it to the caller to notify the AEs when the collection 
processing is complete. What do you think?

Original comment by richard.eckart on 27 Apr 2011 at 4:44

Changed state: Started

GoogleCodeExporter commented 8 years ago

Yeah, I agree that if all we have is a JCas, we can't assume that the 
collection processing is complete and we should leave that up to the caller. 
And document that of course. ;-)

Original comment by steven.b...@gmail.com on 27 Apr 2011 at 5:40

GoogleCodeExporter commented 8 years ago

Fixed in rev. 604.

Original comment by richard.eckart on 27 Apr 2011 at 6:28

Added labels: Compatible-No

GoogleCodeExporter commented 8 years ago

Original comment by richard.eckart on 2 May 2011 at 5:05

Changed state: Fixed