dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Add a test case to test HyphenationRemover in a pipeline with segmentation #786

Closed maxxkia closed 8 years ago

maxxkia commented 8 years ago
reckart commented 8 years ago

Actually, the test as it is written now cannot work. The normalizers are CAS multipliers: each one receives a CAS and creates a new one in its place, and the surrounding aggregate must then drop the original CAS. The test case here calls engine.process(), which doesn't give the CAS multiplier the opportunity to return the new CAS. Instead, engine.processAndOutputNewCASes() would need to be called. There might be additional things to watch out for...

The HyphenationRemover would work in a SimplePipeline.runPipeline() pipeline when a reader and a writer are used. If no writer is to be used, then e.g. the JCasHolder technique also used by AssertAnnotations.assertTransformedText(...) can be employed.
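To illustrate the point about engine.process() versus engine.processAndOutputNewCASes(), here is a minimal toy model in plain Java. It does not use the UIMA or DKPro APIs: ToyCas, ToyHyphenationRemover, viaProcess and viaOutputs are hypothetical names made up for this sketch, and the toy normalizer simply joins words split by a hyphen at a line end, which is an assumption and not how the real HyphenationRemover decides. The only thing it is meant to show is that a caller who ignores the multiplier's output only ever sees the unchanged input.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CasMultiplierSketch {
    /** Toy stand-in for a CAS: holds only the document text (hypothetical type). */
    static final class ToyCas {
        final String text;
        ToyCas(String text) { this.text = text; }
    }

    /** Toy multiplier: never edits its input, emits a new ToyCas instead. */
    static final class ToyHyphenationRemover {
        /** Analogue of processAndOutputNewCASes(): hands the new CAS to the caller. */
        Iterator<ToyCas> processAndOutputNew(ToyCas input) {
            List<ToyCas> out = new ArrayList<>();
            // Assumption: naive joining of "-\n"; the real component is more careful.
            out.add(new ToyCas(input.text.replace("-\n", "")));
            return out.iterator();
        }

        /** Analogue of process(): outputs are produced, then thrown away. */
        void process(ToyCas input) {
            Iterator<ToyCas> outputs = processAndOutputNew(input);
            while (outputs.hasNext()) {
                outputs.next(); // nothing keeps a reference to the new CAS
            }
        }
    }

    /** What a test sees if it calls process() and then inspects its own CAS. */
    static String viaProcess(String text) {
        ToyCas cas = new ToyCas(text);
        new ToyHyphenationRemover().process(cas);
        return cas.text;
    }

    /** What a test sees if it takes the new CAS from the returned iterator. */
    static String viaOutputs(String text) {
        ToyCas cas = new ToyCas(text);
        Iterator<ToyCas> outputs = new ToyHyphenationRemover().processAndOutputNew(cas);
        return outputs.next().text;
    }

    public static void main(String[] args) {
        String input = "hyphen-\nation";
        System.out.println(viaProcess(input).equals(input));         // input unchanged
        System.out.println(viaOutputs(input).equals("hyphenation")); // normalized text
    }
}
```

In the toy model, viaProcess() returns the original text because the normalized text only ever exists in the new ToyCas, which is the situation described above for a test that asserts on the CAS it passed in.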

maxxkia commented 8 years ago

Alright, now I see where the problem originates. But a few notes:

  1. I also tried this with engine.processAndOutputNewCASes(), but it produces the same result.
  2. I have also tested this in a pipeline with a reader and a writer, run via SimplePipeline.runPipeline(), but I still get the same result. The problem is that no sentence annotations are produced, so my writer cannot output the correct text to the file. So whether I use a writer or simply assert on sentence (or token) annotations, the sentence (and token) annotations are missing from my JCas.
reckart commented 8 years ago

I believe simply using engine.processAndOutputNewCASes() may not be sufficient - you'd have to check how the default UIMA flow controller behaves internally with respect to CAS multipliers and dropping/forwarding CASes. What I typically do when I have a CAS multiplier is put it, along with all the other components (including writers), into an aggregate - SimplePipeline.runPipeline() also does that. That should normally work, if I remember correctly.
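As a rough picture of why the aggregate matters, the following plain-Java toy model forwards the multiplier's new CAS to the downstream steps and drops the original. It is only a sketch under stated assumptions: it does not call UIMA, uimaFIT or DKPro, it does not reproduce the actual flow controller, and ToyCas, removeHyphenation, segment, runAggregate and inspectOriginalOnly are hypothetical names. The sentence splitting on a full stop is likewise a stand-in for a real segmenter.

```java
import java.util.ArrayList;
import java.util.List;

public class AggregateFlowSketch {
    /** Toy stand-in for a CAS: text plus sentence "annotations" (hypothetical type). */
    static final class ToyCas {
        final String text;
        final List<String> sentences = new ArrayList<>();
        ToyCas(String text) { this.text = text; }
    }

    /** Multiplier step: returns a NEW ToyCas with line-end hyphenation joined. */
    static ToyCas removeHyphenation(ToyCas in) {
        return new ToyCas(in.text.replace("-\n", ""));
    }

    /** Annotator step: adds sentences to whichever ToyCas it is given. */
    static void segment(ToyCas cas) {
        for (String s : cas.text.split("(?<=\\.)\\s+")) {
            if (!s.isEmpty()) {
                cas.sentences.add(s);
            }
        }
    }

    /** Toy aggregate: downstream steps, including the "writer", see the new CAS. */
    static List<String> runAggregate(String text) {
        ToyCas original = new ToyCas(text);
        ToyCas current = removeHyphenation(original); // original is dropped from here on
        segment(current);
        return new ArrayList<>(current.sentences);    // what a writer would receive
    }

    /** Outside an aggregate: the caller still holds the original, which has no sentences. */
    static List<String> inspectOriginalOnly(String text) {
        ToyCas original = new ToyCas(text);
        ToyCas produced = removeHyphenation(original);
        segment(produced);
        return original.sentences;
    }

    public static void main(String[] args) {
        String input = "A hyphen-\nated word. Second sentence.";
        System.out.println(runAggregate(input));
        System.out.println(inspectOriginalOnly(input));
    }
}
```

In this model the annotations land on the new ToyCas, so code that inspects the original finds none. Whether that is the whole explanation for the missing sentence annotations reported in this issue is not something the sketch can establish.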

reckart commented 8 years ago

Can you commit the version that uses SimplePipeline.runPipeline() ?

maxxkia commented 8 years ago

Yes, I will commit the code shortly.

On Mon, Feb 22, 2016 at 10:59 AM, Richard Eckart de Castilho < notifications@github.com> wrote:

Can you commit the version that uses SimplePipeline.runPipeline() ?

Masoud Kiaeeha

reckart commented 8 years ago

Not sure why this issue was closed. I updated the test - see commit comments. Now works better, but I didn't fix the assert - please check out the changes and check if the test now works ok for you.

maxxkia commented 8 years ago

I pulled and checked the new test. It works fine. I noticed I also had to fix the expected output. Now the test passes. So should bug #787 also be closed?

On Mon, Feb 22, 2016 at 12:45 PM, Richard Eckart de Castilho < notifications@github.com> wrote:

Not sure why this issue was closed. I updated the test - see commit comments. Now works better, but I didn't fix the assert - please check out the changes and check if the test now works ok for you.

Masoud Kiaeeha

maxxkia commented 8 years ago

P.S.: Please check out my new change.

maxxkia commented 8 years ago

@reckart: Should I close this and also the reported bug?

reckart commented 8 years ago

Yep. We could use a new bug though for the problem that we have to explicitly set PARAM_LANGUAGE. Cf. https://github.com/dkpro/dkpro-core/commit/64e7e37e39124e8a6a960b4497403f4011f011f6#commitcomment-16235529

maxxkia commented 8 years ago

Done. Created bug #790 for this problem.