Closed GoogleCodeExporter closed 9 years ago
The attached file of this comment provides logs about the previous process.
Actually, the last line is missing: « INFO: Stop Treetagger »
That's the bug I would like to fix!
Thanks in advance,
Jérôme R
Original comment by jerome.rocheteau
on 21 Oct 2011 at 3:11
Attachments:
Hi Jérôme,
I am not sure if I understand your problem. I gather that you get the expected
output but you notice that in the end the tree-tagger process is still running.
If this is your problem, then it's a feature in TT4J and a bug in your wrapper.
Override the "destroy()" method in your UIMA wrapper and invoke
TreeTaggerWrapper.destroy() there to stop the background process.
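A minimal sketch of that pattern, using stand-in classes so it runs without UIMA or TT4J on the classpath. `BackgroundTagger`, `Component` and `TaggerComponent` are hypothetical names for illustration only; in a real wrapper the base class would be your UIMA component and the field would be the TreeTaggerWrapper.

```java
public class DestroySketch {
    /** Stand-in for a wrapper that owns a background tagger process (hypothetical). */
    static class BackgroundTagger {
        private boolean running;
        void start() { running = true; }
        void destroy() { running = false; }
        boolean isRunning() { return running; }
    }

    /** Stand-in for a framework base class with lifecycle callbacks (hypothetical). */
    static abstract class Component {
        void initialize() { }
        void destroy() { }
    }

    /** The wrapper: it starts the tagger on initialize and stops it on destroy. */
    static class TaggerComponent extends Component {
        final BackgroundTagger tagger = new BackgroundTagger();

        @Override
        void initialize() {
            super.initialize();
            tagger.start();
        }

        @Override
        void destroy() {
            try {
                // Without this call the background process would keep running.
                tagger.destroy();
            } finally {
                super.destroy();
            }
        }
    }

    public static void main(String[] args) {
        TaggerComponent component = new TaggerComponent();
        component.initialize();
        System.out.println("running after initialize: " + component.tagger.isRunning());
        component.destroy();
        System.out.println("running after destroy: " + component.tagger.isRunning());
    }
}
```

The try/finally is only there so the base class cleanup still runs if stopping the tagger throws.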
Also, a comprehensive implementation of a UIMA integration for TreeTagger with
TT4J can be found here:
http://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.treetagger
Maybe you want to use that instead of writing the whole thing again from scratch.
-- Richard
Original comment by richard.eckart
on 21 Oct 2011 at 3:24
Hi Richard,
It's not a bug in the UIMA wrapper. You'll find attached a CLI to tt4j.
The problem remains the same. I turned on trace mode (see zh-test.dbg
attached).
The fact is that the tt4j reader doesn't receive the ENDOFTEXT tag
"<This-is-the-end-of-the-text />" although it has been sent by the tt4j writer!
Thanks in advance
Jérôme
PS: I wouldn't have written another UIMA wrapper for tree-tagger if I had known
about yours before :) It looks great.
Original comment by jerome.rocheteau
on 24 Oct 2011 at 3:41
Attachments:
Thank you for investigating the issue. I'll have a look at it as soon as
possible. Meanwhile, if you are inclined to continue investigating, I
suggest you try adding more ".\n" to the flush sequence in
http://code.google.com/p/tt4j/source/browse/tt4j/trunk/org.annolab.tt4j/src/main/java/org/annolab/tt4j/DefaultModel.java
Since the data in your zh-test.dbg shows that input and output remain in sync
until the end, increasing the length of the flush sequence is a good candidate
for fixing the problem. Or maybe a different flush sequence is required for
Chinese.
Original comment by richard.eckart
on 24 Oct 2011 at 5:23
Ok. Setting up a test was faster than I thought ;) The problem is the flush
sequence. It seems that tree-tagger ignores the "." which I usually use to
flush the output. When I change the flush sequence to
".\n.\n.\n.\n.\n.\n.\n(\n)\n" it works fine. For the other languages that I
have tests for so far, that also works out, so I think I'll just change the
default flush sequence.
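To illustrate why the extra tokens matter, here is a toy model, not TreeTagger's actual internals. It assumes a tagger that holds a token back until a fixed number of later tokens have arrived, and that silently drops some tokens. The class, the `lookahead` parameter and the `ignored` list are all hypothetical; only the end marker and the "." / "(" / ")" tokens come from this thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class FlushSequenceSketch {
    static final String END_MARKER = "<This-is-the-end-of-the-text />";

    /**
     * Toy tagger (hypothetical): it emits a token only once `lookahead` later
     * tokens have arrived, and it drops every token listed in `ignored`.
     */
    static List<String> run(List<String> input, int lookahead, List<String> ignored) {
        Deque<String> pending = new ArrayDeque<>();
        List<String> output = new ArrayList<>();
        for (String token : input) {
            if (ignored.contains(token)) {
                continue; // an ignored token pushes nothing out
            }
            pending.addLast(token);
            if (pending.size() > lookahead) {
                output.add(pending.removeFirst());
            }
        }
        return output; // whatever is still pending stays stuck in the tagger
    }

    /** Builds the stream the writer sends: tokens, end marker, flush sequence. */
    static List<String> withFlush(List<String> tokens, String... flush) {
        List<String> stream = new ArrayList<>(tokens);
        stream.add(END_MARKER);
        stream.addAll(Arrays.asList(flush));
        return stream;
    }

    /** True if the reader would see the end marker and could stop waiting. */
    static boolean markerArrives(List<String> tokens, int lookahead,
                                 List<String> ignored, String... flush) {
        return run(withFlush(tokens, flush), lookahead, ignored).contains(END_MARKER);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "c");
        List<String> ignoresDot = Arrays.asList(".");
        // Flush made only of "." tokens, all of which this toy tagger ignores.
        System.out.println(markerArrives(tokens, 2, ignoresDot, ".", ".", "."));
        // Flush that also contains "(" and ")", which are not ignored.
        System.out.println(markerArrives(tokens, 2, ignoresDot, ".", ".", ".", "(", ")"));
    }
}
```

In this model the first call reports that the marker never reaches the reader, while the second one does, which matches the behaviour seen with the old and new flush sequences.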
Original comment by richard.eckart
on 24 Oct 2011 at 5:37
The changed flush sequence is in release 1.0.16 which should arrive in an hour
or so on Maven Central. It worked for me in a test case that I set up with the
DKPro Core TreeTagger wrapper. It should work for you as well.
Original comment by richard.eckart
on 24 Oct 2011 at 6:43
Thank you very much Richard.
It works fine for me too :)
Original comment by jerome.rocheteau
on 25 Oct 2011 at 7:54
Original comment by richard.eckart
on 25 Oct 2011 at 8:01
Original issue reported on code.google.com by
jerome.rocheteau
on 21 Oct 2011 at 3:06
Attachments: