Closed powpos360 closed 8 years ago
Alright, it turns out that when creating the UIMA flow, I need to add the TreeTaggerWrapper annotator BEFORE the HeidelTime annotator. The order is crucial.
Hi, Good to hear that you figured out the issue. If you run into any other problems, please let us know, too. Maybe we can reply faster next time and really provide some help ;-) Cheers, Jannik
hi
I ran into the same issue. The point is that I use an external sentence annotator (DKPro/OpenNLP). As a result, I get de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence instead of de.unihd.dbs.uima.types.heideltime.Sentence.
Is there a clean way to make HeidelTime work with such a sentence annotator?
Thanks in advance
HeidelTime includes classes to translate sentence annotations e.g. from CoreNLP and TreeTagger into HeidelTime annotations. It should be fairly easy to add a similar translation for other taggers: https://github.com/HeidelTime/heideltime/blob/master/src/de/unihd/dbs/uima/annotator/stanfordtagger/StanfordPOSTaggerWrapper.java In addition to sentences, you will also want to translate POS when available, as this can help remove some false positives. I don't use UIMA, so I can't tell you how to invoke this in UIMA.
In the very early versions of HeidelTime, we included an AnnotationTranslator in the UIMA kit, which took one kind of annotation, e.g., DKPro's sentence annotations, and created HeidelTime's sentence annotations, but we later removed it. You can download an old HeidelTime UIMA kit version and check the details, e.g., version 1.9: https://github.com/HeidelTime/heideltime/releases/tag/VERSION1.9
Can you please confirm that the only mapping I have to do is:
@kno10
In addition to sentences, you will also want to translate POS when available, as this can help remove some false positives.
My pipeline produces its POS annotations from: https://dkpro.github.io/dkpro-core/releases/1.7.0/apidocs/de/tudarmstadt/ukp/dkpro/core/api/lexmorph/type/pos/package-summary.html If I understand the code in StanfordPOSTaggerWrapper.java correctly, the POS tags are just Strings that are set on a Token annotation object. Does that mean I can simply push "NN" or "ADJ" into those strings and HeidelTime will understand them out of the box?
Thanks for your help
The code I use (it does not use DKPro, and uses only as much UIMA as is necessary to run HeidelTime) simply does this to convert the annotations for HeidelTime:
for (CoreMap sentence : corenlp.sentences()) {
    Sentence sent = new Sentence(jcas);
    sent.setBegin(sentence.get(CoreNLPAnalyzer.DocumentStartOffsetAnnotation.class));
    sent.setEnd(sentence.get(CoreNLPAnalyzer.DocumentEndOffsetAnnotation.class));
    sent.setSentenceId(sentence.get(CoreNLPAnalyzer.DocumentSentenceAnnotation.class));
    for (CoreLabel label : sentence.get(TokensAnnotation.class)) {
        Token t = new Token(jcas);
        t.setBegin(label.get(CoreNLPAnalyzer.DocumentStartOffsetAnnotation.class));
        t.setEnd(label.get(CoreNLPAnalyzer.DocumentEndOffsetAnnotation.class));
        t.setPos(label.get(CoreAnnotations.PartOfSpeechAnnotation.class));
        t.addToIndexes();
    }
    sent.addToIndexes();
}
By providing document offsets, you get the correct offsets back from HeidelTime. I don't know if the sentence is currently used by the released HeidelTime; my modified version uses it for resolving ambiguous dates.
I have been considering adding an abstraction layer in my branch, which would allow HeidelTime to operate directly on CoreNLP annotations, so that I don't have to perform this copying. But that requires considerable effort.
Make sure to double-check sentence splitter quality. For example CoreNLP without workarounds will split "Der 3. November" into two sentences, because it is a bit overoptimized for English.
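One possible workaround for this kind of over-splitting is to merge sentence spans again after the splitter has run. The following is only a minimal sketch under stated assumptions, not what HeidelTime or CoreNLP actually do: the class OrdinalSentenceMerger and its Span type are hypothetical names, and the heuristic (a sentence whose last token is a one- or two-digit number followed by a period is treated as a German ordinal and glued to the next sentence) is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of HeidelTime or CoreNLP. */
final class OrdinalSentenceMerger {

    /** Character span [begin, end) of one sentence in the document text. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Re-joins sentences that were split right after an ordinal such as "3.".
     * Assumption: sentences are sorted by offset and do not overlap.
     */
    static List<Span> merge(String text, List<Span> sentences) {
        List<Span> merged = new ArrayList<>();
        for (Span current : sentences) {
            if (!merged.isEmpty()) {
                Span previous = merged.get(merged.size() - 1);
                if (endsWithOrdinal(text, previous)) {
                    // Extend the previous sentence to cover the current one.
                    merged.set(merged.size() - 1, new Span(previous.begin, current.end));
                    continue;
                }
            }
            merged.add(current);
        }
        return merged;
    }

    /**
     * True if the last token of the sentence is one or two digits plus a period,
     * e.g. "Der 3." but not "Das war 1990.".
     */
    static boolean endsWithOrdinal(String text, Span sentence) {
        String s = text.substring(sentence.begin, sentence.end);
        return s.matches("(?s)(.*\\s)?\\d{1,2}\\.");
    }
}
```

The merged spans can then be copied into HeidelTime Sentence annotations in the same way as in the loop above. Note that this heuristic will also merge genuine sentence ends such as "Er kam am 3. Dann ging er.", so it trades one kind of error for another and should be checked against your own data.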
HeidelTime makes use of the following preprocessing information
Temporal expressions across sentence boundaries won't be detected. The issue with wrong sentence splitting, which Erich pointed out, is the reason why we included several modifications for the sentence splitting process, e.g., for German and French.
I have been able to use HeidelTime in my own pipeline. I also created a simple mapper annotator. This mapper may be extended for other tasks later (create, update, delete, merge annotations); we will see.
Maybe I will soon post the details on how to put HeidelTime into one's own pipeline.
Thanks guys.
I tried to reproduce the evaluation results using WikiWars. Following the wiki, I can reproduce the same results using v2.1. However, when I followed the same steps with other versions (I tried 1.3, 1.6, 1.7, and 1.8), I received
..[de.unihd.dbs.uima.annotator.heideltime.HeidelTime] HeidelTime has not found any sentence tokens in this document. HeidelTime needs sentence tokens tagged by a preprocessing UIMA analysis engine to do its work. Please check your UIMA workflow and add an analysis engine that creates these sentence tokens.
every time. I have changed the .bash_profile accordingly. Are there any other adjustments I should have made when setting up the experiment? Thanks a lot.