KorAP / KorAP-XML-Krill

Merge KorapXML data and create Krill documents
BSD 2-Clause "Simplified" License

Multiple identical lemmata in TreeTagger conversion #3

Akron closed 7 years ago

Akron commented 7 years ago

When TreeTagger assigns multiple POS annotations to a token, it also assigns multiple lemmata. These lemmata may be identical, however, so they shouldn't be indexed multiple times at the same position.

See, e.g.: "sie werden zu Johanni reif dann setzt der Baum [[noch]] einmal an" in [GOE/AGI/00000]

Akron commented 7 years ago

This turns out not to be a bug. The lemmata are annotated with different certainty values, because each is part of a pos+lemma annotation that has only one certainty value. I don't know whether it makes sense to annotate the same lemma multiple times with different certainty values, but I think it's okay for the moment. The display of identical values has now been removed from Kalamar.
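
To make the distinction concrete, here is a minimal Python sketch of the situation described above. It is not the KorAP-XML-Krill or Kalamar code: the `Annotation` type, the `unique_lemmata` helper, the POS tags, and the certainty values are all hypothetical and chosen only for illustration. It shows two pos+lemma annotations on one token that share a lemma but differ in certainty, and a display-side step that shows each lemma only once while the underlying annotations stay untouched.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    """One pos+lemma annotation with a single shared certainty (hypothetical type)."""
    pos: str
    lemma: str
    certainty: float


def unique_lemmata(annotations: List[Annotation]) -> List[str]:
    """Return each lemma once, in first-seen order, for display purposes only."""
    seen = {}
    for ann in annotations:
        # setdefault keeps the first annotation seen for a given lemma
        seen.setdefault(ann.lemma, ann)
    return list(seen)


# Made-up annotations for a single token; tags and certainties are assumptions.
token = [
    Annotation(pos="ADV", lemma="noch", certainty=0.9),
    Annotation(pos="KON", lemma="noch", certainty=0.1),
]

# The display collapses identical lemmata, the annotation list is unchanged.
print(unique_lemmata(token))
print(len(token))
```

Collapsing at display time rather than at indexing time keeps the per-annotation certainty values available, which matches the decision above to leave the indexed data as it is.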