KorAP / KorAP-XML-Krill

Merge KorapXML data and create Krill documents
BSD 2-Clause "Simplified" License

Fix lemma and pos annotations in TreeTagger foundry #4

Closed: Akron closed this issue 6 years ago

Akron commented 6 years ago

Whenever the TreeTagger foundry provides multiple interpretations for a term (multiple POS tags, each with its own certainty value), all interpretations are indexed separately, including the lemma. Unfortunately, this means that even if the lemma is identical across several POS interpretations, it is indexed multiple times with different certainty values. To avoid that, the certainty values of interpretations sharing the same lemma should be summed first, and the lemma should then be indexed once with the summed value. This is based on a suggestion made by @kupietz.
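To illustrate the proposed behaviour, here is a minimal Python sketch. It is not the code from the converter itself; the function name `merge_interpretations`, the `(lemma, pos, certainty)` tuple layout, and the sample values are all assumptions made for illustration. It sums certainty per lemma while leaving each POS interpretation untouched.

```python
from collections import defaultdict


def merge_interpretations(interpretations):
    """Sum certainty values per lemma for one token.

    `interpretations` is a hypothetical list of (lemma, pos, certainty)
    tuples, one per TreeTagger interpretation of the token.
    Returns (lemmas, pos_tags): a dict mapping each distinct lemma to its
    summed certainty, and a list of (pos, certainty) pairs kept as-is.
    """
    lemmas = defaultdict(float)
    pos_tags = []
    for lemma, pos, certainty in interpretations:
        lemmas[lemma] += certainty          # identical lemmas collapse into one entry
        pos_tags.append((pos, certainty))   # each POS reading keeps its own certainty
    return dict(lemmas), pos_tags


# Illustrative (made-up) input: two readings share the lemma "laufen".
token = [
    ("laufen", "VVFIN", 0.5),
    ("laufen", "VVINF", 0.25),
    ("Lauf", "NN", 0.25),
]
lemmas, pos_tags = merge_interpretations(token)
print(lemmas)
print(pos_tags)
```

With this input, "laufen" would be indexed once with the combined certainty of its two readings instead of twice with separate values, while all three POS tags remain individually indexed.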

Akron commented 6 years ago

Fixed in https://github.com/KorAP/KorAP-XML-Krill/commit/28dc17f92c2e5270b034b6562b26ac9e936131c3