clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

tei2text - join right inside named entity #677

Closed: matyaskopp closed this issue 1 year ago

matyaskopp commented 1 year ago

The tei2text script ignores join="right" when the token carrying it is inside a named entity and the following ("joined") token is outside that named entity. In that case a space is output between the two tokens even though none should be: https://github.com/clarin-eric/ParlaMint/blob/ce00fda7c8210f3cd7a709d8fa77998aac6708b4/Data/ParlaMint-CZ/ParlaMint-CZ_2016-09-08-ps2013-049-03-018-187.ana.xml#L234-L240
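A minimal Python sketch of join-aware text extraction that avoids this, assuming that ParlaMint TEI.ana tokens are w and pc elements in the TEI namespace and that join="right" means no space follows the token. The helper names (iter_tokens, sentence_to_text) and the sample sentence are hypothetical and do not reproduce tei2text itself. The idea is to collect tokens in document order regardless of whether they sit inside a name element, and only then decide on spacing, so entity boundaries cannot introduce a space.

```python
import xml.etree.ElementTree as ET

# Assumption: tokens are <w> and <pc> elements in the TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
TOKEN_TAGS = {f"{{{TEI_NS}}}w", f"{{{TEI_NS}}}pc"}


def iter_tokens(elem):
    """Yield tokens under elem in document order, descending into wrappers
    such as <name> but not into a token (nested <w> are sub-tokens)."""
    for child in elem:
        if child.tag in TOKEN_TAGS:
            yield child
        else:
            yield from iter_tokens(child)


def sentence_to_text(sentence):
    """Hypothetical helper: join token texts, omitting the space after any
    token marked join="right", wherever that token sits in the tree."""
    parts = []
    for tok in iter_tokens(sentence):
        parts.append(tok.text or "")
        if tok.get("join") != "right":
            parts.append(" ")
    return "".join(parts).rstrip()


# Hypothetical sample: the joined token "Novák" closes a <name>, the comma lies outside it.
sample = (
    f'<s xmlns="{TEI_NS}">'
    '<name type="PER"><w>Jan</w><w join="right">Novák</w></name>'
    '<pc>,</pc><w>souhlasí</w></s>'
)
print(sentence_to_text(ET.fromstring(sample)))  # Jan Novák, souhlasí
```

Because spacing is decided per token after flattening, the same logic would give identical output whether or not a token is wrapped in a named entity.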

The distro script runs on the TEI version, so the issue is not urgent: https://github.com/clarin-eric/ParlaMint/blob/ce00fda7c8210f3cd7a709d8fa77998aac6708b4/Scripts/parlamint2distro.pl#L47

Still, it would be good to be able to run the same script on both the TEI and TEI.ana files and get the same result. At present, comparing the two outputs reveals the bug.