clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

HU: File ParlaMint-HU-listPerson.xml contains bad chars: U+AD (3x) #594

Closed matyaskopp closed 1 year ago

matyaskopp commented 1 year ago

U+AD (soft hyphen) is not a valid character: https://github.com/clarin-eric/ParlaMint/actions/runs/4027956603/jobs/6924302983#step:4:435

https://clarin-eric.github.io/ParlaMint/#sec-chars
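
For anyone who wants to check a file locally before pushing, here is a minimal sketch of counting and stripping such characters. It is not the check that validate-parlamint runs. The `DISALLOWED` set, the helper names `count_bad_chars` and `strip_bad_chars`, and the sample string are all assumptions made for illustration; the actual list of invalid characters is the one described at the link above.

```python
from collections import Counter

# Assumption: only U+00AD (SOFT HYPHEN) is treated as disallowed here;
# the project's real list of invalid characters may be longer.
DISALLOWED = {"\u00ad"}


def count_bad_chars(text, disallowed=DISALLOWED):
    """Return a dict mapping 'U+XXXX' labels to occurrence counts."""
    counts = Counter(ch for ch in text if ch in disallowed)
    return {f"U+{ord(ch):04X}": n for ch, n in counts.items()}


def strip_bad_chars(text, disallowed=DISALLOWED):
    """Return the text with every disallowed character removed."""
    return "".join(ch for ch in text if ch not in disallowed)


# Hypothetical sample containing three soft hyphens.
sample = "Ko\u00advács Já\u00adnos, Buda\u00adpest"
print(count_bad_chars(sample))  # {'U+00AD': 3}
print(strip_bad_chars(sample))  # Kovács János, Budapest
```

To apply this to a real file, read it with `open(path, encoding="utf-8")` and pass the contents to `count_bad_chars`. Whether a soft hyphen should be deleted or replaced with something else depends on the source text, so the stripping step is only one possible fix.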

TomazErjavec commented 1 year ago

@lnnoemi, note that this error showed up because we have now incorporated character validity checking into the validate-parlamint script. As you have already finished 3.0, you can address this for 3.1 if you wish. But as we promised that there would be no requirement to change the content of segments, it is not obligatory. Which doesn't mean we wouldn't be happy to see this corrected, even in 3.0!

TomazErjavec commented 1 year ago

This has been corrected, closing.