clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

AT: File ParlaMint-AT-listPerson.xml contains bad chars: U+A0 (9x) #593

Closed matyaskopp closed 1 year ago

matyaskopp commented 1 year ago

U+A0 is not a valid character: https://github.com/clarin-eric/ParlaMint/actions/runs/4027956603/jobs/6924301038#step:4:591

TomazErjavec commented 1 year ago

@hpreki, note that this error showed up because we incorporated character validity checking into the validate-parlamint script. As you have already finished 3.0, you can address this for 3.1 if you wish. But since we promised that there would be no requirement to change the content of segments, it is not obligatory. Which doesn't mean we wouldn't be happy to see this corrected, even in 3.0!
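For readers unfamiliar with this kind of check, here is a minimal sketch of how a character validity check might count disallowed characters and report them in the same shape as the issue title. It is not the validate-parlamint implementation: the names `BAD_CHARS`, `count_bad_chars` and `format_report` are hypothetical, and the set of disallowed characters is assumed to contain only U+00A0 for illustration.

```python
from collections import Counter

# Assumption: illustrative set only; the real validation script
# may disallow a different (probably larger) set of characters.
BAD_CHARS = {"\u00a0"}  # U+00A0 NO-BREAK SPACE


def count_bad_chars(text, bad_chars=BAD_CHARS):
    """Return a Counter mapping each disallowed character to its count."""
    return Counter(ch for ch in text if ch in bad_chars)


def format_report(filename, counts):
    """Build a message shaped like the issue title, e.g. 'U+A0 (9x)'."""
    parts = [f"U+{ord(ch):X} ({n}x)" for ch, n in sorted(counts.items())]
    return f"File {filename} contains bad chars: " + ", ".join(parts)
```

A file could be checked by reading it as UTF-8 text, passing the contents to `count_bad_chars`, and printing `format_report` when the returned counter is non-empty.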

hpreki commented 1 year ago

I will look into it. Because it only affects listPerson.xml, it should be a simple local change. I was asked to make some changes to that file anyway (splitting of forename, #580).
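One possible shape for such a local change, sketched under the assumption that each no-break space should simply become a regular space (the thread does not say how the file was actually corrected). The function names `replace_nbsp` and `fix_file` are hypothetical.

```python
from pathlib import Path

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def replace_nbsp(text):
    """Replace every no-break space with a regular space.

    Assumption: a plain space is the intended replacement.
    """
    return text.replace(NBSP, " ")


def fix_file(path):
    """Rewrite the file in place; return how many characters were replaced."""
    p = Path(path)
    original = p.read_text(encoding="utf-8")
    n = original.count(NBSP)
    if n:
        p.write_text(replace_nbsp(original), encoding="utf-8")
    return n
```

Calling `fix_file("ParlaMint-AT-listPerson.xml")` would rewrite that file only if it contains at least one U+00A0.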

hpreki commented 1 year ago

@matyaskopp this issue can be closed