clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

Missing <sex> for persons #870

Closed: TomazErjavec closed this 4 months ago

TomazErjavec commented 4 months ago

It seems that my fix for inserting <sex> into <person> elements that lack it doesn't work. We still have, e.g. in CZ (but also in other corpora), things like:

   <person xml:id="StanislavKrecek">
      <persName>
         <surname>Křeček</surname>
         <forename>Stanislav</forename>
      </persName>
   </person>

As a result, the corpora in the concordancers now have two values for unknown speech/@person_gender, namely 'U' and '-'. This needs to be fixed; unfortunately, @matyaskopp, it will also mean reprocessing all the corpora.

Here are the offending templates which need to be fixed: https://github.com/clarin-eric/ParlaMint/blob/f2918caa0858f7b986723c12a562ad79a506ed92/Scripts/parlamint2release.xsl#L375-L386
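For illustration, the following Python sketch shows what the fix has to achieve: every <person> without a <sex> child gets one, so that unknown gender ends up with a single value. It is not the XSLT from parlamint2release.xsl. The function name add_missing_sex, the default value "U", and placing <sex> right after the last <persName> are assumptions made for this example. It handles both un-namespaced elements, as in the snippet above, and elements in the TEI namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: real ParlaMint files put elements in the TEI namespace; the
# snippet in the issue omits it, so both forms are handled.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def add_missing_sex(root: ET.Element, default: str = "U") -> int:
    """Give every <person> without a <sex> child a <sex value="..."/>.

    Returns how many persons were fixed. The default "U" (unknown) is an
    assumption for this sketch, not taken from the ParlaMint scripts.
    """
    fixed = 0
    for ns in ("", "{%s}" % TEI_NS):
        # Materialise the list first so the tree is not mutated mid-iteration.
        for person in list(root.iter(ns + "person")):
            if person.find(ns + "sex") is not None:
                continue  # already has <sex>, leave it alone
            # Hypothetical placement: right after the last <persName>,
            # or as the first child if there is none.
            last_name = max(
                (i for i, child in enumerate(person) if child.tag == ns + "persName"),
                default=-1,
            )
            person.insert(last_name + 1, ET.Element(ns + "sex", {"value": default}))
            fixed += 1
    return fixed
```

Running it a second time on the same tree returns 0, since every person then has a <sex> child.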

TomazErjavec commented 4 months ago

This is now fixed. Also, person/sex is now obligatory, so XML validation will report any persons where it is missing.
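Schema validation is the actual safeguard here. As a quick standalone scan outside the schema, a Python sketch like the following lists the xml:id of every <person> without a <sex> child. The function name persons_missing_sex is hypothetical, and the check is only an illustration, not a replacement for validating against the ParlaMint schema.

```python
import xml.etree.ElementTree as ET
from typing import List

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def persons_missing_sex(root: ET.Element) -> List[str]:
    """Return the xml:id of every <person> (plain or TEI-namespaced) with no <sex> child."""
    missing = []
    for ns in ("", "{%s}" % TEI_NS):
        for person in root.iter(ns + "person"):
            if person.find(ns + "sex") is None:
                missing.append(person.get(XML_ID, "(no xml:id)"))
    return missing
```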