clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

FR feedback #574

Closed matyaskopp closed 1 year ago

matyaskopp commented 1 year ago

date in title

https://github.com/gclux/ParlaMint/blob/0d76f9ca9c02e85a2e8744ff69b709a46c7a90d2/Data/ParlaMint-FR/ParlaMint-FR.xml#L8-L9

        <title type="sub" xml:lang="fr">Comptes-rendus des débats en séance publique de l'Assemblée Nationale, 15e législature (2017 - 2017)</title>
        <title type="sub" xml:lang="en">Proceedings of the debates in plenary sitting of the Assemblée Nationale, 15th legislature (2017 - 2017)</title>

corpus terms

        <meeting n="15-lower" corresp="#ParlaMint-FR-LOWER" ana="#parla.national #parla.lower #parla.term">15e législature</meeting>

setting in root should contain whole corpus period

https://github.com/gclux/ParlaMint/blob/0d76f9ca9c02e85a2e8744ff69b709a46c7a90d2/Data/ParlaMint-FR/ParlaMint-FR.xml#L390-L396

      <settingDesc>
        <setting>
          <name type="city">Paris</name>
          <name type="country" key="FR">France</name>
          <date from="2017-09-25" to="2017-10-31">25/09/2017 - 31/10/2017</date>
        </setting>
      </settingDesc>
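
One way to derive the whole corpus period for the root `setting` is to take the earliest and latest dates found in the component file names. This is only a sketch: it assumes every component file name carries its sitting date as `_YYYY-MM-DD`, as the file names in this thread do, and `corpus_period` is a hypothetical helper, not part of the ParlaMint scripts.

```python
import re
from datetime import date

# Assumption: component file names contain the sitting date as _YYYY-MM-DD.
DATE_IN_NAME = re.compile(r"_(\d{4})-(\d{2})-(\d{2})")


def corpus_period(filenames):
    """Return (from, to) as ISO dates covering all dated file names."""
    dates = []
    for name in filenames:
        match = DATE_IN_NAME.search(name)
        if match:
            dates.append(date(*(int(part) for part in match.groups())))
    if not dates:
        raise ValueError("no dated component file names found")
    return min(dates).isoformat(), max(dates).isoformat()
```

The two values can then be written to `date/@from` and `date/@to` in the root `setting`.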

wrong dates in subcorpus taxonomy

https://github.com/gclux/ParlaMint/blob/0d76f9ca9c02e85a2e8744ff69b709a46c7a90d2/Data/ParlaMint-FR/ParlaMint-FR.xml#L367-L386

        <taxonomy xml:id="subcorpus">
          <desc xml:lang="fr">
            <term>Sous-corpus</term>
          </desc>
          <desc xml:lang="en">
            <term>Subcorpora</term>
          </desc>
          <category xml:id="reference">
            <catDesc xml:lang="fr">
              <term>Référence</term>: sous-corpus de référence, jusqu'à 2019-07-31</catDesc>
            <catDesc xml:lang="en">
              <term>Reference</term>: reference subcorpus, until 2019-07-31</catDesc>
          </category>
          <category xml:id="covid">
            <catDesc xml:lang="fr">
              <term>COVID</term>: sous-corpus COVID, à partir de 2019-10-01</catDesc>
            <catDesc xml:lang="en">
              <term>COVID</term>: COVID subcorpus, from 2019-10-01 onwards</catDesc>
          </category>
        </taxonomy>

see: https://github.com/clarin-eric/ParlaMint/blob/5deaeed5ae792f3ba1726072298885b5b64a6d64/Data/ParlaMint-AT/ParlaMint-taxonomy-subcorpus.xml#L2-L16

wrong government events from dates

https://github.com/gclux/ParlaMint/blob/0d76f9ca9c02e85a2e8744ff69b709a46c7a90d2/Data/ParlaMint-FR/ParlaMint-FR.xml#L408-L421

Every event starts on "1959-01-09".

merge repeated organizations

Multiple organizations are repeated, e.g. "Ministère de l'intérieur" (2 times) and "Ministère de l’intérieur" (2 times):

          <org xml:id="PO729937" role="ministry">
            <orgName full="yes" xml:lang="fr">Ministère de l’intérieur</orgName>
            <orgName full="abb">INT</orgName>
            <event from="2017-06-22" to="2018-10-16">
              <label xml:lang="en">existence</label>
            </event>
          </org>
          <org xml:id="PO730004" role="ministry">
            <orgName full="yes" xml:lang="fr">Ministère de l'intérieur</orgName>
            <orgName full="abb">INT</orgName>
            <event from="2017-06-22" to="2018-10-16">
              <label xml:lang="en">existence</label>
            </event>
          </org>
          <org xml:id="PO759777" role="ministry">
            <orgName full="yes" xml:lang="fr">Ministère de l'intérieur</orgName>
            <orgName full="abb">INT</orgName>
            <event from="2018-10-17" to="2020-07-06">
              <label xml:lang="en">existence</label>
            </event>
          </org>
          <org xml:id="PO773384" role="ministry">
            <orgName full="yes" xml:lang="fr">Ministère de l’intérieur</orgName>
            <orgName full="abb">INT</orgName>
            <event from="2020-07-07">
              <label xml:lang="en">existence</label>
            </event>
          </org>
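
Candidate duplicates like these could be listed automatically. The following is a minimal sketch, not the project's validation code: it assumes that two `org` elements are suspicious when their full names match after apostrophe normalization and their first `event` has the same `from`/`to` dates. `repeated_orgs` is a hypothetical helper, and a hit is only a candidate for manual review.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def normalize(name):
    """Fold typographic apostrophes, whitespace and case."""
    return " ".join(name.replace("\u2019", "'").split()).casefold()


def repeated_orgs(xml_text):
    """Map (name, from, to) to the ids of orgs sharing that key."""
    groups = defaultdict(list)
    for org in ET.fromstring(xml_text).iter():
        if local(org.tag) != "org":
            continue
        names = [c.text or "" for c in org
                 if local(c.tag) == "orgName" and c.get("full") == "yes"]
        events = [c for c in org if local(c.tag) == "event"]
        if not names or not events:
            continue
        key = (normalize(names[0]), events[0].get("from"), events[0].get("to"))
        groups[key].append(org.get(XML_ID))
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Run on the four `org` elements above (wrapped in a single root element), this reports only PO729937 and PO730004, since the other two cover different periods.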

non-attached member affiliation role

xml-model in component file preamble

https://github.com/gclux/ParlaMint/blob/0d76f9ca9c02e85a2e8744ff69b709a46c7a90d2/Data/ParlaMint-FR/2017-18/ParlaMint-FR_2017-09-25-E2002.xml#L2

<?xml-model href="https://raw.githubusercontent.com/clarin-eric/ParlaMint/main/Schema/ParlaMint-TEI.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>

unique main title

I am not sure whether this main title is unique across the whole corpus. You can append the date to make it unique: https://github.com/gclux/ParlaMint/blob/0d76f9ca9c02e85a2e8744ff69b709a46c7a90d2/Data/ParlaMint-FR/2017-18/ParlaMint-FR_2017-09-25-E2002.xml#L12-L13

            <title type="main" xml:lang="fr">Corpus parlementaire français ParlaMint-FR, deuxième session extraordinaire [ParlaMint]</title>
            <title type="main" xml:lang="en">French parliamentary corpus ParlaMint-FR, second extraordinary session [ParlaMint]</title>

use chair when chair is speaking

You have added new speaker roles. For instance, I have no idea what "speaker" means: sometimes it looks like a regular, sometimes like a chair. At present we support these roles: chair, regular, guest. In v3.1 we plan to unify the common taxonomies, so additional roles can cause problems.

https://github.com/gclux/ParlaMint/blob/0d76f9ca9c02e85a2e8744ff69b709a46c7a90d2/Data/ParlaMint-FR/2017-18/ParlaMint-FR_2017-09-25-E2002.xml#L103

            <u ana="#chair"
               who="#PA720746"
               xml:id="ParlaMint-FR_2017-09-25-E2002.u1">
               <seg>L’ordre du jour appelle la suite de la discussion du projet de loi renforçant la sécurité intérieure et la lutte contre le terrorisme (n° 104, 164, 161).</seg>
            </u>
<!-- the same person as in the previous u, should be #chair -->
            <u ana="#speaker"
               who="#PA720746"
               xml:id="ParlaMint-FR_2017-09-25-E2002.u2">
               <seg>La parole est à M. le ministre d’État, ministre de l’intérieur.</seg>
            </u>

missing join right in articles

eg: La parole est à M. le ministre d’État, ministre de l’intérieur.

<s xml:id="ParlaMint-FR_2017-09-25-E2002.s3">
  <w xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w1" msd="UPosTag=DET|Definite=Def|Gender=Fem|Number=Sing|PronType=Art" lemma="le">La</w>
  <w xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w2" msd="UPosTag=NOUN|Gender=Fem|Number=Sing" lemma="parole">parole</w>
  <w xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w3" msd="UPosTag=AUX|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin" lemma="être">est</w>
  <w xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w4" msd="UPosTag=ADP" lemma="à">à</w>
  <name type="PER">
    <w xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w5" msd="UPosTag=NOUN|Gender=Masc|Number=Sing" lemma="Monsieur">M.</w>
    <w xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w6" msd="UPosTag=DET|Definite=Def|Gender=Masc|Number=Sing|PronType=Art" lemma="le">le</w>
    <w xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w7" msd="UPosTag=NOUN|Gender=Masc|Number=Sing" lemma="ministre">ministre</w>
<!-- missing join="right": -->
    <w xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w8" msd="UPosTag=ADP" lemma="de">d'</w>
    <w xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w9" msd="UPosTag=NOUN|Gender=Masc|Number=Sing" lemma="état" join="right">État</w>
  </name>
  <pc xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w10" msd="UPosTag=PUNCT">,</pc>
  <w xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w11" msd="UPosTag=NOUN|Gender=Masc|Number=Sing" lemma="ministre">ministre</w>
  <w xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w12" msd="UPosTag=ADP" lemma="de">de</w>
<!-- missing join="right": -->
  <w xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w13" msd="UPosTag=DET|Definite=Def|Number=Sing|PronType=Art" lemma="le">l'</w>
  <w xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w14" msd="UPosTag=NOUN|Gender=Masc|Number=Sing" lemma="intérieur" join="right">intérieur</w>
  <pc xml:id="ParlaMint-FR_2017-09-25-E2002.s3.w15" msd="UPosTag=PUNCT">.</pc>
  <linkGrp type="UD-SYN" targFunc="head argument">
    <!-- ... -->
  </linkGrp>
</s>
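
A quick way to list such tokens is sketched below. It assumes that an elided form is any `w` whose text ends in an apostrophe (straight or typographic) and that such a token must carry `join="right"`. `missing_join_right` is a hypothetical helper, not part of the ParlaMint scripts.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Assumption: elided French forms end in one of these characters.
APOSTROPHES = ("'", "\u2019")


def missing_join_right(sentence_xml):
    """Return ids of <w> tokens ending in an apostrophe without join="right"."""
    ids = []
    for el in ET.fromstring(sentence_xml).iter():
        if el.tag.rsplit("}", 1)[-1] != "w":
            continue
        text = (el.text or "").strip()
        if text.endswith(APOSTROPHES) and el.get("join") != "right":
            ids.append(el.get(XML_ID))
    return ids
```

For the sentence above this yields the ids ending in `s3.w8` and `s3.w13`, the two tokens marked by the comments.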

missing terms in component files

https://github.com/gclux/ParlaMint/blob/2885b85ffe1403ffa74f005b43b218ef431069bd/Data/ParlaMint-FR/2022/ParlaMint-FR_2022-03-23-O1168.xml#L14-L21

            <meeting n="O1"
                     corresp="#ParlaMint-FR-LOWER"
                     ana="#parla.session #ParlaMint-FR-LOWER"
                     xml:lang="fr">Session ordinaire 2021-2022 (CRSANR5L15S2022O1N168)</meeting>
            <meeting n="168"
                     corresp="#ParlaMint-FR-LOWER"
                     ana="#parla.sitting #ParlaMint-FR-LOWER"
                     xml:lang="fr">168. séance</meeting>

should be extended with term information - add this line (copied from root file):

<meeting n="16-lower" corresp="#ParlaMint-FR-LOWER" ana="#parla.national #parla.lower #parla.term #parla.legis.16">16e législature</meeting>

Volodymyr Zelenskyy should be a guest - definitely not unknown

https://github.com/gclux/ParlaMint/blob/2885b85ffe1403ffa74f005b43b218ef431069bd/Data/ParlaMint-FR/2022/ParlaMint-FR_2022-03-23-O1168.xml#L135

<!-- ... -->
               <seg>Monsieur le président de l’Ukraine, vous avez la parole.</seg>
            </u>
            <u xml:id="ParlaMint-FR_2022-03-23-O1168.u3" ana="#unknown">
               <seg>Merci. C’est un grand honneur pour moi, pour l’Ukraine et pour notre peuple. Mesdames et messieurs les sénateurs, mesdames et messieurs les députés, élus de Paris, peuple français, je suis reconnaissant de l’honneur qui m’est fait de m’adresser à vous aujourd’hui. Je suis sûr que vous savez très bien ce qui se passe en Ukraine ; vous savez pourquoi cela se produit et vous savez qui est coupable – y compris ceux qui se cachent la tête dans le sable et essaient de trouver de l’argent en Russie.</seg>
               <seg>Je m’adresse à vous, qui êtes des gens honnêtes, rationnels et audacieux, pour vous poser une question : comment arrêter cette guerre ? Comme instaurer la paix en Ukraine ? La plupart des réponses sont dans vos mains, dans nos mains. Le 9 mars dernier, des bombes aériennes russes ont été lancées sur l’hôpital pour enfants et une maternité de notre ville de Marioupol, une ville paisible du sud de l’Ukraine. Oui, c’était une ville complètement paisible jusqu’à l’arrivée des troupes russes, qui l’ont soumise à un siège brutal, moyenâgeux, et ont commencé à tuer des gens. Dans cette maternité sur laquelle les Russes ont lancé des bombes, il y avait notamment des femmes qui se préparaient à accoucher. La plupart d’entre elles ont survécu, mais certaines ont été grièvement blessées : une femme a vu son pied, qui était fracturé, être amputé ; une autre a eu le bassin fracturé et son bébé est mort avant la naissance. Alors qu’on essayait de la sauver, elle demandait aux médecins de la laisser mourir, de ne pas l’aider : elle ne voyait pas de raisons de rester en vie. Elle est morte.</seg>
               <seg>En Ukraine, en Europe, en 2022, pour des centaines de millions de personnes, il était impensable que le monde puisse être détruit. Je vous demande d’observer une minute de silence en l’honneur et à la mémoire des milliers d’Ukrainiennes et d’Ukrainiens qui ont été tués à la suite de l’invasion russe du territoire ukrainien.<incident type="action">
                     <desc xml:lang="fr">Mmes et MM. les députés se lèvent et observent une minute de silence</desc>
                  </incident>
               </seg>
               <seg>Merci. Après des semaines d’invasion russe, Marioupol et d’autres villes ukrainiennes frappées par l’occupant rappellent les ruines de Verdun. Comme sur les photos de la première guerre mondiale, que chacun a eu l’occasion de voir, l’armée russe ne distingue pas les objets qu’elle cible. Elle détruit tout : quartiers résidentiels, hôpitaux, écoles, universités ; tout. Elle brûle les entrepôts de nourriture et de médicaments : elle brûle tout. Elle ne tient pas compte du concept de crime de guerre ni des obligations liées aux conventions internationales. Elle a apporté la terreur sur le sol ukrainien, et chacun de vous en est conscient. Vous avez toutes les informations, car tous les faits sont disponibles : les femmes violées par les militaires russes dans les zones temporairement occupées, les réfugiés qu’ils tuent sur les routes, les journalistes qu’ils tuent aussi tout en sachant que ce sont des journalistes, sans compter les personnes âgées qui ont survécu à l’holocauste et qui sont désormais obligées de fuir les frappes russes dans des abris antibombes. Ce qui se passe en Ukraine, l’Europe ne l’avait pas vu depuis quatre-vingts ans. Des gens désespérés supplient pour mourir.</seg>
               <seg>En 2019, quand je suis devenu président, il existait déjà un cadre de négociation avec la fédération de Russie : le format Normandie. Il devait mettre fin à la guerre dans le Donbass, à l’est de l’Ukraine, qui dure malheureusement depuis huit ans. Quatre États ont participé au format Normandie – l’Ukraine, la Russie, l’Allemagne et la France –, mais ils représentaient l’ensemble du monde, ils exprimaient toutes les positions des pays du monde entier. Certains ont soutenu ce processus tandis que d’autres essayaient de le retarder, voulant le perturber. Mais il semblait important qu’un tel cadre continue d’exister. En 2019, les négociations ont donné des résultats – nous avons réussi à libérer des personnes gardées en captivité et à négocier certaines décisions –, ce qui a constitué une bouffée d’air frais ou comme une lueur d’espoir, l’espoir que les conversations avec la Russie puissent être constructives, que les dirigeants de la Russie puissent être convaincus par nos paroles, que Moscou puisse choisir la paix.</seg>
               <seg>Mais le 24 février 2022 est arrivé. Ce jour a effacé tous les efforts consentis ; il a brisé le concept même de dialogue et l’expérience européenne des relations avec la Russie ; il a infléchi les destinées de l’histoire européenne. Tout cela a été bombardé par les troupes russes, écrasé par l’artillerie russe et brûlé par les tirs de missiles russes. N’ayant pu trouver la vérité dans les bureaux, nous sommes obligés de la chercher sur le champ de bataille. Alors, que nous reste-t-il ? Nos valeurs, notre unité et notre détermination à défendre notre liberté, notre liberté commune, celle de Paris et de Kiev, de Berlin et de Varsovie, de Madrid et de Rome, de Bruxelles et de Bratislava. Les bouffées d’air frais ne nous aideront pas. Nous devons agir ensemble, faire pression ensemble sur la Russie pour l’inciter à chercher la paix.</seg>
            </u>

And the rest of the speech is wrongly attributed to PA720124 (Aude Amadou):

https://github.com/gclux/ParlaMint/blob/2885b85ffe1403ffa74f005b43b218ef431069bd/Data/ParlaMint-FR/2022/ParlaMint-FR_2022-03-23-O1168.xml#L146-L160

            <u who="#PA720124"
               xml:id="ParlaMint-FR_2022-03-23-O1168.u4"
               ana="#regular">
               <seg>Mesdames et messieurs, peuple français, le 24 février, le peuple ukrainien s’est uni. Désormais, nous n’avons plus ni droite ni gauche, nous ne distinguons plus entre les représentants du pouvoir et ceux des coalitions de l’opposition ; nous ne pensons qu’à instaurer la paix pour protéger notre pays. Nous sommes reconnaissants à la France pour son aide et pour les efforts du Président de la République Emmanuel Macron, qui a fait preuve d’un véritable leadership. Nous communiquons constamment avec lui et coordonnons nos actions. Les Ukrainiens voient que la France apprécie et protège la vérité. Vous savez ce que sont la liberté, l’égalité et la fraternité. Chacune de ces notions est importante pour vous : je le sens et les Ukrainiens le ressentent aussi.</seg>
               <seg>Nous attendons de la France, de votre leadership, que vous conduisiez la Russie à rechercher la paix, pour mettre fin à cette guerre contre la liberté, l’égalité et la fraternité, contre tout ce qui a rendu l’Europe unie, libre et diverse. Nous attendons de la France, de votre leadership, la restauration de l’intégrité territoriale de l’Ukraine. Nous pouvons le faire ensemble. Si certains parmi vous en doutent, votre peuple, lui, en est sûr, comme tous les autres peuples d’Europe. Sous la présidence française du Conseil de l’Union européenne, une décision mûrie sera prise en faveur de l’adhésion de l’Ukraine à l’Union européenne. Ce sera une décision historique, prise à un moment historique, comme cela fut toujours le cas dans l’histoire du peuple français.</seg>
               <seg>Mesdames et messieurs, peuple français, demain cela fera un mois que les Ukrainiens se battent pour leur vie et leur liberté, que notre armée s’oppose héroïquement aux forces russes, pourtant supérieures. Nous avons besoin d’encore plus d’aide et de soutien. Pour que la liberté ne perde pas, elle doit être bien armée. Les chars, les armes antichars, les avions de combat, la défense aérienne : nous en avons besoin et vous pouvez nous aider. Pour que la liberté ne perde pas, le monde doit aussi la soutenir avec des sanctions contre l’agresseur. Chaque semaine, il faut prendre un nouveau paquet de sanctions. Les entreprises françaises doivent quitter le marché russe : Renault, Auchan, Leroy Merlin et tous les autres groupes doivent cesser d’être les sponsors de la machine de guerre russe. Ils doivent cesser de financer le meurtre de femmes et d’enfants, ou le viol. Chacun doit se rappeler que les valeurs passent avant les bénéfices.<kinesic type="applause">
                     <desc xml:lang="fr">Applaudissements sur quelques bancs des groupes LaREM, Dem et SOC ainsi que parmi certains députés non inscrits</desc>
                  </kinesic>
               </seg>
               <seg>Nous devons déjà penser à l’avenir, à la façon dont nous allons vivre après la guerre. Il nous faut des garanties solides pour rendre la sécurité inébranlable et les guerres impossibles dans ce monde. Créons un nouveau système de garanties et de sécurité, au sein duquel la France jouera un rôle de premier plan, pour que personne n’ait plus à pleurer la mort, pour que les gens puissent vivre leur vie et mourir non pas sous les bombes, au milieu d’une guerre, mais quand leur heure est venue, dans la dignité. Chacun doit vivre dans le respect, et on doit pouvoir lui dire « adieu », comme la France l’a dit à Jean-Paul Belmondo.</seg>
               <seg>Merci, la France ! Gloire à l’Ukraine !<kinesic type="applause">
                     <desc xml:lang="fr">Mmes et MM. les députés se lèvent et applaudissent longuement</desc>
                  </kinesic>
               </seg>
            </u>

terms should be in parliament organization

A term should be an event in the parliament organization, not a category in the taxonomy.

This should be removed https://github.com/gclux/ParlaMint/blob/2885b85ffe1403ffa74f005b43b218ef431069bd/Data/ParlaMint-FR/ParlaMint-FR.xml#L216-L225

                     <category xml:id="parla.legis.15">
                        <catDesc xml:lang="fr">
                           <term>15e législature</term>
                        </catDesc>
                     </category>
                     <category xml:id="parla.legis.16">
                        <catDesc xml:lang="fr">
                           <term>16e législature</term>
                        </catDesc>
                     </category>

And correct terms should be added to meeting element

gclux commented 1 year ago

About these specific issues in this list, can you please clarify what is wrong in:

matyaskopp commented 1 year ago

About these specific issues in this list, can you please clarify what is wrong in:

  • datespan in title
  • corpus should contain multiple terms?

About the other issues, some are easy to fix: I can update the GitHub samples starting from next Monday. Others require more work.

The corpus period is longer than 2017-2017. The 2017-2017 span probably arises because your sample covers only a short period. It is better to include one file from each term in the sample: the sample is then more informative, and older transcripts may contain phenomena that do not appear in the newest ones.

TomazErjavec commented 1 year ago

I should add that even for the sample, the corpus root should be the same as for the full corpus, i.e. it should give the dates that the corpus covers, the complete list of persons and organisations, etc.

gclux commented 1 year ago

I have updated the samples in GitHub. However, I ran into problems with validation on my side... Does 'make validate-parlamint-FR' work on the 'ana' versions? 'make val-schema-ana-FR' gives weird errors:

../ParlaMint/Data/TMP/ParlaMint-FR/ParlaMint-FR.ana.xml:2:1412: error: ID "ParlaMint-FR_2017-06-28-O1125.d1_1" has already been defined

matyaskopp commented 1 year ago

Your ids are not unique because the TEI.ana root includes the TEI component files as well as the TEI.ana ones: https://github.com/gclux/ParlaMint/blob/712a9fd76d66fe0f896b67c871fc303124d5eb90/Data/ParlaMint-FR/ParlaMint-FR.ana.xml#L16739-L16750

  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2017/ParlaMint-FR_2017-06-28-O1125.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2018/ParlaMint-FR_2018-01-16-O1111.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2019/ParlaMint-FR_2019-09-10-E2001.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2020/ParlaMint-FR_2020-01-07-O1114.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2021/ParlaMint-FR_2021-01-12-O1125.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2022/ParlaMint-FR_2022-03-23-O1168.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2017/ParlaMint-FR_2017-06-28-O1125.ana.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2018/ParlaMint-FR_2018-01-16-O1111.ana.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2019/ParlaMint-FR_2019-09-10-E2001.ana.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2020/ParlaMint-FR_2020-01-07-O1114.ana.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2021/ParlaMint-FR_2021-01-12-O1125.ana.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2022/ParlaMint-FR_2022-03-23-O1168.ana.xml"/>
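
A check along these lines could catch the mix-up before schema validation. It is a sketch under two assumptions: component files are recognizable by a `_YYYY-MM-DD` date in their name, and an `.ana` root should include only components ending in `.ana.xml`. `non_ana_components` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"
# Assumption: only component files carry a date in their file name.
COMPONENT = re.compile(r"_\d{4}-\d{2}-\d{2}")


def non_ana_components(root_xml):
    """Return hrefs of included component files that are not .ana.xml files."""
    hrefs = [inc.get("href", "") for inc in ET.fromstring(root_xml).iter(XI_INCLUDE)]
    return [href for href in hrefs
            if COMPONENT.search(href) and not href.endswith(".ana.xml")]
```

An empty list means that no plain TEI component is pulled into the `.ana` root.
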
gclux commented 1 year ago

Yes. Just updated. But, there is a problem in my validation script...

Scripts/validate-parlamint.pl Schema 'Data/ParlaMint-FR'
INFO: Validating directory /home/glux/Work/GitHub/ParlaMint/Data/ParlaMint-FR
INFO: Validating TEI root /home/glux/Work/GitHub/ParlaMint/Data/ParlaMint-FR/ParlaMint-FR.xml
INFO: Char validation for ParlaMint-FR.xml
Died at Scripts/validate-parlamint.pl line 55, <IN> chunk 1.
make: *** [Makefile:204: validate-parlamint-FR] Error 255
TomazErjavec commented 1 year ago

But, there is a problem in my validation script Died at Scripts/validate-parlamint.pl line 55

Oh dear, this is my fault, sorry. In the devel branch there was an explicit "die" in this script, left there for testing purposes. I have removed it now.

matyaskopp commented 1 year ago

@gclux I have updated https://github.com/clarin-eric/ParlaMint/issues/574#issue-1515142559 to reflect this status: https://github.com/gclux/ParlaMint/commit/2885b85ffe1403ffa74f005b43b218ef431069bd

gclux commented 1 year ago

About "terms should be in parliament organization" and "missing terms in component files": I apparently misunderstood the encoding. Better late than never!

OK, I can remove the terms from the taxonomy and use the org/@xml:id. This will give in the root file:

        <meeting n="168" corresp="#ParlaMint-FR-LOWER" ana="#parla.national #parla.lower #parla.term #PO717460">15e législature</meeting>
        <meeting n="2" corresp="#ParlaMint-FR-LOWER" ana="#parla.national #parla.lower #parla.term #PO791932">16e législature</meeting>

...and in the last component file of the 15th term:

            <meeting n="O1"
                     corresp="#ParlaMint-FR-LOWER"
                     ana="#parla.session #ParlaMint-FR-LOWER"
                     xml:lang="fr">Session ordinaire 2021-2022 (CRSANR5L15S2022O1N168)</meeting>
            <meeting n="168"
                     corresp="#ParlaMint-FR-LOWER"
                     ana="#parla.sitting #ParlaMint-FR-LOWER #PO717460"
                     xml:lang="fr">168. séance</meeting>    

Do we agree?

matyaskopp commented 1 year ago

OK, I can remove the terms from the taxonomy and use the org/@xml:id. This will give in the root file:

Use org/eventList/event/@xml:id

        <meeting n="168" corresp="#ParlaMint-FR-LOWER" ana="#parla.national #parla.lower #parla.term #PO717460">15e législature</meeting>
        <meeting n="2" corresp="#ParlaMint-FR-LOWER" ana="#parla.national #parla.lower #parla.term #PO791932">16e législature</meeting>

Root file should contain (I hope the ids refer to the correct events...):

<meeting n="15" corresp="#ParlaMint-FR-LOWER" ana="#parla.national #parla.lower #parla.term #PO717460"  xml:lang="fr">15e législature</meeting>
<meeting n="16" corresp="#ParlaMint-FR-LOWER" ana="#parla.national #parla.lower #parla.term #PO791932"  xml:lang="fr">16e législature</meeting>

...and in the last component file of the 15th term:

            <meeting n="O1"
                     corresp="#ParlaMint-FR-LOWER"
                     ana="#parla.session #ParlaMint-FR-LOWER"
                     xml:lang="fr">Session ordinaire 2021-2022 (CRSANR5L15S2022O1N168)</meeting>
            <meeting n="168"
                     corresp="#ParlaMint-FR-LOWER"
                     ana="#parla.sitting #ParlaMint-FR-LOWER #PO717460"
                     xml:lang="fr">168. séance</meeting>    
             <meeting n="15" 
                      corresp="#ParlaMint-FR-LOWER"
                      ana="#parla.national #parla.lower #parla.term #PO717460"
                       xml:lang="fr">15e législature</meeting>
             <meeting n="O1"
                      corresp="#ParlaMint-FR-LOWER"
                      ana="#parla.session #ParlaMint-FR-LOWER"
                      xml:lang="fr">Session ordinaire 2021-2022 (CRSANR5L15S2022O1N168)</meeting>
             <meeting n="168"
                      corresp="#ParlaMint-FR-LOWER"
                      ana="#parla.sitting #ParlaMint-FR-LOWER"
                      xml:lang="fr">168. séance</meeting>  
gclux commented 1 year ago

OK. I will now use more readable ids for the term...

             <meeting n="15" 
                      corresp="#ParlaMint-FR-LOWER"
                      ana="#parla.national #parla.lower #parla.term #parla.term.16"
                       xml:lang="fr">15e législature</meeting>
             <meeting n="O1"
                      corresp="#ParlaMint-FR-LOWER"
                      ana="#parla.session #ParlaMint-FR-LOWER"
                      xml:lang="fr">Session ordinaire 2021-2022 (CRSANR5L15S2022O1N168)</meeting>
             <meeting n="168"
                      corresp="#ParlaMint-FR-LOWER"
                      ana="#parla.sitting #ParlaMint-FR-LOWER"
                      xml:lang="fr">168. séance</meeting>  

I think that the documentation may be improved here: https://clarin-eric.github.io/ParlaMint/#sec-titleStmt. As opposed to the given example, in France we have three levels of meeting description: term - session - sitting. I will try to use 'term', which is the correct translation of the French 'législature'.

matyaskopp commented 1 year ago

@gclux thanks for updating the sample. It is great that you were able to fix the guest speakers.

I have updated the ticks; two items remain unticked:

gclux commented 1 year ago

About: merge repeated organizations

It is not an error. This is the case of a ministry shared by two ministers: https://en.wikipedia.org/wiki/Jacqueline_Gourault

...she previously served as Minister attached to the Minister of the Interior from 2017 to 2018.

I think the best would be to manually "patch" the second organization...

          <org xml:id="PO729937" role="ministry">
            <orgName full="yes" xml:lang="fr">Ministère de l’intérieur</orgName>
            <orgName full="abb">INT</orgName>
            <event from="2017-06-22" to="2018-10-16">
              <label xml:lang="en">existence</label>
            </event>
          </org>
          <org xml:id="PO730004" role="ministry">
            <orgName full="yes" xml:lang="fr">Ministère auprès du ministre d'État, ministre de l'intérieur</orgName>
            <orgName full="abb">INT</orgName>
            <event from="2017-06-22" to="2018-10-16">
              <label xml:lang="en">existence</label>
            </event>
          </org>

Have you found any other cases of such duplications?

matyaskopp commented 1 year ago

Have you found any other cases of such duplications?

No, I overlooked that. There is only one such organization.

gclux commented 1 year ago

About: unique main title

I did not know the title had to be unique! ... I am surprised this error shows up now!!!

I can copy the ", séance : 2, 25/09/2017" from the subtitle (I believe there may be several sittings on the same day). This mention would then be redundant.

Is it also the case for the subtitle? (to be unique)

Alternatively, I can take the unique id from the source file.

But this would imply a new recompilation!!!

matyaskopp commented 1 year ago

I did not know the title had to be unique! ... I am surprised this error shows up now!!!

Mention about unique title is in documentation (https://clarin-eric.github.io/ParlaMint/#exa-titleStmtComp):

In the example it can be seen that the main title of a corpus component is simply an extension of the corpus root title, as it also gives the name of the particular meeting that the component contains, while the subordinate title is, again, free text. Both titles must be unique in the complete corpus.

but it is not in the validation script. Copying values from subtitles seems ok to me.
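
Since the validation script does not cover this, a uniqueness check could look like the sketch below. It assumes the main titles have already been read from each component file into a mapping from file name to title. `duplicated_titles` is a hypothetical helper, not part of `validate-parlamint.pl`.

```python
from collections import defaultdict


def duplicated_titles(titles_by_file):
    """Map each title used by more than one file to the sorted file names."""
    files_by_title = defaultdict(list)
    for name, title in titles_by_file.items():
        # Collapse whitespace so that line breaks in the XML do not matter.
        files_by_title[" ".join(title.split())].append(name)
    return {title: sorted(names)
            for title, names in files_by_title.items() if len(names) > 1}
```

An empty result means every main title is unique. The same function can be run separately on the subtitles.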

@TomazErjavec do we insist on this? It is your requirement, and I am not sure where it came from (inherited from the TEI recommendations?).

matyaskopp commented 1 year ago

The unique title and duplicated organization issues seem to be fixed in the data delivered to @TomazErjavec, so I am closing this issue and merging the sample (it will be fixed/overwritten by the ParlaMint v3.0 sample).