clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

ES-CT feedback #491

Closed matyaskopp closed 1 year ago

matyaskopp commented 1 year ago

@rjzevallos

Title does not correspond to meeting values

The corpus contains 4 terms (according to the content of the meeting elements), but the title mentions only terms XI and XII.

https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/764ee33cf7eb3fdbeecfbc914db7327be7ccf7ad/Data/ParlaMint-ES-CT/ParlaMint-ES-CT.xml#L10-L15

            <title type="sub" xml:lang="ca">Actes del Parlament de Catalunya, legislatures XI-XII (2015 - 2022)</title>
            <title type="sub" xml:lang="en">Minutes from  the Parliament of Catalonia, terms XI XII (2015 - 2022)</title>
            <meeting n="11" ana="#parla.term #PC.11">Term 11</meeting>
            <meeting n="12" ana="#parla.term #PC.12">Term 12</meeting>
            <meeting n="13" ana="#parla.term #PC.13">Term 13</meeting>
            <meeting n="14" ana="#parla.term #PC.14">Term 14</meeting>

teiCorpus meeting element

meeting should contain parla.uni, eg

<meeting n="11" ana="#parla.term #parla.uni #PC.11">Term 11</meeting>

meeting in component file

https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/764ee33cf7eb3fdbeecfbc914db7327be7ccf7ad/Data/ParlaMint-ES-CT/ParlaMint-ES-CT_2018-01-17-0101.xml#L8

<meeting ana="#parla.term #PC.12" n="12">XII Legislatura</meeting>

The meeting element should also contain session/meeting/sitting where it makes sense, e.g.: https://github.com/clarin-eric/ParlaMint/blob/e37537be54721c40bf6687cd12d9361759e6b234/Data/ParlaMint-PT/ParlaMint-PT_2015-01-07.xml#L13-L16

missing component file classification

https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/764ee33cf7eb3fdbeecfbc914db7327be7ccf7ad/Data/ParlaMint-ES-CT/ParlaMint-ES-CT_2018-01-17-0101.xml#L2

<TEI xmlns="http://www.tei-c.org/ns/1.0" 
     xml:lang="ca" 
     xml:id="ParlaMint-ES-CT_2018-01-17-0101" 
     ana="#reference">

should be something like:

<TEI xmlns="http://www.tei-c.org/ns/1.0" 
     xml:lang="ca" 
     xml:id="ParlaMint-ES-CT_2018-01-17-0101" 
     ana="#reference #parla.sitting">

attribute scheme should refer to a taxonomy

https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/764ee33cf7eb3fdbeecfbc914db7327be7ccf7ad/Data/ParlaMint-ES-CT/ParlaMint-ES-CT.xml#L131

<catRef scheme="#PC" target="#parla.uni"/>

should be

<catRef scheme="#ParlaMint-taxonomy-parla.legislature" target="#parla.uni"/>

missing funder in component file

wrong setting date

https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/764ee33cf7eb3fdbeecfbc914db7327be7ccf7ad/Data/ParlaMint-ES-CT/ParlaMint-ES-CT_2018-01-17-0101.xml#L89

<date when="2018-01-17">26.10.2015</date>
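A mismatch like the one above can be caught automatically. The following is only a minimal sketch: it assumes the visible text of the date element uses the day.month.year format seen in this example, and the function name `date_mismatch` is hypothetical, not part of the ParlaMint scripts.

```python
from datetime import datetime
import xml.etree.ElementTree as ET


def date_mismatch(date_xml: str, text_format: str = "%d.%m.%Y") -> bool:
    """Return True when @when and the visible text name different days."""
    elem = ET.fromstring(date_xml)
    # @when is expected to be an ISO date (YYYY-MM-DD).
    when = datetime.strptime(elem.get("when"), "%Y-%m-%d").date()
    # The text format is an assumption taken from the example in this issue.
    shown = datetime.strptime(elem.text.strip(), text_format).date()
    return when != shown


print(date_mismatch('<date when="2018-01-17">26.10.2015</date>'))  # True
```

Real component files declare the TEI namespace, so the element would have to be located with a namespace-aware lookup before it is passed to a check like this.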

missing chair speaker

opposition relation

split forename and surname

nuriabel commented 1 year ago

Hi, thanks a lot for the feedback. We will correct the different issues. We have a doubt, however, about the 'split forename and surname' item. Can we have several surnames and forenames, in any quantity? Should the 'i' ('and' in Catalan) be encoded in the same way as the 'de' ('of')? That is, is this good?

<forename>Lucas</forename>
<forename>Silvano</forename>
<surname>Ferro</surname>
<surname>Solé</surname>

Thanks!! Núria

TomazErjavec commented 1 year ago

Can we have different surnames and forenames? in any quantity?

Yes, as many as you want!

Should the 'i' ('and' in Catalan) be encoded in the same way as the 'de' ('of')?

Absolutely. It's even in the example of the Guidelines.
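To make the answer concrete, here is a small sketch that builds a name with several forenames and surnames. It assumes that "encoded as the 'de'" refers to the TEI `nameLink` element; the helper `build_pers_name` and the idea of passing the name parts pre-split are illustrative assumptions, not something taken from the ParlaMint tooling.

```python
import xml.etree.ElementTree as ET


def build_pers_name(forenames, surname_parts, links=("i", "de")):
    """Serialise a persName with any number of forenames and surnames.

    Parts listed in `links` are emitted as nameLink (an assumption about
    how connecting particles such as 'i' or 'de' should be encoded).
    """
    pers = ET.Element("persName")
    for name in forenames:
        ET.SubElement(pers, "forename").text = name
    for part in surname_parts:
        tag = "nameLink" if part in links else "surname"
        ET.SubElement(pers, tag).text = part
    return ET.tostring(pers, encoding="unicode")


print(build_pers_name(["Lucas", "Silvano"], ["Ferro", "i", "Solé"]))
```

Splitting a full name into its parts cannot be done reliably by a script, so the sketch expects the caller to supply the parts already separated.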

rjzevallos commented 1 year ago

Done!

matyaskopp commented 1 year ago

@rjzevallos, thanks for the changes. I reviewed your data and added feedback for an annotated version.

coalition relation

documented here: https://clarin-eric.github.io/ParlaMint/#sec-relation

https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/960d8fed8178b5d9b1f2659c60630d2bf2235e02/Data/ParlaMint-ES-CT/ParlaMint-ES-CT-listOrg.xml#L215-L220

      <relation name="coalition"
                active="#PG.JxCAT #PG.REP"
                passive="#GOV"
                from="2018-01-17"
                to="2020-12-21"
                ana="#PC.12"/>

should be

      <relation name="coalition"
                mutual="#PG.JxCAT #PG.REP"
                from="2018-01-17"
                to="2020-12-21"
                ana="#PC.12"/>
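If there are many such relations, the rewrite can be scripted. This is a rough sketch only: it assumes every coalition relation should simply move its @active value to @mutual and drop @passive, as in the example above, and the function name `to_mutual` is hypothetical.

```python
import xml.etree.ElementTree as ET


def to_mutual(relation: ET.Element) -> ET.Element:
    """Turn an active/passive coalition relation into a mutual one."""
    if relation.get("name") == "coalition" and relation.get("active"):
        # Coalition partners are listed together in @mutual.
        relation.set("mutual", relation.attrib.pop("active"))
        # @passive is dropped, following the corrected example in this issue.
        relation.attrib.pop("passive", None)
    return relation


rel = ET.fromstring(
    '<relation name="coalition" active="#PG.JxCAT #PG.REP" passive="#GOV" '
    'from="2018-01-17" to="2020-12-21" ana="#PC.12"/>'
)
print(ET.tostring(to_mutual(rel), encoding="unicode"))
```

Relations with other names are returned unchanged, so the function can be applied to every relation element in the listOrg file.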

repeated notes

I hope this is the source of your data: https://www.parlament.cat/document/dspcp/239595.pdf#page=3

your data: https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/960d8fed8178b5d9b1f2659c60630d2bf2235e02/Data/ParlaMint-ES-CT/ParlaMint-ES-CT_2018-01-17-0101.xml#L100-L105

            <note type="narrative">La sessió s'obre a les onze del matí i dos minuts. Presideix el president de la Mesa d’Edat, acompanyat dels secretaris de la Mesa d’Edat, la qual és assistida pel secretari general i el lletrat major.</note>
            <note type="narrative">La sessió s'obre a les onze del matí i dos minuts. Presideix el president de la Mesa d’Edat, acompanyat dels secretaris de la Mesa d’Edat, la qual és assistida pel secretari general i el lletrat major.</note>
            <note type="narrative">ORDRE DEL DIA DE LA CONVOCATÒRIA</note>
            <note type="narrative">ORDRE DEL DIA DE LA CONVOCATÒRIA</note>
            <note type="narrative">Punt únic: Constitució del Ple del Parlament i elecció de la Mesa del Parlament (tram. 396-00001/12 i 398-00001/12).</note>
            <note type="narrative">Punt únic: Constitució del Ple del Parlament i elecció de la Mesa del Parlament (tram. 396-00001/12 i 398-00001/12).</note>
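A possible clean-up for notes repeated back to back is sketched below. It assumes that only immediately consecutive notes with the same type and text are unwanted; the function name `drop_repeated_notes` is made up for this example.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so the check also works on TEI files."""
    return tag.rsplit("}", 1)[-1]


def drop_repeated_notes(parent: ET.Element) -> int:
    """Remove notes that exactly repeat the preceding sibling; return the count."""
    removed = 0
    previous = None
    for child in list(parent):
        key = (local_name(child.tag), child.get("type"), (child.text or "").strip())
        if key[0] == "note" and key == previous:
            parent.remove(child)
            removed += 1
        else:
            previous = key
    return removed


div = ET.fromstring(
    '<div><note type="narrative">ORDRE DEL DIA DE LA CONVOCATÒRIA</note>'
    '<note type="narrative">ORDRE DEL DIA DE LA CONVOCATÒRIA</note></div>'
)
print(drop_repeated_notes(div), len(div))  # 1 1
```

It is probably better to fix the conversion step that produces the duplicates than to filter them afterwards, so this is mainly useful as a check.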

mention the proper source in bibl

You can add the proper source of a file into the bibl element in this place: https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/960d8fed8178b5d9b1f2659c60630d2bf2235e02/Data/ParlaMint-ES-CT/ParlaMint-ES-CT_2018-01-17-0101.xml#L58

<idno type="URI" subtype="parliament">https://www.parlament.cat/document/dspcp/239595.pdf</idno>

missing join

The annotated component file contains only 4 occurrences of join="right". https://clarin-eric.github.io/ParlaMint/#sec-ana-words

<u xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0" who="#MuroXavier" ana="#regular" xml:lang="ca">
  <seg xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0" xml:lang="ca">
    <s xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1" xml:lang="ca">
      <w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.1" msd="UPosTag=ADJ" lemma="bo">Bon</w>
      <w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.2" msd="UPosTag=NOUN" lemma="dia">dia</w>
      <w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.3" msd="UPosTag=ADP" lemma="a">a</w>
<!-- missing join: -->
      <w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.4" msd="UPosTag=PRON" lemma="tothom">tothom</w>
      <pc xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.5" msd="UPosTag=PUNCT">,</pc>
...
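One way to recover the missing joins is to align the tokens with the original text and mark every token that is directly followed by the next one. The sketch below assumes the untokenised text is still available and that tokens occur in it in order; `mark_joins` is a hypothetical helper, not part of the ParlaMint scripts.

```python
def mark_joins(text: str, tokens: list[str]) -> list[bool]:
    """For each token, True if the next token follows without whitespace."""
    joins = []
    pos = 0
    for i, tok in enumerate(tokens):
        # Find the token at or after the end of the previous one.
        start = text.index(tok, pos)
        pos = start + len(tok)
        is_last = i == len(tokens) - 1
        glued = (not is_last) and pos < len(text) and not text[pos].isspace()
        joins.append(glued)
    return joins


tokens = ["Bon", "dia", "a", "tothom", ","]
print(mark_joins("Bon dia a tothom,", tokens))
# [False, False, False, True, False]
```

Every token marked True would receive join="right" in the annotated file, which here is the word "tothom" before the comma.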

missing UD features

<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.1" msd="UPosTag=ADJ" lemma="bo">Bon</w>
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.2" msd="UPosTag=NOUN" lemma="dia">dia</w>

should be (according to udpipe)

<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.1" msd="UPosTag=ADJ|Gender=Masc|Number=Sing" lemma="bo">Bon</w>
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.2" msd="UPosTag=NOUN|Gender=Masc|Number=Sing" lemma="dia">dia</w>

CNEC prefix

https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/960d8fed8178b5d9b1f2659c60630d2bf2235e02/Data/ParlaMint-ES-CT/ParlaMint-ES-CT.ana.xml#L137-L139

missing taxonomy translation (needed in ParlaMint v3.1)

matyaskopp commented 1 year ago

@rjzevallos Which branch in your fork do you want to use? You have some commits in IULATERM-TRL-UPF:main and the rest in IULATERM-TRL-UPF:data-ES-CT. Neither of these branches contains valid data, and they are wrong in different ways:

main - the @msd should contain UD features

<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.1" msd="UPosTag=adj|type=qualificative|gen=masculine|num=singular" lemma="bo">Bon</w>
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.2" msd="UPosTag=noun|type=common|gen=masculine|num=singular" lemma="dia">dia</w>

data-ES-CT - missing UD features

<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.1" msd="UPosTag=ADJ" lemma="bo">Bon</w>
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.2" msd="UPosTag=NOUN" lemma="dia">dia</w>
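The difference between the two branches can be illustrated with a quick format check on @msd. This is a sketch under assumptions: the tag set is the 17 universal POS tags, the feature pattern is loosely modelled on the UD Name=Value convention, and `msd_problems` is a hypothetical name. It is not a replacement for the official UD validator.

```python
import re

# The 17 universal POS tags.
UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}
# Capitalised feature name, optional [layer], capitalised value(s).
FEATURE = re.compile(
    r"^[A-Z][A-Za-z0-9]*(\[[a-z0-9]+\])?="
    r"[A-Z0-9][A-Za-z0-9]*(,[A-Z0-9][A-Za-z0-9]*)*$"
)


def msd_problems(msd: str) -> list[str]:
    """List format problems in an msd value such as 'UPosTag=ADJ|Gender=Masc'."""
    head, *features = msd.split("|")
    problems = []
    key, _, tag = head.partition("=")
    if key != "UPosTag" or tag not in UPOS:
        problems.append(f"bad UPOS: {head}")
    for feat in features:
        if not FEATURE.match(feat):
            problems.append(f"not a UD-style feature: {feat}")
    return problems


print(msd_problems("UPosTag=adj|type=qualificative|gen=masculine|num=singular"))
print(msd_problems("UPosTag=ADJ|Gender=Masc|Number=Sing"))  # []
```

Note that the data-ES-CT value `UPosTag=ADJ` passes this check, because a bare tag is well formed. Detecting missing features needs a comparison against a tagger's output, which a format check alone cannot provide.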

rjzevallos commented 1 year ago

We are using the main branch. Moreover, on Wednesday I made a new pull request that addresses the missing part.

rjzevallos commented 1 year ago

I see that I need to fix the UD features.

rjzevallos commented 1 year ago

I have a question. Which script should I use to validate UD features? I use the following commands:

make val-schema-ES-CT
make val-schema-tei-ES-CT
make val-schema-ana-ES-CT
make val-schema-ParlaMint-ES-CT
make val-schema-ParlaCLARIN-ES-CT
make val-schema-ana-ParlaMint-ES-CT
make val-schema-tei-ParlaCLARIN-ES-CT
make val-schema-ana-ParlaCLARIN-ES-CT
make check-links-ES-CT
make check-content-ES-CT
make validate-parlamint-ES-CT

When I run all the commands I don't get any error =S

matyaskopp commented 1 year ago

I have a question. Which script should I use to validate UD features?

Use this:

make conllu-ES-CT

command is doing the following

rjzevallos commented 1 year ago

when I run make conllu-ES-CT

I get:

python3: can't open file '/mnt/d/UPF/proyecto_parlamint/ParlaMint/Scripts/tools/validate.py': [Errno 2] No such file or directory

I don't have tools folder.

TomazErjavec commented 1 year ago

  1. Run make.
  2. It tells you how to check for prerequisites.
  3. You will then just need to clone the CoNLL-U checker into the tools folder.

matyaskopp commented 1 year ago

when I run make conllu-ES-CT

I get:

python3: can't open file '/mnt/d/UPF/proyecto_parlamint/ParlaMint/Scripts/tools/validate.py': [Errno 2] No such file or directory

I don't have tools folder.

You have to clone it from UD tools repository. Follow these instructions: CONTRIBUTING.md - UD tools

matyaskopp commented 1 year ago

Thanks for the fixes. Two (final) observations:

not matching setting date in TEI.ana version

https://raw.githubusercontent.com/IULATERM-TRL-UPF/ParlaMint/main/Data/ParlaMint-ES-CT/ParlaMint-ES-CT_2018-01-17-0101.ana.xml

<date when="2018-01-17">26.10.2015</date>

used model(s)

From the application description, it seems that you have used Freeling with the Catalan model for the whole corpus, even for Spanish parts. Am I right, or does the application description miss a mention of the Spanish model?

nuriabel commented 1 year ago

Concerning the item "mention the proper source in bibl":

proper source of component file in bibl You can add the proper source of a file into the bibl element in this place: https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/960d8fed8178b5d9b1f2659c60630d2bf2235e02/Data/ParlaMint-ES-CT/ParlaMint-ES-CT_2018-01-17-0101.xml#L58

https://www.parlament.cat/document/dspcp/239595.pdf

I explained in the documentation that the sources were docx documents sent directly to us. I can include a note saying that the texts are also available in PDF format, but the PDFs cannot be considered source files, because there is no one-to-one correspondence. That is the reason we will not use the bibl element. Thanks again for all your support!! Best N.

matyaskopp commented 1 year ago

I am sorry, I missed this previously:

parliamentaryGroup with no affiliation

Your parliamentary groups do not have members:

  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.PSCUA
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.ERC
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.JxCAT
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.VOX
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.CUP
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.ECP
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.Cs
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.GM
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.JxSi
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.PSC
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.CSP
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.PPC
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.CUP-CC
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.REP
  WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.CCP
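The warning can be reproduced locally with a simple cross-check between the organisation list and the person list. The sketch assumes that membership is recorded as an affiliation element whose @ref points to the group's xml:id; the sample person ID and the function name `groups_without_members` are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so real TEI files are handled too."""
    return tag.rsplit("}", 1)[-1]


def groups_without_members(list_org_xml: str, list_person_xml: str) -> list[str]:
    """Return IDs of parliamentaryGroup orgs that no affiliation refers to."""
    orgs = ET.fromstring(list_org_xml)
    persons = ET.fromstring(list_person_xml)
    groups = [
        el.get(XML_ID)
        for el in orgs.iter()
        if local_name(el.tag) == "org" and el.get("role") == "parliamentaryGroup"
    ]
    referenced = {
        el.get("ref", "").lstrip("#")
        for el in persons.iter()
        if local_name(el.tag) == "affiliation"
    }
    return [g for g in groups if g not in referenced]


LIST_ORG = (
    '<listOrg><org xml:id="PG.ERC" role="parliamentaryGroup"/>'
    '<org xml:id="PG.VOX" role="parliamentaryGroup"/></listOrg>'
)
# The person below is invented purely for illustration.
LIST_PERSON = (
    '<listPerson><person xml:id="ExamplePerson">'
    '<affiliation ref="#PG.ERC" role="member"/></person></listPerson>'
)
print(groups_without_members(LIST_ORG, LIST_PERSON))  # ['PG.VOX']
```

An empty result would mean that every parliamentary group has at least one affiliated person.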

matyaskopp commented 1 year ago

@rjzevallos, @nuriabel, is there any progress? I would like to close this issue and merge your sample, but there is still an unsolved task: https://github.com/clarin-eric/ParlaMint/issues/491#issuecomment-1363306982

nuriabel commented 1 year ago

Hi Matyas!! happy new year!! Yes, we are almost there with all the information already collected, but now we are busy until 20-January because of a deadline. Is there any problem if we wait until the week of 23 to provide the new files? Best N.

TomazErjavec commented 1 year ago

Is there any problem if we wait until the week of 23 to provide the new files?

I think if we get them before the end of January, we are good. And a happy new year to you too!

nuriabel commented 1 year ago

good! thanks!! N.

nuriabel commented 1 year ago

Hi Matyas,

We've fixed the warnings that you mentioned. Hope all is well.

Best,

matyaskopp commented 1 year ago

@nuriabel @rjzevallos Thanks for updating your corpus. I don't see any other issue!