Closed matyaskopp closed 1 year ago
Hi, thanks a lot for the feedback. We will correct the different issues. We have a question, however, about 'split forename and surname'. Can we have several surnames and forenames, in any quantity? Should the 'i' ('and' in Catalan) be encoded as a nameLink element, like the 'de' (of)?
<forename>Lucas</forename>
<forename>Silvano</forename>
<surname>Ferro</surname>
<surname>Solé</surname>
Thanks!! Núria
Can we have different surnames and forenames? in any quantity?
Yes, as many as you want!
Should the 'i' ('and' in Catalan) be encoded as a nameLink element, like the 'de' (of)?
Absolutely. It's even in the example of the Guidelines.
Done!
@rjzevallos, thanks for the changes. I reviewed your data and added feedback for an annotated version.
Relations are documented here: https://clarin-eric.github.io/ParlaMint/#sec-relation
<relation name="coalition"
active="#PG.JxCAT #PG.REP"
passive="#GOV"
from="2018-01-17"
to="2020-12-21"
ana="#PC.12"/>
should be
<relation name="coalition"
mutual="#PG.JxCAT #PG.REP"
from="2018-01-17"
to="2020-12-21"
ana="#PC.12"/>
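The correction above can be checked mechanically. The following is a minimal sketch, not part of the ParlaMint tooling: the function name `coalition_problems` is hypothetical, and it only encodes the rule stated in this comment (a coalition relation should use mutual rather than active/passive).

```python
import xml.etree.ElementTree as ET

def coalition_problems(relation_xml):
    """Return a list of problems for one <relation> element (empty if none).

    Hypothetical helper: it only applies the rule from this thread, namely
    that a coalition relation should use @mutual, not @active/@passive.
    """
    rel = ET.fromstring(relation_xml)
    if rel.get("name") != "coalition":
        return []
    problems = []
    if rel.get("mutual") is None:
        problems.append("coalition relation has no @mutual")
    for attr in ("active", "passive"):
        if rel.get(attr) is not None:
            problems.append(f"coalition relation uses @{attr}")
    return problems

wrong = ('<relation name="coalition" active="#PG.JxCAT #PG.REP" passive="#GOV" '
         'from="2018-01-17" to="2020-12-21" ana="#PC.12"/>')
right = ('<relation name="coalition" mutual="#PG.JxCAT #PG.REP" '
         'from="2018-01-17" to="2020-12-21" ana="#PC.12"/>')

print(coalition_problems(wrong))
print(coalition_problems(right))
```

Real corpus files use the TEI namespace, so the element lookup would need adapting when running this against a whole listOrg.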
I hope this is the source of your data: https://www.parlament.cat/document/dspcp/239595.pdf#page=3
<note type="narrative">La sessió s'obre a les onze del matí i dos minuts. Presideix el president de la Mesa d’Edat, acompanyat dels secretaris de la Mesa d’Edat, la qual és assistida pel secretari general i el lletrat major.</note>
<note type="narrative">ORDRE DEL DIA DE LA CONVOCATÒRIA</note>
<note type="narrative">Punt únic: Constitució del Ple del Parlament i elecció de la Mesa del Parlament (tram. 396-00001/12 i 398-00001/12).</note>
Source:
You can add the proper source of a file into the bibl element in this place: https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/960d8fed8178b5d9b1f2659c60630d2bf2235e02/Data/ParlaMint-ES-CT/ParlaMint-ES-CT_2018-01-17-0101.xml#L58
<idno type="URI" subtype="parliament">https://www.parlament.cat/document/dspcp/239595.pdf</idno>
@join="right"
The annotated component file contains only 4 instances of join="right". See https://clarin-eric.github.io/ParlaMint/#sec-ana-words
<u xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0" who="#MuroXavier" ana="#regular" xml:lang="ca">
<seg xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0" xml:lang="ca">
<s xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1" xml:lang="ca">
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.1" msd="UPosTag=ADJ" lemma="bo">Bon</w>
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.2" msd="UPosTag=NOUN" lemma="dia">dia</w>
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.3" msd="UPosTag=ADP" lemma="a">a</w>
<!-- missing join: -->
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.4" msd="UPosTag=PRON" lemma="tothom">tothom</w>
<pc xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.5" msd="UPosTag=PUNCT">,</pc>
...
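A rough way to spot tokens like the one marked above is to look for a word directly followed by punctuation that normally attaches to it. This is only a heuristic sketch under stated assumptions: `missing_joins` and `NO_SPACE_BEFORE` are hypothetical names, the fragment has no TEI namespace, and a reliable check would compare the tokens against the original text.

```python
import xml.etree.ElementTree as ET

# Punctuation that is normally written without a preceding space
# (an assumption for this sketch, not a ParlaMint rule).
NO_SPACE_BEFORE = {",", ".", ";", ":", "!", "?", ")"}

def missing_joins(sentence_xml):
    """Return the text of tokens that probably need join="right"."""
    s = ET.fromstring(sentence_xml)
    # Real files are in the TEI namespace; this sketch uses plain tag names.
    tokens = [el for el in s.iter() if el.tag in ("w", "pc")]
    missing = []
    for cur, nxt in zip(tokens, tokens[1:]):
        attaches = nxt.tag == "pc" and (nxt.text or "") in NO_SPACE_BEFORE
        if attaches and cur.get("join") != "right":
            missing.append(cur.text)
    return missing

sample = """<s>
  <w>Bon</w><w>dia</w><w>a</w><w>tothom</w><pc>,</pc>
</s>"""
print(missing_joins(sample))  # ['tothom']
```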
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.1" msd="UPosTag=ADJ" lemma="bo">Bon</w>
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.2" msd="UPosTag=NOUN" lemma="dia">dia</w>
should be (according to udpipe)
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.1" msd="UPosTag=ADJ|Gender=Masc|Number=Sing" lemma="bo">Bon</w>
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.2" msd="UPosTag=NOUN|Gender=Masc|Number=Sing" lemma="dia">dia</w>
@rjzevallos Which branch in your fork do you want to use? You have some commits in IULATERM-TRL-UPF:main and the rest in IULATERM-TRL-UPF:data-ES-CT. Neither of these branches contains valid data, and they are wrong in different ways:
main - the @msd should contain UD features
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.1" msd="UPosTag=adj|type=qualificative|gen=masculine|num=singular" lemma="bo">Bon</w>
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.2" msd="UPosTag=noun|type=common|gen=masculine|num=singular" lemma="dia">dia</w>
data-ES-CT - missing UD features
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.1" msd="UPosTag=ADJ" lemma="bo">Bon</w>
<w xml:id="ParlaMint-ES-CT_2018-01-17-0101.1.0.0.1.2" msd="UPosTag=NOUN" lemma="dia">dia</w>
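The first kind of problem (non-UD tags and features in @msd) can be caught with a simple syntax check. The sketch below is an illustration only, with the hypothetical helper `check_msd`; it verifies that the tag is one of the 17 UD universal POS tags and that each feature looks like Name=Value in UD capitalisation. It does not know which features a given word should carry, so it cannot detect the second kind of problem (features missing altogether), and it is no substitute for the UD validator.

```python
import re

# The 17 universal POS tags defined by Universal Dependencies.
UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# UD feature syntax: capitalised name, optional [layer], '=', capitalised value(s).
FEATURE = re.compile(
    r"^[A-Z][A-Za-z0-9]*(\[[a-z0-9]+\])?=[A-Z0-9][A-Za-z0-9]*(,[A-Z0-9][A-Za-z0-9]*)*$"
)

def check_msd(msd):
    """Return a list of problems found in one @msd value (empty if none)."""
    problems = []
    head, *feats = msd.split("|")
    if not head.startswith("UPosTag="):
        problems.append("msd does not start with UPosTag=")
    elif head[len("UPosTag="):] not in UPOS:
        problems.append(f"unknown UPOS tag: {head[len('UPosTag='):]}")
    for feat in feats:
        if not FEATURE.match(feat):
            problems.append(f"not a UD-style feature: {feat}")
    return problems

print(check_msd("UPosTag=adj|type=qualificative|gen=masculine|num=singular"))
print(check_msd("UPosTag=ADJ|Gender=Masc|Number=Sing"))  # []
```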
We are using the main branch. Moreover, on Wednesday I made a new pull request that adds the missing part.
I see that I need to fix the UD features.
I have a question. Which script should I use to validate UD features? I use the following commands:
make val-schema-ES-CT
make val-schema-tei-ES-CT
make val-schema-ana-ES-CT
make val-schema-ParlaMint-ES-CT
make val-schema-ParlaCLARIN-ES-CT
make val-schema-ana-ParlaMint-ES-CT
make val-schema-tei-ParlaCLARIN-ES-CT
make val-schema-ana-ParlaCLARIN-ES-CT
make check-links-ES-CT
make check-content-ES-CT
make validate-parlamint-ES-CT
When I run all the commands I don't get any error =S
I have a question. Which script should I use to validate UD features?
Use this:
make conllu-ES-CT
command is doing the following
When I run
make conllu-ES-CT
I get:
python3: can't open file '/mnt/d/UPF/proyecto_parlamint/ParlaMint/Scripts/tools/validate.py': [Errno 2] No such file or directory
I don't have a tools folder.
You have to clone it from UD tools repository. Follow these instructions: CONTRIBUTING.md - UD tools
Thanks for the fixes. Two (final) observations:
setting date in the TEI.ana version: <date when="2018-01-17">26.10.2015</date>
From the application description, it seems that you have used Freeling with the Catalan model for the whole corpus, even for Spanish parts. Am I right, or does the application description miss a mention of the Spanish model?
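The first observation (the @when value and the displayed date disagree) is easy to test for. A minimal sketch, assuming the text content is always written as DD.MM.YYYY; the function name `date_mismatch` and its `text_format` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

def date_mismatch(date_xml, text_format="%d.%m.%Y"):
    """Return True if @when differs from the date written in the element text.

    text_format is an assumption about how the displayed date is written.
    """
    el = ET.fromstring(date_xml)
    when = el.get("when")
    shown = datetime.strptime((el.text or "").strip(), text_format).date().isoformat()
    return when != shown

print(date_mismatch('<date when="2018-01-17">26.10.2015</date>'))  # True
print(date_mismatch('<date when="2018-01-17">17.01.2018</date>'))  # False
```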
Concerning the point about mentioning the proper source in bibl:
proper source of component file in bibl: You can add the proper source of a file into the bibl element in this place: https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/960d8fed8178b5d9b1f2659c60630d2bf2235e02/Data/ParlaMint-ES-CT/ParlaMint-ES-CT_2018-01-17-0101.xml#L58
I explained in the documentation that the sources were docx documents sent directly to us. I can include a note saying that the texts are also available in pdf format, but the pdf files cannot be considered source files, because there is no one-to-one correspondence. That is the reason we will not use the bibl element. Thanks again for all your support!! Best N.
I am sorry, I missed this previously:
parliamentaryGroup with no affiliation
Your parliamentary groups do not have members:
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.PSCUA
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.ERC
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.JxCAT
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.VOX
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.CUP
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.ECP
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.Cs
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.GM
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.JxSi
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.PSC
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.CSP
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.PPC
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.CUP-CC
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.REP
WARN[10] ParlaMint-ES-CT-listOrg parliamentaryGroup-role organisation without affiliation: #PG.CCP
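The warnings above can be reproduced locally before submitting. This is a sketch of the idea rather than the actual ParlaMint validation: `groups_without_members` is a hypothetical helper, the sample uses plain (non-namespaced) elements, and the person `PersonA` is invented for illustration. It assumes groups are org elements with role="parliamentaryGroup" and that membership is recorded as affiliation elements whose ref points at the group.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def groups_without_members(list_org_xml, list_person_xml):
    """Return references to parliamentary groups that no affiliation points to."""
    orgs = ET.fromstring(list_org_xml)
    persons = ET.fromstring(list_person_xml)
    groups = {
        "#" + org.get(XML_ID)
        for org in orgs.iter("org")
        if org.get("role") == "parliamentaryGroup"
    }
    referenced = set()
    for aff in persons.iter("affiliation"):
        # @ref may hold several space-separated pointers.
        referenced.update((aff.get("ref") or "").split())
    return sorted(groups - referenced)

# Invented sample data; PersonA is not a real speaker.
orgs = ('<listOrg>'
        '<org xml:id="PG.ERC" role="parliamentaryGroup"/>'
        '<org xml:id="PG.VOX" role="parliamentaryGroup"/>'
        '<org xml:id="GOV" role="government"/>'
        '</listOrg>')
persons = ('<listPerson><person xml:id="PersonA">'
           '<affiliation ref="#PG.ERC" role="member"/>'
           '</person></listPerson>')

print(groups_without_members(orgs, persons))  # ['#PG.VOX']
```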
@rjzevallos, @nuriabel, is there any progress? I would like to close this issue and merge your sample, but there is still an unsolved task: https://github.com/clarin-eric/ParlaMint/issues/491#issuecomment-1363306982
Hi Matyas!! happy new year!! Yes, we are almost there with all the information already collected, but now we are busy until 20-January because of a deadline. Is there any problem if we wait until the week of 23 to provide the new files? Best N.
Is there any problem if we wait until the week of 23 to provide the new files?
I think if we get them before the end of January, we are good. And a happy new year to you too!
good! thanks!! N.
Hi Matyas,
We've fixed the warnings that you mentioned. Hope all is well.
Best,
@nuriabel @rjzevallos Thanks for updating your corpus. I don't see any other issue!
@rjzevallos
Title does not correspond to meeting values
The corpus contains 4 terms (according to the content of the meeting elements), but the title mentions only terms XI and XII: https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/764ee33cf7eb3fdbeecfbc914db7327be7ccf7ad/Data/ParlaMint-ES-CT/ParlaMint-ES-CT.xml#L10-L15
teiCorpus meeting element
parla.uni (root and component file)
meeting should contain parla.uni, e.g.
meeting in component file
https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/764ee33cf7eb3fdbeecfbc914db7327be7ccf7ad/Data/ParlaMint-ES-CT/ParlaMint-ES-CT_2018-01-17-0101.xml#L8
The meeting element should also contain session/meeting/sitting if it makes sense, e.g.: https://github.com/clarin-eric/ParlaMint/blob/e37537be54721c40bf6687cd12d9361759e6b234/Data/ParlaMint-PT/ParlaMint-PT_2015-01-07.xml#L13-L16
missing component file classification parla.meeting / parla.sitting
https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/764ee33cf7eb3fdbeecfbc914db7327be7ccf7ad/Data/ParlaMint-ES-CT/ParlaMint-ES-CT_2018-01-17-0101.xml#L2
should be something like:
attribute scheme should refer to a taxonomy
@scheme content: https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/764ee33cf7eb3fdbeecfbc914db7327be7ccf7ad/Data/ParlaMint-ES-CT/ParlaMint-ES-CT.xml#L131
should be
missing funder in component file
use values from root file (CLARIN-ERIC)
wrong setting date
@when does not match the text content of the node: https://github.com/IULATERM-TRL-UPF/ParlaMint/blob/764ee33cf7eb3fdbeecfbc914db7327be7ccf7ad/Data/ParlaMint-ES-CT/ParlaMint-ES-CT_2018-01-17-0101.xml#L89
missing chair speaker
opposition relation
should be
split forename and surname