clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

SI feedback #579

Closed matyaskopp closed 1 year ago

matyaskopp commented 1 year ago

different responsibilities and funders in root and component files

root TEI:

            <respStmt>
               <persName ref="https://orcid.org/0000-0001-6143-6877 http://viaf.org/viaf/305936424">Andrej Pančur</persName>
               <persName ref="https://orcid.org/0000-0002-1560-4099 http://viaf.org/viaf/15145066459666591823">Tomaž Erjavec</persName>
               <persName ref="https://orcid.org/0000-0002-0464-9240">Katja Meden</persName>
               <resp xml:lang="sl">Kodiranje TEI</resp>
               <resp xml:lang="en">TEI corpus encoding</resp>
            </respStmt>
            <respStmt>
               <persName ref="https://orcid.org/0000-0001-6143-6877 http://viaf.org/viaf/305936424">Andrej Pančur</persName>
               <persName ref="http://viaf.org/viaf/86154440112735340300">Mihael Ojsteršek</persName>
               <resp xml:lang="sl">Urejanje seznama govornikov</resp>
               <resp xml:lang="en">Editing a list of speakers</resp>
            </respStmt>
            <funder>
               <orgName xml:lang="sl">Raziskovalna infrastruktura CLARIN</orgName>
               <orgName xml:lang="en">The CLARIN research infrastructure</orgName>
            </funder>
            <funder>
               <orgName xml:lang="sl">Slovenska raziskovalna infrastruktura CLARIN.SI</orgName>
               <orgName xml:lang="en">The Slovenian research infrastructure CLARIN.SI</orgName>
            </funder>
            <funder>
               <orgName xml:lang="sl">Raziskovalni program ARRS P6-0411 "Jezikovni viri in tehnologije za slovenski jezik"</orgName>
               <orgName xml:lang="en">Slovenian Research Agency Programme P6-0411 "Language Resources and Technologies for Slovene"</orgName>
            </funder>

component TEI:

            <respStmt>
               <persName>Andrej Pančur</persName>
               <resp xml:lang="sl">Kodiranje TEI</resp>
               <resp xml:lang="en">TEI corpus encoding</resp>
            </respStmt>
            <funder>
               <orgName xml:lang="sl">Raziskovalna infrastruktura CLARIN</orgName>
               <orgName xml:lang="en">The CLARIN research infrastructure</orgName>
            </funder>
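
A minimal sketch of how this kind of mismatch could be detected automatically, assuming the root and component headers are available as XML strings. The helper names `header_summary` and `header_differences` are hypothetical and are not part of the ParlaMint scripts. The `{*}` namespace wildcard requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Expanded name of the xml:lang attribute as ElementTree reports it.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def header_summary(xml_text, lang="en"):
    """Collect responsibilities and funders from a TEI header fragment.

    Returns (resps, funders): resps maps each <resp> text in the given
    language to the set of person names in the same <respStmt>; funders
    is the set of funder <orgName> texts in that language.
    """
    root = ET.fromstring(xml_text)
    resps = {}
    for stmt in root.findall(".//{*}respStmt"):
        names = {(p.text or "").strip() for p in stmt.findall("{*}persName")}
        for resp in stmt.findall("{*}resp"):
            if resp.get(XML_LANG) == lang:
                resps.setdefault((resp.text or "").strip(), set()).update(names)
    funders = {
        (org.text or "").strip()
        for org in root.findall(".//{*}funder/{*}orgName")
        if org.get(XML_LANG) == lang
    }
    return resps, funders


def header_differences(root_xml, component_xml):
    """Report people and funders present in the root but not the component."""
    root_resps, root_funders = header_summary(root_xml)
    comp_resps, comp_funders = header_summary(component_xml)
    missing_people = {}
    for resp, names in root_resps.items():
        missing = names - comp_resps.get(resp, set())
        if missing:
            missing_people[resp] = sorted(missing)
    return {
        "people": missing_people,
        "funders": sorted(root_funders - comp_funders),
    }
```

The comparison is by English label only, so it would not notice differences in `ref` attributes or in the Slovenian text.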

weird revisions

Some revisions predate the document's publishing date (2022-04-06): https://github.com/katjameden/ParlaMint/blob/7620efbc7ab76adb8f64bf4c84558a8770bfa798/Data/ParlaMint-SI/ParlaMint-SI_2022-04-06-SDZ8-Izredna-99.xml#L95-L102

      <revisionDesc>
         <change when="2021-06-11">
            <name>Tomaž Erjavec</name>: Made sample.</change>
         <change when="2021-03-20">
            <name>Tomaž Erjavec</name>: Fixes for Version 2.</change>
         <change when="2020-10-06">
            <name>Tomaž Erjavec</name>: Small fixes for ParlaMint.</change>
      </revisionDesc>
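
A rough sketch of a check for this, assuming the document date is known to the caller (for example, taken from the file name) and that `when` holds a full `YYYY-MM-DD` date. The function name `changes_before` is hypothetical and is not an existing validation step.

```python
import xml.etree.ElementTree as ET
from datetime import date


def changes_before(xml_text, document_date):
    """Return the @when values of <change> elements dated before document_date."""
    root = ET.fromstring(xml_text)
    early = []
    for change in root.findall(".//{*}change"):
        when = change.get("when")
        if when is None:
            continue
        try:
            changed_on = date.fromisoformat(when)
        except ValueError:
            # Partial dates such as "2021" are skipped in this sketch.
            continue
        if changed_on < document_date:
            early.append(when)
    return early
```

Called on the `revisionDesc` above with `date(2022, 4, 6)`, it would flag all three `change` elements.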

missing source time notes

I don't know whether the recordings are available; if they are, these time notes could be useful for aligning the audio with the text.

source: https://www.dz-rs.si/wps/portal/Home/seje/evidenca?mandat=VIII&type=mag&uid=8F73E726D055BA92C125881C00289D29

TEI: https://github.com/katjameden/ParlaMint/blob/7620efbc7ab76adb8f64bf4c84558a8770bfa798/Data/ParlaMint-SI/ParlaMint-SI_2022-04-06-SDZ8-Izredna-99.xml#L130

               <seg xml:id="ParlaMint-SI_2022-04-06-SDZ8-Izredna-99.seg11">Ker <!-- ... -->  izboljšave, ki jih 
<!-- 2. TRAK: (VP) 9.05 -->
prinaša ta predlog zakona. <!-- ... --> </seg>

opposition relation

We encode the opposition as active and the government as passive: https://github.com/katjameden/ParlaMint/blob/7620efbc7ab76adb8f64bf4c84558a8770bfa798/Data/ParlaMint-SI/ParlaMint-SI-listOrg.xml#L400-L404

      <relation name="opposition"
                mutual="#party.LMŠ #party.SD #party.SAB #party.Levica.2 #party.NeP #party.NP #party.IMNS #party.SNS"
                from="2020-03-13"
                to="2022-06-01"
                ana="#GOV.14"/>

should be

      <relation name="opposition"
                active="#party.LMŠ #party.SD #party.SAB #party.Levica.2 #party.NeP #party.NP #party.IMNS #party.SNS"
                passive="#GOV"
                from="2020-03-13"
                to="2022-06-01"
                ana="#GOV.14"/>
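
The rewrite from `mutual` to `active`/`passive` can be sketched as follows. This is only an illustration: `opposition_to_directed` is a hypothetical helper, and the default `#GOV` target is taken from the corrected example above.

```python
import xml.etree.ElementTree as ET


def opposition_to_directed(relation, government_ref="#GOV"):
    """Turn a mutual opposition relation into an active/passive one, in place.

    Returns True if the element was changed, False if it was left alone
    (not an opposition relation, or no @mutual attribute).
    """
    if relation.get("name") != "opposition" or "mutual" not in relation.attrib:
        return False
    # The opposition parties become the active side of the relation.
    relation.set("active", relation.attrib.pop("mutual"))
    # The government is the passive side.
    relation.set("passive", government_ref)
    return True


# Example on a shortened relation element.
rel = ET.fromstring(
    '<relation name="opposition" mutual="#party.SD #party.SNS" '
    'from="2020-03-13" to="2022-06-01" ana="#GOV.14"/>'
)
opposition_to_directed(rel)
```

Other relation types, such as `renaming`, are deliberately left untouched because their active and passive sides have to be decided case by case.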

renaming relation

Use active and passive instead of mutual: https://github.com/katjameden/ParlaMint/blob/data/Data/ParlaMint-SI/ParlaMint-SI-listOrg.xml#L369

      <relation name="renaming" mutual="#party.SMC.2 #party.GAS" when="2021-12-04"/>

katjameden commented 1 year ago

I have implemented all the checks mentioned in the feedback, with one exception (missing source time notes), for two reasons. First, we are not sure whether the time notes are really associated with the recordings or whether they mark other events. The main reason, however, is that including these time notes in the corpus would require scraping the data again.

TomazErjavec commented 1 year ago

Not sure if this is the right place for this, but why does SI (and SE) fail in merge with main?

matyaskopp commented 1 year ago

> Not sure if this is the right place for this, but why does SI (and SE) fail in merge with main?

I have fixed the validation script - the previous version did not validate a root file title at all...

The title in the root file of the TEI.ana version should contain ParlaMint.ana: https://github.com/clarin-eric/ParlaMint/blob/83e12c2c70649540f759c8986f8b1b5409a15946/Data/ParlaMint-SI/ParlaMint-SI.ana.xml#L8-L9

TomazErjavec commented 1 year ago

> the previous version did not validate a root file title at all...

I see! OK, for the complete version I'll fix this anyway (as almost everybody has it wrong). For the GitHub sample, I'd then ask @katjameden to fix it. I don't know what we should do for SE; maybe just do it ourselves?

matyaskopp commented 1 year ago

> I see! OK, for the complete version I'll fix this anyway (as almost everybody has it wrong). For the GitHub sample, I'd then ask @katjameden to fix it. I don't know what we should do for SE; maybe just do it ourselves?

I think we can fix it ourselves by uploading the content of the .../Sample-ParlaMint-## folder. I don't mind this error; it will be fixed when we release samples v3.0 on GitHub...

katjameden commented 1 year ago

> I think we can fix it ourselves by uploading the content of the .../Sample-ParlaMint-## folder. I don't mind this error; it will be fixed when we release samples v3.0 on GitHub...

I have fixed this on my end (i.e. in my fork). Please let me know what you decide: should I open a new PR to add this change, or should it be fixed later? I also ran a local validation (validate-parlamint-SI) and the error was resolved.

matyaskopp commented 1 year ago

@katjameden, OK, then please open a pull request.