clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

DK: missing Term attribute in meta.tsv #711

Closed boevkoski closed 11 months ago

boevkoski commented 1 year ago

The Term attribute is missing from the DK meta.tsv files.


Denmark has had four terms since 2014 (the first ParlaMint.DK entry):

2011 - 2015 - https://en.wikipedia.org/wiki/2011_Danish_general_election
2015 - 2019 - https://en.wikipedia.org/wiki/2015_Danish_general_election
2019 - 2022 - https://en.wikipedia.org/wiki/2019_Danish_general_election
November 2022 (current, not in ParlaMint) - https://en.wikipedia.org/wiki/2022_Danish_general_election

TomazErjavec commented 1 year ago

This one is for @BartJongejan, but he never accepted the invite, so I can't assign the issue to him; I have re-issued the invite. The problem is that DK:

Both of these things are needed for components to be assigned to a term in the TSVs and also in the concordancers. Surprisingly, nobody noticed this before; thanks, @boevkoski! This should be fixed for 3.1.
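For anyone who wants to check their own metadata files for the same gap, here is a minimal Python sketch. It assumes the meta.tsv is tab-separated with a header row containing a column literally named `Term`, and that an empty cell or a lone `-` means "no value"; both the column name and the placeholder convention are assumptions, and the sample rows are made up for illustration.

```python
import csv
import io


def rows_missing_term(tsv_text, column="Term"):
    """Return the 1-based data-row numbers whose `column` cell is empty or '-'.

    Assumption: the file is tab-separated with a header row, and '-' is
    used as a placeholder for a missing value.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    if column not in (reader.fieldnames or []):
        raise KeyError(f"no {column!r} column in header")
    missing = []
    for number, row in enumerate(reader, start=1):
        value = (row[column] or "").strip()
        if value in ("", "-"):
            missing.append(number)
    return missing


# Hypothetical sample: the IDs and Term values are invented for illustration.
sample = (
    "ID\tDate\tTerm\n"
    "u1\t2014-10-07\texample-term\n"
    "u2\t2014-10-08\t-\n"
)
print(rows_missing_term(sample))
```

To run it on a real file, read the file as UTF-8 text and pass the contents in as `tsv_text`.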

BartJongejan commented 1 year ago

Dorte Haltrup Hansen asks:

Regarding the missing attribute "term": could you please clarify whether we understand it correctly?

Should we change the following in the "listOrg" part of the corpus root TEI header:

Cabinets

to

Legislative period

And furthermore, insert the following (exemplified) in all files:

Regeringen Helle Thorning-Schmidt II

Please see the attached test files issue711.zip

TomazErjavec commented 1 year ago

The attached files look good to me. You can also have a look at the other corpora, e.g. GB: https://github.com/clarin-eric/ParlaMint/blob/0dfbeb729f258a114e29b82faca9e847ed4c51b6/Samples/ParlaMint-GB/ParlaMint-GB-listOrg.xml#L22-L35 and https://github.com/clarin-eric/ParlaMint/blob/0dfbeb729f258a114e29b82faca9e847ed4c51b6/Samples/ParlaMint-GB/ParlaMint-GB_2017-09-07-commons.xml#L13-L15

(There is nothing wrong with having text inside the <meeting> element, as you have in your example.) As always, it's a good idea to validate the files.
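As a quick sanity check before full schema validation, something like the following Python sketch could confirm that a component file carries a term reference at all. It assumes the term is marked by a `<meeting>` element in the TEI namespace whose `ana` attribute contains the token `#parla.term`; that token, the helper name `term_meetings`, and the sample XML are assumptions for illustration, not taken from the DK files. It does not replace validating against the ParlaMint schemas.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def term_meetings(xml_text, marker="#parla.term"):
    """Return the <meeting> elements whose `ana` attribute lists `marker`.

    Assumption: terms are flagged by a whitespace-separated token in `ana`.
    """
    root = ET.fromstring(xml_text)
    found = []
    for meeting in root.iter(f"{{{TEI_NS}}}meeting"):
        if marker in meeting.get("ana", "").split():
            found.append(meeting)
    return found


# Hypothetical sample: the ana values and text are invented for illustration.
sample = (
    f'<TEI xmlns="{TEI_NS}"><teiHeader><fileDesc><titleStmt>'
    '<meeting ana="#parla.term #term.example">Example term</meeting>'
    '<meeting ana="#parla.sitting">Example sitting</meeting>'
    "</titleStmt></fileDesc></teiHeader></TEI>"
)
for meeting in term_meetings(sample):
    print(meeting.get("ana"), meeting.text)
```

A file for which `term_meetings` returns an empty list would be a candidate for the problem described in this issue.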

BartJongejan commented 1 year ago

Hi, we have now corrected the listOrg and an example file to include "term". As far as we can see, it validates. It differs a little from e.g. the British data, since we do not refer to different "parliaments" in Denmark, only to different governments and different parliamentary years. Before we change all the data, please confirm that this is OK. New-_files.zip

TomazErjavec commented 1 year ago

@BartJongejan thanks for the examples:

TomazErjavec commented 11 months ago

This has indeed been corrected both in the TEI source and in the derived formats, e.g. https://www.clarin.si/ske-beta/#text-type-analysis?corpname=parlamint40_dk&wlminfreq=1&wlicase=1&include_nonwords=1&showresults=1&wlnums=frq&wlattr=speech.term

So, thanks for your work & closing.