Closed boevkoski closed 11 months ago
This one is for @BartJongejan, but he never accepted the invite, so I can't assign the issue to him; I have re-issued the invite. The problem is that DK:
Both of these are needed for components to be assigned to a term in the TSVs and in the concordancers. Surprisingly, nobody noticed this before; thanks, @boevkoski! This should be fixed for 3.1.
Dorte Haltrup Hansen asks:
Regarding the missing attribute "term": could you please clarify whether we understand it correctly?
Should we change the following in the "listOrg" part of the corpus root TEI header:
Cabinets
to
Legislative period
And furthermore insert the following (exemplified) in all files:
Please see the attached test files issue711.zip
The attached files look good to me. You can also have a look at the other corpora, e.g. GB: https://github.com/clarin-eric/ParlaMint/blob/0dfbeb729f258a114e29b82faca9e847ed4c51b6/Samples/ParlaMint-GB/ParlaMint-GB-listOrg.xml#L22-L35 and https://github.com/clarin-eric/ParlaMint/blob/0dfbeb729f258a114e29b82faca9e847ed4c51b6/Samples/ParlaMint-GB/ParlaMint-GB_2017-09-07-commons.xml#L13-L15
(but there is nothing wrong with having text inside the <meeting> element, as you have in your example)
Of course it's - as always - a good idea to validate the files.
Hi, we have now corrected the listOrg and an example file to include "term". As far as we can see, it validates. It differs a little from e.g. the British data, since in Denmark we do not refer to different "parliaments", only to different governments and different parliamentary years. Before we change all the data, please confirm that it is OK. New-_files.zip
@BartJongejan thanks for the examples:
<meeting corresp="#FT" ana="#parla.term #FT.14">Regeringen Helle Thorning-Schmidt II</meeting>
<meeting n="20141" ana="#parla.session">20141</meeting>
<meeting n="1" ana="#parla.meeting">M1</meeting>
For the concordancer, I am taking the value of @n as the label for the term/session/meeting, as it seems a convenient, short, language-independent label. So, could you add this attribute to the term as well? I'm not quite sure what its value should be, because the other meeting/@n values are numeric but this one isn't. The attribute value can be any string here; still, it would be nice to keep it language-independent. How about the range of years? E.g.
<meeting corresp="#FT" ana="#parla.term #FT.14" n="2014-2015">Regeringen Helle Thorning-Schmidt II</meeting>
If you agree, please annotate the corpus like this. I will close the issue only once I have actually processed your corpus and checked that all is OK with the terms.
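To make the role of @n concrete, here is a minimal sketch of how a script might pick up the term/session/meeting labels from the <meeting> elements. It is not the actual ParlaMint conversion code: the function name `meeting_labels`, the `SAMPLE` wrapper element and the returned dictionary are assumptions made for illustration only, and the three <meeting> lines are taken from the examples in this thread.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: the <meeting> elements quoted in this thread,
# wrapped in a <titleStmt> purely so the snippet parses on its own.
SAMPLE = """<titleStmt xmlns="http://www.tei-c.org/ns/1.0">
  <meeting corresp="#FT" ana="#parla.term #FT.14" n="2014-2015">Regeringen Helle Thorning-Schmidt II</meeting>
  <meeting n="20141" ana="#parla.session">20141</meeting>
  <meeting n="1" ana="#parla.meeting">M1</meeting>
</titleStmt>"""

KINDS = ("#parla.term", "#parla.session", "#parla.meeting")


def meeting_labels(xml_text):
    """Return {'term': ..., 'session': ..., 'meeting': ...} from meeting/@n.

    A <meeting> is classified by the #parla.* pointer in its @ana;
    its label is the value of @n (None if @n is missing).
    """
    root = ET.fromstring(xml_text)
    labels = {}
    for meeting in root.iter(f"{{{TEI_NS}}}meeting"):
        for ref in meeting.get("ana", "").split():
            if ref in KINDS:
                kind = ref.split(".", 1)[1]  # "#parla.term" -> "term"
                labels[kind] = meeting.get("n")
    return labels


labels = meeting_labels(SAMPLE)
```

With this sketch, a term <meeting> that lacks @n ends up with the label None, which is roughly the situation that left the term column empty in the derived formats.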
This has indeed been corrected both in the TEI source and in the derived formats, e.g. https://www.clarin.si/ske-beta/#text-type-analysis?corpname=parlamint40_dk&wlminfreq=1&wlicase=1&include_nonwords=1&showresults=1&wlnums=frq&wlattr=speech.term
So, thanks for your work & closing.
Missing Term attribute in meta.tsvs.
Denmark has had four terms since 2014 (the first ParlaMint.DK entry):
2011 - 2015 - https://en.wikipedia.org/wiki/2011_Danish_general_election
2015 - 2019 - https://en.wikipedia.org/wiki/2015_Danish_general_election
2019 - 2022 - https://en.wikipedia.org/wiki/2019_Danish_general_election
November 2022 (current, not in ParlaMint) - https://en.wikipedia.org/wiki/2022_Danish_general_election