clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/
New definitions for subcorpora #629

Closed TomazErjavec closed 1 year ago

TomazErjavec commented 1 year ago

As the UA corpus will also be part of ParlaMint 3.0, there were suggestions that we should introduce a new subcorpus corresponding to the war in Ukraine. While we are changing the scope of subcorpora, this also seems a good opportunity to change the start date of the COVID subcorpus to something more sensible than the current rather arbitrary date of 2019-11-01.

With this, the following suggestions:

I would be interested in comments on the above plan. If there are none, or if all are in favour, I would try to introduce this already for 3.0. This could be done with the finalisation script, while the GitHub samples would also need to be changed (or could just wait for samples derived from 3.0).

osenova commented 1 year ago

Hi Tomaz, I totally agree with the three subcorpora suggestion: Reference, COVID and WAR. I am also OK with the dates. Just please check this also with Darja.

AnnaParla commented 1 year ago

I believe we need to think about naming the "war" corpus component a bit more carefully, since Russia invaded and annexed parts of Ukraine back in 2014. It did the same in 2022 but on a much larger scale. And war-related themes were introduced into the Ukrainian corpus back in 2014. In fact, it might be interesting to compare the exact wordings and frequencies of mentions of Russia's aggression against Ukraine across the national parliaments... As for the sub-corpus name, maybe "large-scale war" or something along these lines would be better, if we need to refer to the period from 2022 onward? To indicate that the war started in 2022 would be inaccurate and could create unnecessary confusion (and potentially even manipulation).

TomazErjavec commented 1 year ago

I am not against renaming (although I think 2014 is more like "invasion", not really "war", which 2022 surely is) but I would need a single-word term. As for confusion / manipulation, I think this potentially holds for any distinction, e.g. "COVID" subcorpus could have any number of start points, from the first reported instance of the virus, to first known case in Europe, to first mention in a parliament etc etc. So, I wouldn't worry about this, as long as we are clear what the reasoning is.

AnnaParla commented 1 year ago

Upon looking into this question a bit more, it seems that "military aggression" is an umbrella term covering the whole period between 2014 and 2022, which both EU officials (https://finance.ec.europa.eu/eu-and-world/sanctions-restrictive-measures/sanctions-adopted-following-russias-military-aggression-against-ukraine_en) and Ukraine's Foreign Ministry agree upon now.

Back in 2014, the EU called it "Russia's actions destabilizing the situation in Ukraine" but after 24 Feb 2022 it has been worded as "Russia's war of aggression against Ukraine". Our diplomacy has been insisting on the single nature and continual application of this aggression since 2014, including the "hot component" as well as the annexation of Ukrainian territories, mass migration of the population from the affected areas, information war, cyber war, and war crimes. The only significant difference is the scale.

TomazErjavec commented 1 year ago

Thanks for the explanation. So, if I understand correctly, "War" would then still be the appropriate name for the subcorpus starting on 2022-02-24.

AnnaParla commented 1 year ago

Yes, but it might need to be explained. What about placing an information icon right next to War with a short explanation like "starting from Russia's full-scale invasion of Ukraine" or something similar?

AnnaParla commented 1 year ago

On a different note, 2020-01-31 is a landmark not only for COVID but also for Brexit, which will be convenient for searches in the ParlaMint-GB!

TomazErjavec commented 1 year ago

> Yes, but it might need to be explained. What about placing an information icon right next to War with a short explanation like "starting from Russia's full-scale invasion of Ukraine" or something similar?

I can place this info in the taxonomy, yes. Can't do an icon in the concordancer I think, but will have a look if it is possible to give glosses with values.

As for Brexit, well spotted! :)

TomazErjavec commented 1 year ago

> I can place this info in the taxonomy

Did it now, in fdb8dce. @matyaskopp, can you merge devel into main, so that it is available there too pls.? I will modify the finalization step to correctly label the components, and (somehow) insert the new taxonomy into the corpora.

TomazErjavec commented 1 year ago

This has now been implemented (https://github.com/clarin-eric/ParlaMint/blob/main/Data/Taxonomies/ParlaMint-taxonomy-subcorpus.xml), so that corpora finalized from now on include this taxonomy and the derived files make the distinction. I am currently re-processing all the corpora that have been submitted so far.
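
For readers who want to reproduce the date-based split discussed in this thread, here is a minimal sketch in Python. It is not the actual finalization script. The two boundary dates (2020-01-31 for COVID, 2022-02-24 for War) are the ones mentioned in the comments above; the function name, the lowercase label strings, and the `covid_ends_at_war` switch are hypothetical. Whether the COVID period continues past the start of the War period is not stated in the thread, so both readings are supported and the default (overlap) is an assumption.

```python
from datetime import date

# Boundary dates taken from the discussion above.
COVID_START = date(2020, 1, 31)
WAR_START = date(2022, 2, 24)


def subcorpus_labels(day: date, covid_ends_at_war: bool = False) -> list[str]:
    """Return the subcorpus labels for a sitting date.

    Assumption: by default the COVID period is open-ended, so dates on or
    after WAR_START carry both labels. Pass covid_ends_at_war=True to treat
    the three periods as non-overlapping instead.
    """
    labels = []
    if day < COVID_START:
        labels.append("reference")
    elif not (covid_ends_at_war and day >= WAR_START):
        labels.append("covid")
    if day >= WAR_START:
        labels.append("war")
    return labels


if __name__ == "__main__":
    for d in (date(2019, 12, 1), date(2020, 1, 31), date(2022, 2, 24)):
        print(d.isoformat(), subcorpus_labels(d))
```

The taxonomy file linked above remains the authoritative definition of the subcorpora; the sketch only illustrates how a date could be mapped to labels under the stated assumptions.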