clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

Parliamentary body - unexpected values #705

Closed matyaskopp closed 1 year ago

matyaskopp commented 1 year ago

possibly related to #584


matyaskopp commented 1 year ago

Another view on the same issue: https://www.clarin.si/ske-beta/#text-type-analysis?corpname=parlamint30_xx_en&tab=basic&filter=containing&onecolumn=1&wlattr=speech.body&wlminfreq=1&include_nonwords=1&itemsPerPage=50&showresults=1&cols=%5B%22frq%22%5D&wlsort=frq


TomazErjavec commented 1 year ago

The text.body attribute is multivalued, so "National parliament|Lower house|Legislative session|Sitting" also decomposes into its constituent values. This comes from the 3.0 FR corpus, which has references to these categories in the meeting element.

The value of this attribute is now (9129867) explicitly limited to one of the values we expect. Unfortunately, the legislature taxonomy doesn't really allow for a more sensible approach.
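To illustrate the decomposition and the restriction described above, here is a minimal Python sketch. It is not the actual corpus-compilation code: the function names are made up for illustration, and the set of allowed values is an assumption, since the real list used in the fix is not shown in this thread.

```python
# Hypothetical set of expected values for the body attribute.
# The actual list applied in commit 9129867 is not quoted in this thread.
EXPECTED_BODIES = {"Unicameralism", "Lower house", "Upper house"}


def split_multivalue(value, sep="|"):
    """Split a multivalued attribute string into its constituent values."""
    return [part.strip() for part in value.split(sep) if part.strip()]


def restrict_body(value, allowed=EXPECTED_BODIES):
    """Keep only the constituent values that are in the allowed set."""
    return [part for part in split_multivalue(value) if part in allowed]


raw = "National parliament|Lower house|Legislative session|Sitting"

# Without the restriction, every constituent shows up as a separate value.
print(split_multivalue(raw))

# With the restriction, only the expected category survives.
print(restrict_body(raw))

# A misspelled or non-canonical term such as "unicameral" is dropped entirely,
# which is why the source taxonomy itself still needs to be corrected.
print(restrict_body("unicameral"))
```

Under these assumptions, the unrestricted split yields all four categories from the FR meeting references, while the restricted version keeps only "Lower house".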

The "unicameral" comes from GR 3.0: https://github.com/clarin-eric/ParlaMint/blob/535dae3f802d20ea053e76899ddcf6ab805049c0/Data/ParlaMint-GR/ParlaMint-taxonomy-parla.legislature.xml#L46

The term should of course be "Unicameralism". This issue will crop up again once we make the common taxonomies, and we can address it then. For now, I fixed it in my local copy.

TomazErjavec commented 1 year ago

Local copies fixed, issues to indicate problems posted, original corpora recompiled, en recompiled, verts re-joined and re-installed. Phew! But I think it is OK now (screenshot attached).