clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

Bug in V1.0 BG corpus (and in schema): bad affiliation/@role values #8

Closed by TomazErjavec 3 years ago

TomazErjavec commented 3 years ago

In the ParlaMint-teiCorpus schema there are a lot of values of affiliation/@role which are used only for BG. This is ok, but:

matyaskopp commented 3 years ago

Should all roles be changed to camel case? Is underscore OK? https://github.com/clarin-eric/ParlaMint/blob/main/ParlaMint-SI/ParlaMint-SI.ana.xml#L481

Currently, in CZ corpus we have these roles for affiliation:

alternate_of_delegation
candidate
chairman
chairperson
head_of_delegation
member
observer
pm
president_(speaker)
verifier
verifier_of_commission
verifier_of_committee
vice-chairman
vice-chairperson
vicepresident

and roles for organizations:

board_of_directors
chamber_of_the_nations
chamber_of_the_people
commission
committee
czech_national_council
delegation
european_parliament
government
institution
international_organizations
interparliamentary_friendship_group
parliament
party
political_group
president
senate
subcommittee
supervisory_board
working_group

TomazErjavec commented 3 years ago

Should all roles be changed to camel case?

I think it would be nice if we were consistent in the naming scheme at least for V2 of the corpora, so, yes.

Is underscore OK? https://github.com/clarin-eric/ParlaMint/blob/main/ParlaMint-SI/ParlaMint-SI.ana.xml#L481

Good point, so this needs to be corrected too. Will open a new issue.

Currently, in CZ corpus we have these roles for affiliation.

If you could change them all into camel case (both the _ and the - ones), that would be great. In any case, I will need to add them to the schema, so please let me know your final affiliation/@role values (and org/@role values, if you have any new ones).
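
A minimal sketch of the requested renaming, assuming roles are split on underscores, hyphens, and any other non-alphanumeric characters. The function name `to_camel_case` is hypothetical and not part of the ParlaMint tooling. Values written as a single word (e.g. `vicepresident`) carry no separator, so they would still need a manual mapping.

```python
import re


def to_camel_case(role: str) -> str:
    """Convert a snake_case or hyphenated role name to lower camelCase."""
    # Split on "_", "-" and any other non-alphanumeric characters,
    # dropping empty pieces (e.g. those produced by "president_(speaker)").
    words = [w for w in re.split(r"[^0-9A-Za-z]+", role) if w]
    if not words:
        return role
    head, *tail = words
    # The first word is kept as written; later words get an initial capital.
    return head + "".join(w[:1].upper() + w[1:] for w in tail)


for old in ["head_of_delegation", "vice-chairman", "verifier_of_committee", "member"]:
    print(old, "->", to_camel_case(old))
```

Single-word values such as `member` pass through unchanged, so the function can be applied to the whole list without special cases.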

matyaskopp commented 3 years ago

I have unified person/@role values:

chairman     ~ chairperson
minister     ~ minister of ...
viceChairman ~ viceChairperson 
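
The unification above can be sketched as a small lookup, assuming the variant on the right of each `~` collapses into the value on the left (this direction is inferred from the final list, which keeps `chairman`, `viceChairman`, and `minister`). The names `UNIFIED` and `unify_role`, and the example value `minister_of_finance`, are hypothetical.

```python
# Assumed direction: right-hand variants are replaced by the left-hand value.
UNIFIED = {
    "chairperson": "chairman",
    "viceChairperson": "viceChairman",
}


def unify_role(role: str) -> str:
    """Map a role variant to its unified value; other roles pass through."""
    # "minister of ..." variants are assumed to start with the word "minister",
    # whatever separator or casing follows it.
    if role.startswith("minister"):
        return "minister"
    return UNIFIED.get(role, role)


for role in ["chairperson", "viceChairperson", "minister_of_finance", "member"]:
    print(role, "->", unify_role(role))
```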

so our final list of all affiliation roles is (I believe):

member
viceChairman
MP
candidate
chairman
verifier
minister
replacement
vicePresident
headOfDelegation
president
presidiumMember
observer
vicePublicDefenderOfRights
publicDefenderOfRights
alternateOfDelegation

List of new roles (existing roles that were only camelCased are marked with a star), with the number of occurrences in the corpus:

*   662 viceChairman
    604 candidate
    332 verifier
     62 replacement
*    22 vicePresident
     18 headOfDelegation
      2 presidiumMember
      2 observer
      1 vicePublicDefenderOfRights
      1 publicDefenderOfRights
      1 alternateOfDelegation
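
Counts like these can be produced with a short script. The sketch below assumes the roles sit on `affiliation` elements in the TEI namespace; the sample XML is a made-up fragment, not taken from the CZ corpus, and `count_affiliation_roles` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Made-up fragment for illustration only.
SAMPLE = """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="p1">
    <affiliation role="member" ref="#org1"/>
    <affiliation role="viceChairman" ref="#org2"/>
  </person>
  <person xml:id="p2">
    <affiliation role="member" ref="#org1"/>
  </person>
</listPerson>"""


def count_affiliation_roles(xml_text: str) -> Counter:
    """Count the @role values of all TEI affiliation elements."""
    root = ET.fromstring(xml_text)
    return Counter(
        aff.get("role")
        for aff in root.iter(f"{{{TEI_NS}}}affiliation")
        if aff.get("role") is not None
    )


# Print the counts, most frequent first, in a layout similar to the list above.
for role, n in count_affiliation_roles(SAMPLE).most_common():
    print(f"{n:7d} {role}")
```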

I don't insist on these exact words; I can replace them (e.g. replacement ~ substitute) or just remove them.

TomazErjavec commented 3 years ago

In c99c006 I added the values used by CZ, along with comments, to ParlaMint/Schema/ParlaMint-teiCorpus.rng (i.e. https://github.com/clarin-eric/ParlaMint/blob/main/Schema/ParlaMint-teiCorpus.rng). More about this for CZ in #16.
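
One way to check a corpus against the roles listed in a RELAX NG schema is sketched below. The RNG fragment is hypothetical and heavily simplified; the real ParlaMint-teiCorpus.rng is much larger and may be organised differently. The helper names `allowed_roles` and `unknown_roles` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Hypothetical, simplified fragment; not copied from the actual schema.
SAMPLE_RNG = """<element name="affiliation" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="role">
    <choice>
      <value>member</value>
      <value>viceChairman</value>
      <value>candidate</value>
    </choice>
  </attribute>
</element>"""


def allowed_roles(rng_text: str) -> set:
    """Collect every <value> listed under an attribute named "role"."""
    root = ET.fromstring(rng_text)
    roles = set()
    for attr in root.iter(f"{{{RNG_NS}}}attribute"):
        if attr.get("name") == "role":
            for value in attr.iter(f"{{{RNG_NS}}}value"):
                roles.add((value.text or "").strip())
    return roles


def unknown_roles(used, rng_text: str) -> list:
    """Return the used roles that the schema fragment does not list."""
    return sorted(set(used) - allowed_roles(rng_text))


print(unknown_roles(["member", "vice-chairman", "candidate"], SAMPLE_RNG))
```

For actual validation a RELAX NG validator should be run on the corpus files; this sketch only compares value lists.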

TomazErjavec commented 3 years ago

Fixed in 393e208, closing.