everypolitician / everypolitician-data

data for national legislatures worldwide
http://everypolitician.org/

South Korea: 20th Assembly #14151

Closed davewhiteland closed 8 years ago

davewhiteland commented 8 years ago

Source of New Data

Legislature

Assembly (20th term)

Source

Team POPONG's spreadie: https://github.com/teampopong/data-assembly/blob/master/assembly.csv yay!

Type of Data

List of Members

Notes

Via email from Hoony

tmtmtmtm commented 8 years ago

That's the same source as we were already using for the 19th Term: https://github.com/tmtmtmtm/korea-popong-data/blob/0574fd43c510b26814a10cc392e2125eb0fa9f02/scraper.rb#L40

Does this file simply replace the previous one in its entirety?

wfdd commented 8 years ago

[Just an observation regarding the Chinese name mapping in https://github.com/tmtmtmtm/korea-popong-data/blob/0574fd43c510b26814a10cc392e2125eb0fa9f02/scraper.rb#L55 - 'cn' isn't a valid language code. If you were thinking of Chinese, that's zh, but Hanja isn't so much Chinese as Korean written in Chinese characters; the most appropriate lang value here would be ko-Hant, where ko is the language (Korean) and Hant is the writing script (traditional Chinese). I'll crawl back into my hole now...]

tmtmtmtm commented 8 years ago

We're just copying what's in the source.

wfdd commented 8 years ago

The column suffix is copied into the lang field of the names array, which one might reasonably expect to abide by the IETF language tag standard (BCP 47). I don't imagine the authors of the CSV intended for 'cn' to be used in this way. You did similarly change 'kr' to 'ko'.
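A minimal sketch of what such a mapping could look like, in Python rather than the scraper's Ruby. Only the 'kr' to 'ko' change is confirmed in this thread; the `name_` column prefix, the 'en' entry, the function name `names_from_row`, and the choice of ko-Hant for 'cn' are assumptions for illustration.

```python
# Hypothetical mapping from the CSV's column suffixes to BCP 47 language tags.
# Only 'kr' -> 'ko' is confirmed by the thread; the rest are assumptions.
SUFFIX_TO_LANG = {
    "kr": "ko",       # Korean
    "cn": "ko-Hant",  # Hanja: Korean written in traditional Chinese characters
    "en": "en",       # assumed English/romanised name column
}


def names_from_row(row):
    """Build a list of {lang, name} dicts from the name_* columns of one CSV row."""
    names = []
    for column, value in row.items():
        if not column.startswith("name_") or not value:
            continue
        suffix = column[len("name_"):]
        if suffix not in SUFFIX_TO_LANG:
            # Fail loudly rather than copy an unknown suffix into the lang field.
            raise ValueError(f"no language tag known for column {column!r}")
        names.append({"lang": SUFFIX_TO_LANG[suffix], "name": value})
    return names
```

Raising on an unknown suffix, instead of passing it through, is a design choice here: it stops a source-specific code such as 'cn' from silently ending up in the output.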

davewhiteland commented 8 years ago

OK, so the POPONG CSV contains data from multiple terms: we need to split conditionally on the when_elected field. That field lists the terms in parentheses, so an entry for the 19th Assembly should contain "19대" and one for the 20th Assembly "20대".

Sample entry for someone who has been elected to both the 19th and 20th Assemblies (but none before): 재선( 19대 , 20대 )
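A rough sketch of how the term numbers could be pulled out of that field, assuming every term appears as digits followed by "대" as in the sample entry. It is in Python rather than the scraper's Ruby, and the function names `terms_elected` and `served_in` are hypothetical.

```python
import re

# Digits immediately followed (optionally after spaces) by the term marker "대".
TERM_PATTERN = re.compile(r"(\d+)\s*대")


def terms_elected(when_elected):
    """Return the Assembly term numbers listed in a when_elected value."""
    return [int(number) for number in TERM_PATTERN.findall(when_elected)]


def served_in(when_elected, term):
    """True if the given term number is listed in the when_elected value."""
    return term in terms_elected(when_elected)
```

For the sample entry, `terms_elected("재선( 19대 , 20대 )")` gives `[19, 20]`, so `served_in(..., 19)` and `served_in(..., 20)` are both true.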

Currently we are not considering that, so we are inadvertently showing a combination of 19th and 20th Assembly politicians in the 19th Assembly's Popolo.

Furthermore: Wikipedia list for 20th Assembly: https://ko.wikipedia.org/wiki/%EB%8C%80%ED%95%9C%EB%AF%BC%EA%B5%AD_%EC%A0%9C20%EB%8C%80_%EA%B5%AD%ED%9A%8C%EC%9D%98%EC%9B%90_%EB%AA%A9%EB%A1%9D

Note this also means that there's historical data in that source for terms before the 19th too.

wfdd commented 8 years ago

It doesn't actually contain data from multiple terms; it simply lists the prior terms of current MPs. For the previous term you need to use the last version of the CSV before the 2016 election, which is this one (or whichever earlier version has 300 entries for the full list of MPs of the 19th parliament).

davewhiteland commented 8 years ago

@wfdd yes thanks!

To clarify: the CSV source is effectively current. It contains a when_elected field, but now that we're in the 20th Assembly, everyone in the file is a member of the 20th (i.e. 20대 is ubiquitous). For historic data, that is, previous terms, we need to go back in the git history.

We know from experience it's a little more subtle than this for politicians who may have served partial terms, but that's the gist of it, I think.
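One possible way to do that digging, sketched in Python under some assumptions: a local clone of the POPONG repository, the file living at `assembly.csv` in the repo root, and a first CSV row that is a header. The helper names are hypothetical; the only external commands used are `git log` and `git show`.

```python
import csv
import io
import subprocess


def count_members(csv_text):
    """Count non-empty data rows in CSV text, excluding the header row."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    return max(len(rows) - 1, 0)


def run_git(repo_dir, *args):
    """Run a git command in repo_dir and return its stdout as UTF-8 text."""
    result = subprocess.run(
        ["git", *args], cwd=repo_dir, check=True,
        capture_output=True, encoding="utf-8",
    )
    return result.stdout


def latest_full_version(repo_dir, path, before, expected=300):
    """Return the newest commit before `before` where `path` has `expected` rows."""
    log = run_git(repo_dir, "log", "--format=%H", f"--before={before}", "--", path)
    for commit in log.split():  # git log lists the newest commit first
        if count_members(run_git(repo_dir, "show", f"{commit}:{path}")) == expected:
            return commit
    return None
```

A call might look like `latest_full_version("data-assembly", "assembly.csv", "2016-04-13")`, where the cut-off date (the 2016 election day) is an assumption to adjust. As noted above, partial terms mean a single snapshot may still miss members who left or joined mid-term.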

wfdd commented 8 years ago

According to the 2008 edition of the Chronicle of Parliamentary Elections:

Vacancies of district constituency seats arising between general elections are filled through by-elections, on condition that there remains at least one year in the term of the Assembly member to be replaced. Vacancies of proportional representation seats are filled by the "next-in-line" candidates of the political party concerned.

It might help with figuring things out if you're gonna dig through the git history.