everypolitician / everypolitician-names

Generate names.csv (all the politicians' names from everypolitician.org's data)
https://everypolitician.github.io/everypolitician-names/names.csv

add names-current-term.csv #3

Open davewhiteland opened 8 years ago

davewhiteland commented 8 years ago

The big list of all the names of all politicians ever is impressive, but it's possible that a similar file that only contains politicians from all legislatures' current terms would be more useful for some more timely applications.

There's no specific use case for this, just a speculative idea that it might make the data more useful (that big file is cumbersome). Note that for specific legislatures, EveryPolitician already publishes a handy names.csv file (URL included in the countries.json index file).

It seems likely that the overhead to creating this as part of the creation of the megainclusive names.csv might be relatively low, so worth trying.

tmtmtmtm commented 8 years ago

Should this be all people in the current term? Or just all current members?

struan commented 8 years ago

My hunch is that a naive user will expect the latter.

davewhiteland commented 8 years ago

It's hard to know without a use case.

My feeling is it's always going to be a little fuzzy because of latency between real life and data anyway, so just current term is a start. shrugs

tmtmtmtm commented 8 years ago

Well, we could of course create both versions. But that seems more likely to just be confusing.

Perhaps @pudo or @jpmckinney might have an opinion on this?

pudo commented 8 years ago

Ok, first off: this sounds like a tremendously useful feature for EveryPolitician, I'd really like to use it. For my use case -- which is finding mentions of politicians in documents and databases -- I'd actually prefer the term members over the current members. That gives me a bit of extra coverage. Perhaps the former member had to resign over a scandal -- in that case I want to track them a bit longer :)

davewhiteland commented 8 years ago

Further to my throwaway comment:

"It seems likely that the overhead to creating this as part of the creation of the megainclusive names.csv might be relatively low, so worth trying."

...I now notice that everypolitician-names in fact does very little beyond concatenating the existing names.csv files and adding some extra columns; which is to say it's currently wholly unaware of which term any name is from, and is just shuffling CSV lines around. Heh.
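To make that concrete, here is a rough sketch of what "concatenating with some extra columns" amounts to. It is not the repo's actual code: the function name `collate_names`, the input shape, and the added `country` and `legislature` column names are all assumptions for illustration, and it assumes every per-legislature file shares the same header.

```python
import csv
import io


def collate_names(sources):
    """Concatenate per-legislature CSV texts into one CSV string.

    `sources` is an iterable of (country, legislature, csv_text) tuples.
    This input shape and the two added column names are hypothetical.
    """
    out = io.StringIO()
    writer = None
    for country, legislature, csv_text in sources:
        reader = csv.DictReader(io.StringIO(csv_text))
        for row in reader:
            if writer is None:
                # Take the header from the first file that has any rows,
                # then append the two extra columns.
                fieldnames = list(reader.fieldnames) + ["country", "legislature"]
                writer = csv.DictWriter(
                    out,
                    fieldnames=fieldnames,
                    restval="",
                    extrasaction="ignore",
                    lineterminator="\n",
                )
                writer.writeheader()
            row["country"] = country
            row["legislature"] = legislature
            writer.writerow(row)
    return out.getvalue()
```

Nothing in this loop looks at terms or dates, which is the point: filtering by term would need information that the per-legislature names.csv files may not carry.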

jpmckinney commented 8 years ago

If you wanted to perform analysis over only current members (e.g. gender analysis), then you'd need the current members version. In lots of journalism use cases, however, the current term version is more relevant, as it matters if a member doesn't make it to the end of the term. So, yeah, I think both are useful.
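The difference between the two versions can be sketched as a filter over membership records. This is only an illustration under assumptions: the field names `person_id`, `term_id` and `end_date`, and the function `split_by_currency`, are hypothetical rather than taken from the EveryPolitician data, and dates are assumed to be full ISO dates (YYYY-MM-DD) or missing.

```python
from datetime import date


def split_by_currency(memberships, current_term_id, today):
    """Return (term_members, current_members) as sets of person ids.

    term_members: everyone with a membership in the current term.
    current_members: the subset whose membership has not ended yet.
    Field names are assumptions made for this sketch.
    """
    term_members = set()
    current_members = set()
    for m in memberships:
        if m["term_id"] != current_term_id:
            continue
        term_members.add(m["person_id"])
        end = m.get("end_date")
        # A missing end date is treated as "still in office".
        if not end or date.fromisoformat(end) >= today:
            current_members.add(m["person_id"])
    return term_members, current_members
```

Under these assumptions, a member who resigned mid-term stays in the first set but drops out of the second, which is the case @pudo wants to keep tracking.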

davewhiteland commented 8 years ago

The corollary to that is perhaps that we've already made this decision insomuch as the names.csv in each legislature's directory within everypolitician/everypolitician-data is created with no regard for terms... yet. But maybe there's a case for doing it there rather than in this repo; then this repo's remit would be just to collate the current-term and current-term-currently-in-office CSV files into "global" ones. Which is what it's doing with the names.csv files already, i.e., collating files from EveryPolitician into a single global one. Perhaps.