covidatlas / li

Next-generation serverless crawler for COVID-19 data
Apache License 2.0

Add populations to Wikidata #417

Closed jzohrab closed 4 years ago

jzohrab commented 4 years ago

Original issue https://github.com/covidatlas/coronadatascraper/issues/775, transferred here on Monday Apr 13, 2020 at 00:45 GMT


Is there anyone who has the right to edit Wikidata articles for counties? Basically, that requires 50+ edits on Wikidata, at which point the account is "autoconfirmed".

Right now I have that level, but it's quite tedious to fix all populations alone and I'd be happy if someone could help me.

Some of the locations are less important and everyone can edit them, like these ones in Panama: https://www.wikidata.org/wiki/Q217138

Other ones are in the "top 3000" items and only people with confirmed accounts can edit them. But basically editing the less important features would allow someone to get to this autoconfirmed level.

So who would like to help by entering population information?

Need to add missing in:

jzohrab commented 4 years ago

(Transferred comment)

Populations on Wikipedia: https://en.wikipedia.org/wiki/Provinces_of_Panama#Provinces

Alternative source from @ciscorucinski which has it at county level: https://www.citypopulation.de/en/panama/admin/

jzohrab commented 4 years ago

(Transferred comment)

Yes, sometimes I'm using Wikipedia as a source, but it still needs to be added manually. Citypopulation.de doesn't allow downloading the map data. Also, we prefer official data that matches a government dataset.

jzohrab commented 4 years ago

(Transferred comment)

Here's population information down to the corregimiento level (the granularity at which we get COVID data): https://github.com/EricLuceroGonzalez/Panama-Political-Division

The population is presumably from the 2010 census, but it would have to be verified.

Which raises the question: how are we validating any of this?

jzohrab commented 4 years ago

(Transferred comment)

Wikidata seems to have an intimidating interface for adding information.

jzohrab commented 4 years ago

(Transferred comment)

@ciscorucinski you mean how to add population data?

Basically:

  1. click "add statement" at the bottom
  2. select population
  3. enter the value
  4. add qualifier
  5. select point in time
  6. enter year
  7. add reference
  8. select URL, or type P4656 for a Wikipedia import
  9. paste URL
  10. save

If you do this in multiple steps, it's quite easy to get over the 50 required edits to get your account autoconfirmed. For example add - publish - add qualifier - publish - add reference - publish can get you 4 edits. So with 13 regions you are over 50 edits :-)

jzohrab commented 4 years ago

(Transferred comment)

Done for Panama's provinces.

jzohrab commented 4 years ago

(Transferred comment)

There are ways of doing this via Google Sheets and a tool called QuickStatements. Since we are only concerned with one type of data import process, we should be able to create a fairly standardized process within a spreadsheet.

Google Sheets + QuickStatements: https://www.youtube.com/watch?v=bUpJN4IklJ8

OpenRefine: https://www.youtube.com/watch?v=wfS1qTKFQoI
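To make the "standardized process" idea concrete, here is a minimal sketch of how one spreadsheet row could be turned into a single QuickStatements command in the tab-separated V1 syntax. It mirrors the manual steps described earlier in the thread: a population statement (P1082), a "point in time" qualifier (P585), and a reference URL (S854, the source form of P854). The function name and the example values are assumptions for illustration only, and the output should be checked on one or two items in QuickStatements before any batch run.

```python
def population_statement(qid: str, population: int, year: int, source_url: str) -> str:
    """Build one tab-separated QuickStatements V1 command (hypothetical helper)."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    if population < 0:
        raise ValueError("population must not be negative")
    fields = [
        qid,
        "P1082",                            # property: population
        str(population),
        "P585",                             # qualifier: point in time
        f"+{year:04d}-01-01T00:00:00Z/9",   # /9 marks year precision
        "S854",                             # reference: reference URL
        f'"{source_url}"',                  # string values are double-quoted
    ]
    return "\t".join(fields)


# Placeholder population and URL, not real census figures.
line = population_statement("Q217138", 1000, 2010, "https://example.org/census")
print(line)
```

Generating the lines from code rather than by hand keeps the property IDs fixed in one place, which reduces the chance of a mistyped property ending up in a large batch.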

jzohrab commented 4 years ago

(Transferred comment)

@ciscorucinski if you can mass import using this tool it'd be great! So far I've done all my edits by hand.

jzohrab commented 4 years ago

(Transferred comment)

@hyperknot you can! But I am uncertain how to go about doing it for this data right now.

jzohrab commented 4 years ago

(Transferred comment)

Luckily we don't have that many missing populations. If we encounter another country with a lot, I'll comment here.

jzohrab commented 4 years ago

(Transferred comment)

Is there an easy way to find what is missing?

jzohrab commented 4 years ago

(Transferred comment)

Ones without population in this JSON: https://raw.githubusercontent.com/hyperknot/country-levels-export/master/iso2.json
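A small sketch of how that check could be scripted. It assumes the JSON parses to an object keyed by region code, where each entry is an object with an optional `population` field; that layout is an assumption and should be confirmed against the actual file. The function name and sample entries are made up for illustration.

```python
from typing import Any, Dict, List


def missing_population(levels: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the sorted codes whose entry has no usable population value.

    Note: a population of 0 is also reported, since it is treated as "not filled in".
    """
    return sorted(code for code, entry in levels.items() if not entry.get("population"))


# In practice `levels` would come from json.load() on a downloaded copy of iso2.json.
# The entries below are invented placeholders.
sample = {
    "XX-1": {"name": "Example A", "population": 100},
    "XX-2": {"name": "Example B"},
    "XX-3": {"name": "Example C", "population": None},
}
print(missing_population(sample))
```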

jzohrab commented 4 years ago

(Transferred comment)

Portugal seems like a good candidate: https://github.com/hyperknot/country-levels-export/blob/master/docs/iso2_list/PT.md

jzohrab commented 4 years ago

(Transferred comment)

We need to add: Slovenia, Ireland, Poland, and Lithuania.

jzohrab commented 4 years ago

(Transferred comment)

I fixed Ireland and Poland. What is missing in Lithuania?

For Slovenia, it really needs that batch updating effort! @ciscorucinski can you help with that?

jzohrab commented 4 years ago

(Transferred comment)

Let's create a Google Sheet, and try out a few records before mass editing. I have never edited a wikidata entry, so consider me a noob here 😅

What info is needed to identify a population data point in Wikidata? We need Q IDs for a few data points, but these can be retrieved through a Wikidata Chrome extension in Google Sheets.

Just datapoint names such as Country, State, and county level names should be good enough I guess??? Along with the population data and url reference

jzohrab commented 4 years ago

(Transferred comment)

All the Q-s we need are here: https://github.com/hyperknot/country-levels-export/blob/master/docs/iso2_list/SI.md

Machine readable format is this: https://raw.githubusercontent.com/hyperknot/country-levels-export/master/iso2.json

The other side of the equation should be a government census CSV listing those populations.
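A rough sketch of joining the two sides: a name-to-Q-ID mapping and a census CSV. The column names `name` and `population`, the function name, and the placeholder IDs are all assumptions; a real census export will likely need its own column mapping and name normalization. Names that don't match are returned separately so they can be reviewed by hand instead of being silently dropped.

```python
import csv
import io
from typing import Dict, List, Tuple


def match_census(
    qids_by_name: Dict[str, str], census_csv_text: str
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Pair each census row with a Q ID; also return names that found no match."""
    # Case-insensitive lookup on the region name.
    index = {name.strip().casefold(): qid for name, qid in qids_by_name.items()}
    matched: List[Tuple[str, int]] = []
    unmatched: List[str] = []
    for row in csv.DictReader(io.StringIO(census_csv_text)):
        name = row["name"].strip()
        qid = index.get(name.casefold())
        if qid is None:
            unmatched.append(name)
        else:
            matched.append((qid, int(row["population"])))
    return matched, unmatched


# Placeholder IDs and numbers, not real Wikidata items or census figures.
qids = {"Alpha": "QX1", "Beta": "QX2"}
census = "name,population\nAlpha,100\n beta ,250\nGamma,30\n"
matched, unmatched = match_census(qids, census)
print(matched, unmatched)
```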

jzohrab commented 4 years ago

(Transferred comment)

Really not ideal (has some weird character errors) but here is a CSV from the Slovenian Statistical Bureau. Data is from 2019. https://gist.github.com/qgolsteyn/145d82f984d65c34e778371a69cf5433

jzohrab commented 4 years ago

(Transferred comment)

@qgolsteyn thanks! Do you have the source for this file? Maybe chardetect would tell us what encoding it's in.
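If chardetect (the command-line tool shipped with the chardet package) isn't at hand, a cruder stdlib-only check is to try a short list of likely encodings in order and keep the first one that decodes without error. This is only a sketch: the candidate list is a guess for Slovenian text, the function name is made up, and a clean decode doesn't prove the encoding is right, so the result still needs a visual check of characters like Š and Ž.

```python
from typing import Optional, Sequence, Tuple


def guess_encoding(
    raw: bytes, candidates: Sequence[str] = ("utf-8", "cp1250", "iso-8859-2")
) -> Optional[Tuple[str, str]]:
    """Return (encoding, decoded text) for the first candidate that decodes strictly."""
    for encoding in candidates:
        try:
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate
    return None


# UTF-8 is tried first because it rejects most byte sequences from other encodings.
legacy_bytes = "Žalec".encode("cp1250")
print(guess_encoding(legacy_bytes))
```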

jzohrab commented 4 years ago

(Transferred comment)

I don't have it immediately, but will get the source to you by this evening. I also updated the list with additional countries that need population info.

jzohrab commented 4 years ago

(Transferred comment)

Thanks!

jzohrab commented 4 years ago

(Transferred comment)

My apologies, here is Slovenia's data: https://pxweb.stat.si/SiStatDb/pxweb/en/10_Dem_soc/10_Dem_soc__05_prebivalstvo__10_stevilo_preb__20_05C40_prebivalstvo_obcine/05C4002S.px/table/tableViewLayout2/

jzohrab commented 4 years ago

(Transferred comment)

Portugal is done, as is Colombia. Working on Slovenia next.

jzohrab commented 4 years ago

(Transferred comment)

I think Slovenia is done, but I got "errors" on their tool, despite there being hundreds of successful edits....

EDIT

Because I tried to add the atomic number of a municipality among other atrocities 😆 Anyway, it's processing now, should be done soon.

jzohrab commented 4 years ago

(Transferred comment)

Lithuania should be done...after much struggle. I'm off for the rest of the night.

jzohrab commented 4 years ago

(Transferred comment)

@shaperilio thanks so much, I've updated the file already, but I'll run a new processing pass for Lithuania as well.

jzohrab commented 4 years ago

(Transferred comment)

Korea should be up to date now

jzohrab commented 4 years ago

(Transferred comment)

@hyperknot, is this issue still open? Wondering what the current status is. Cheers, z