edwardsamuel / Wilayah-Administratif-Indonesia

Data on provinces, cities/regencies, districts, and villages in Indonesia
MIT License

Fix villages data #11

Open prasastoadi opened 8 years ago

prasastoadi commented 8 years ago

Fix the digit '0' (zero) to the letter 'O' in two village names:

ALUE DUA MUKA 0 -> ALUE DUA MUKA O
SITIMERT0 -> SITIMERTO

jayvdb commented 8 years ago

This data file is created by a script that extracts data from http://mfdonline.bps.go.id/. See https://github.com/edwardsamuel/Wilayah-Administratif-Indonesia/blob/master/scripts/run.sh#L12

It is not useful to modify this generated file. Your changes will be overwritten when the script runs next time.

Is the BPS data wrong? If it is wrong, it needs to be fixed in the BPS source.

You can see "ALUE DUA MUKA 0" and "SITIMERT0" are used in https://web.archive.org/web/20150207100538/http://www.bps.go.id/eng/download_file/Population_of_Indonesia_by_Village_2010.pdf

Other places where this data has appeared:

https://www.google.com/search?q=%22SITIMERT0%22+%223506190010%22

A bot also created Wikipedia articles from it: https://nl.wikipedia.org/wiki/Alue_Dua_Muka_0 https://nl.wikipedia.org/wiki/Sitimert0

And it appears in a wordlist here: https://id.wiktionary.org/wiki/Wiktionary:ProyekWiki_bahasa_Indonesia/Daftar_kata/Nama/Tempat/Semua

jayvdb commented 8 years ago

If we can confirm that the BPS data is wrong, one solution is for this repository to have a 'fixes' list, which run.sh uses to fix the generated csv files.

edwardsamuel commented 8 years ago

Hi @prasastoadi,

I agree with @jayvdb. Generated files shouldn't be edited manually; they will be overwritten on the next run. You need to modify the script that generates the files, which in this project is either run.sh or the Python script. But first, you need to confirm that the source data (BPS MFD Online) is actually wrong.

prasastoadi commented 8 years ago

I am very confident that the two village names are wrong. A 0 (zero) is a digit, not a letter of the alphabet.

Here is Sitimerto village: https://goo.gl/maps/qMH3K7LjahB2

And Alue Dua Muka O: http://lmgtfy.com/?q=alue+dua+muka+o+site%3Ago.id

I propose a very simple check before writing the data to CSV. I think it would be better to check the villages/districts/regencies/provinces one by one to catch typos in the data. I hope someone does this in the next patch 😉
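A check like that could look roughly like the sketch below. It is only an illustration: the `SUSPICIOUS` pattern, the function name, and the `(code, name)` row layout are assumptions, not something taken from the repository's scripts, and every row except SITIMERT0 uses a placeholder code or a made-up name. The idea is to flag a digit glued to a letter, or a lone `0` token, for manual review before the row is written out.

```python
import re

# Assumed heuristic: a digit directly attached to a letter (e.g. "SITIMERT0"),
# or a standalone "0" token (e.g. "ALUE DUA MUKA 0"), is probably a typo.
SUSPICIOUS = re.compile(r"[A-Za-z]\d|\d[A-Za-z]|(?<!\S)0(?!\S)")


def find_suspicious_names(rows):
    """Return the (code, name) rows whose name matches the heuristic."""
    return [(code, name) for code, name in rows if SUSPICIOUS.search(name)]


# Hypothetical sample rows; only the first code appears in this thread,
# the other codes and the "EXAMPLE VILLAGE 12" name are placeholders.
rows = [
    ("3506190010", "SITIMERT0"),
    ("0000000001", "ALUE DUA MUKA 0"),
    ("0000000002", "SITIMERTO"),
    ("0000000003", "EXAMPLE VILLAGE 12"),
]

for code, name in find_suspicious_names(rows):
    print(f"check manually: {code} {name}")
```

The heuristic only reports candidates; a name that legitimately ends in a number (such as the placeholder "EXAMPLE VILLAGE 12") is left alone, and nothing is changed automatically.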

feryardiant commented 7 years ago

IMO it's pointless to update anything in this repo while the source data from BPS remains wrong.

Dear @prasastoadi, one thing that you should do is ask BPS to update their data instead.

jayvdb commented 6 years ago

Maybe fixes should be wrapped in a separate function call (and possibly separate data file), so that users can easily apply all fixes on top of the existing data.
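A minimal sketch of what that might look like, assuming a separate fixes file with `wrong,right` columns. The file layout and the `load_fixes` / `apply_fixes` names are hypothetical and do not exist in the repository; the fixes are inlined as a string here only so the example is self-contained, and the second sample row is a placeholder.

```python
import csv
import io

# Assumed contents of a separate fixes file (for example "fixes.csv").
# The two entries are the corrections proposed in this issue.
FIXES_CSV = """wrong,right
ALUE DUA MUKA 0,ALUE DUA MUKA O
SITIMERT0,SITIMERTO
"""


def load_fixes(fileobj):
    """Read a wrong -> right name mapping from a CSV file object."""
    return {row["wrong"]: row["right"] for row in csv.DictReader(fileobj)}


def apply_fixes(rows, fixes):
    """Return new (code, name) rows with known-bad names replaced.

    The input rows are left untouched, so the generated data stays
    identical to the BPS source and the fixes remain opt-in.
    """
    return [(code, fixes.get(name, name)) for code, name in rows]


fixes = load_fixes(io.StringIO(FIXES_CSV))
generated = [("3506190010", "SITIMERT0"), ("0000000002", "EXAMPLE VILLAGE")]
fixed = apply_fixes(generated, fixes)
print(fixed)
```

Keying the fixes by name keeps the file readable, but keying by village code instead might be safer if the same misspelling can occur in more than one place.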