Closed cliftonmcintosh closed 6 years ago
I will work on this tonight (CET), but I'd like to know exactly what you want to be part of this issue. The SQL script to upload it into the DB, or something more?
@FerranMarin
There are several steps to adding a data set. We would welcome a contribution that accomplishes any or all of that work. First steps would be:
It is possible to do these steps as one.
The format for the data would look something like:
geo_code, geo_level, religion, sex, total
01,district,Hindu,male,21957
01,district,Hindu,female,23801
01,district,Buddhist,male,12308
01,district,Buddhist,female,13846
01,district,Islam,male,1
01,district,Islam,female,1
01,district,Kirati,male,24064
01,district,Kirati,female,27764
...
The mappings from district name to geo_code can be found here. ("01" is the code for Taplejung.) The maptools project also has samples of scripts that have been used to transform data.
Please note that we also need to sum up all the district entries by sex to get national totals for each religion by sex.
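As a rough illustration of the national-totals step, here is a minimal Python sketch that sums district rows by religion and sex. It assumes the CSV layout shown above. The `national_totals` name and the `"NP"` / `"country"` values used for the national row are hypothetical placeholders, not values confirmed by the project, and the district "99" rows are made-up numbers included only so the sum has more than one input.

```python
import csv
import io
from collections import defaultdict


def national_totals(rows, country_code="NP", country_level="country"):
    """Sum district rows into one national row per (religion, sex).

    country_code and country_level are assumed placeholder values.
    """
    totals = defaultdict(int)
    for row in rows:
        if row["geo_level"] != "district":
            continue  # only district rows feed the national sum
        totals[(row["religion"], row["sex"])] += int(row["total"])
    return [
        {
            "geo_code": country_code,
            "geo_level": country_level,
            "religion": religion,
            "sex": sex,
            "total": total,
        }
        for (religion, sex), total in sorted(totals.items())
    ]


# District 01 rows come from the sample above; district "99" is a
# made-up placeholder, not real census data.
SAMPLE = """geo_code, geo_level, religion, sex, total
01,district,Hindu,male,21957
01,district,Hindu,female,23801
99,district,Hindu,male,10
99,district,Hindu,female,20
"""

# skipinitialspace=True copes with the spaces after commas in the header
rows = list(csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True))
national = national_totals(rows)
for row in national:
    print(row)
```

The resulting rows could be appended to the district rows before generating the SQL, so both levels end up in the same table.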
Because the religion data is in a PDF, converting it to a CSV in the right format will likely take some effort. If you can accomplish that, we would welcome a pull request that simply includes the data in csv format.
We would of course welcome your taking further steps as well. Beyond data conversion here are the steps we take to get the data into www.nepalmap.org
Awesome! Thanks for the detailed guide.
Sorry for the radio silence. I've been away on holiday and it was hard to get online. I'll start working on this :)
Hello @cliftonmcintosh ,
Sorry for the slow pace. I finally have a CSV and an SQL file ready to upload, but I can only find where to place the SQL file. Where should I upload the CSV?
If the SQL file has the data, that is sufficient. If you would like to submit the CSV, we have a data project. We also have a "maptools" project if you would like to submit a PR that contains the script you used to create the CSV.
Do you have a link to the data project so I can upload the CSV? I'll be uploading the SQL in a moment :)
@FerranMarin
I believe there is a duplicate row in the data right now. When I ran the sql file, I got this error:
ERROR: could not create unique index "religion_pkey"
DETAIL: Key (geo_level, geo_code, religion, sex)=(district, 41, Hindu, Male) is duplicated.
It looks like district 41 appears twice. See beginning here: https://github.com/Code4Nepal/nepalmap_app/blob/dev/sql/religion.sql#L875
It could be that the data is in the original twice or that there was an error and one of the other districts was assigned the code "41."
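One way to catch this before loading the SQL is to check the CSV for repeated primary-key tuples. The sketch below is only an illustration: it assumes the key columns named in the error message (geo_level, geo_code, religion, sex), the `find_duplicate_keys` helper is hypothetical, and the totals in the sample are placeholder numbers rather than real data.

```python
import csv
import io
from collections import Counter

# Key columns taken from the "religion_pkey" error message above
KEY_COLUMNS = ("geo_level", "geo_code", "religion", "sex")


def find_duplicate_keys(rows):
    """Return {key_tuple: count} for every key that appears more than once."""
    counts = Counter(tuple(row[col] for col in KEY_COLUMNS) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Placeholder rows: the totals are invented for illustration only
SAMPLE = """geo_code,geo_level,religion,sex,total
41,district,Hindu,Male,100
41,district,Hindu,Female,110
41,district,Hindu,Male,200
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
duplicates = find_duplicate_keys(rows)
print(duplicates)
```

Any key reported here would trigger the same unique-index error when the table is created.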
Will double check and correct it. Gimme a few minutes :) EDIT: It turns out all of district 41 is duplicated. I believe it happened when I translated from district name to geo_code.
@FerranMarin
I did a quick check by taking just the district code column and doing
sort religiondistricts.csv | uniq --count --repeated | grep -v 22
It looks like there are three districts that may be off. These three have duplicates:
District 07, District 27, and District 41.
There are also some missing districts. These may be where those duplicates belong: District 20, District 24, and District 38.
I'm not sure if this is a problem with the original data set or with the transformation.
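The same check can be sketched in Python so that missing districts are reported alongside the duplicated ones. This assumes each district should have a fixed number of rows (22 is inferred from the `grep -v 22` filter above) and that the caller supplies the full list of expected codes. The `check_district_counts` helper and the toy data are hypothetical.

```python
from collections import Counter


def check_district_counts(codes, expected_codes, rows_per_district=22):
    """Compare per-district row counts against the expected count.

    Returns (wrong, missing): wrong maps each code whose row count differs
    from rows_per_district to its actual count; missing lists expected
    codes that never appear.
    """
    counts = Counter(codes)
    wrong = {code: n for code, n in counts.items() if n != rows_per_district}
    missing = sorted(set(expected_codes) - set(counts))
    return wrong, missing


# Toy data with 2 rows per district instead of 22, to keep the example short
codes = ["07", "07", "07", "07", "08", "08"]
wrong, missing = check_district_counts(
    codes, expected_codes=["07", "08", "20"], rows_per_district=2
)
print(wrong)    # districts with an unexpected number of rows
print(missing)  # districts with no rows at all
```

A district with double the expected count next to a missing district would suggest a mislabeled code rather than a gap in the source data.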
I will check the rest of the districts. In the previous case the problem was the transformation. Expect this to be done in around half an hour.
@FerranMarin,
Are you interested in and able to take the next steps to integrate the data into NepalMap? Your contribution would be welcome. If you do not have the time to do the next steps, I will invite others to contribute.
I'd love to, but I believe the workload at my day job will increase in the next few weeks. If someone else is interested in picking up from where I left off, that's OK with me. :)
Thanks for what you've done so far. I'll see if I or someone else can complete the integration of the data soon.
Census data detailing religion per district is in Table 8 here, starting on page 278. Use these data to show the population by religion in each district.