CodeforNepal / nepalmap_app

An application that maps census and other official data for Nepal to make data more easily accessible and understandable to the public. Want to help us? Check out the Wiki.
https://nepalmap.org
MIT License

Identify health-related data sets for inclusion #150

Status: Open. cliftonmcintosh opened this issue 7 years ago.

cliftonmcintosh commented 7 years ago

Open Nepal has many health-related data sets. Identify ones that might be good to include. Create an issue for each one and link the issue to the actual data set. Data sets that work well:

Examples of potentially good data sets include:

These are potential samples. Once data sets have been identified and issues have been created, then the team can prioritize which issues would or would not be valuable to include.

amitness commented 7 years ago

I'm participating in "Open Data Day Hackathon 2017", and we've been provided with a few datasets. One of them is an immunization dataset by district. Will this be helpful? @cliftonmcintosh

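Immunization dataset by district last 2 year.csv.zip (https://github.com/Code4Nepal/nepalmap_app/files/799930/Immunization.dataset.by.district.last.2.year.csv.zip)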

cliftonmcintosh commented 7 years ago

I took a quick look, and it looks promising. We would have to understand what all the columns mean in order to make sense of it. I am not sure how we would map it to overall population numbers, but we can probably use data like this even if we can't match it to population numbers.
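One way to handle the population question, sketched under assumptions: keep every district's raw count and attach a coverage rate only where a population figure matches. The column names `district` and `children_immunized`, the function `coverage_by_district`, and the population mapping are hypothetical; the real CSV columns still need to be understood, and the correct denominator (total population or a target age group) is an open question.

```python
from typing import Dict, List, Optional, Tuple


def coverage_by_district(
    immunization_rows: List[Dict[str, str]],
    population: Dict[str, int],
    count_column: str = "children_immunized",  # hypothetical column name
) -> Dict[str, Tuple[int, Optional[float]]]:
    """Return {district: (count, coverage %)}; coverage is None if unmatched.

    Assumes one row per district (e.g. already filtered to a single year).
    """
    # Normalise names so "Kathmandu " and "kathmandu" are treated as the same district.
    pops = {name.strip().lower(): value for name, value in population.items()}
    result: Dict[str, Tuple[int, Optional[float]]] = {}
    for row in immunization_rows:
        district = row["district"].strip().lower()
        count = int(row[count_column].replace(",", ""))
        pop = pops.get(district)
        # Keep the raw count even when no population figure matches.
        rate = round(100.0 * count / pop, 1) if pop else None
        result[district] = (count, rate)
    return result
```

Districts without a population match still appear with their counts, so the data stays usable even before the population mapping is settled.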


cliftonmcintosh commented 7 years ago

@ravinepal

As per our discussion via email, I have been looking at health data more closely. OpenNepal has quite a few data sets, but one of my concerns is that some of them are several years old. I have been looking at the most recent Annual Report from the Department of Health Services (available as a PDF), which contains more recent versions of the data sets that OpenNepal has digested. I believe we can extract the data from the tables in that report using Tabula, which will give us more recent data. I have extracted a few data sets this way. They need some manipulation to convert them into a usable format, but I believe it will be worth the effort. So far I have done the preliminary extraction for several of the tables in the "Safe Motherhood" section. These include data on:
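Tabula output often needs the kind of manipulation described above. A minimal sketch, assuming a table whose header was split over two rows and whose numbers use thousands separators; the helper name `clean_tabula_csv` and the `header_rows` default are illustrative, not part of Tabula.

```python
import csv
import io
from typing import List


def clean_tabula_csv(text: str, header_rows: int = 2) -> List[List[str]]:
    """Merge a header split across `header_rows` rows and tidy the data rows.

    The right `header_rows` value depends on each table in the report.
    """
    rows = [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(text))]
    if not rows:
        return []
    width = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of columns.
    rows = [row + [""] * (width - len(row)) for row in rows]

    # Join the non-empty header fragments in each column, e.g. "ANC" + "1st visit".
    header = [
        " ".join(rows[i][col] for i in range(header_rows) if rows[i][col])
        for col in range(width)
    ]

    cleaned = [header]
    for row in rows[header_rows:]:
        if not any(row):
            continue  # skip blank spacer rows
        tidy = []
        for cell in row:
            digits = cell.replace(",", "")
            # Drop thousands separators only when the cell is a whole number.
            tidy.append(digits if digits.isdigit() else cell)
        cleaned.append(tidy)
    return cleaned
```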

The data sets need more processing, and it may be that not all of them are valuable, but I think there is a lot we can mine from the document.

It would be nice if team members could look through those tables and see whether they spot any data points that might be important to show.

ravinepal commented 7 years ago

Thanks, @cliftonmcintosh! Should I reach out to the Open Nepal team to see if they can extract these datasets? (I responded to your email as well.)

cliftonmcintosh commented 7 years ago

@ravinepal

Thanks for offering to reach out to Open Nepal about extracting the data, but I would like to try my hand at it for a couple of datasets first. This will let me convert the data into a form that is useful for NepalMap. Converting the output of the Tabula PDF converter into a format we can use is likely to be no more difficult than converting the data as OpenNepal presents it.
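As an illustration of moving from Tabula's wide tables to a more uniform layout, the hypothetical helper below produces one record per district and indicator. The target field names (`district`, `indicator`, `value`) and the `District` key column are assumptions; the exact format NepalMap imports is not described in this thread.

```python
from typing import Dict, List, Optional, Union

Record = Dict[str, Union[str, float]]


def parse_number(cell: str) -> Optional[float]:
    """Parse "1,234" or "12.5"; return None for placeholders such as "-" or ""."""
    try:
        return float(cell.replace(",", ""))
    except ValueError:
        return None


def to_long_format(table: List[List[str]], id_column: str = "District") -> List[Record]:
    """Turn a wide table (one column per indicator) into one record per district and indicator."""
    header, *data = table
    id_index = header.index(id_column)
    records: List[Record] = []
    for row in data:
        for col, indicator in enumerate(header):
            if col == id_index:
                continue  # the key column is not an indicator
            value = parse_number(row[col])
            if value is None:
                continue  # skip empty or non-numeric cells
            records.append({"district": row[id_index], "indicator": indicator, "value": value})
    return records
```

A long layout like this lets new indicators be added as rows rather than new columns; pivoting back to a wide table is straightforward once the target schema is settled.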

ravinepal commented 7 years ago

Sounds good, @cliftonmcintosh! @amitness has extracted some of the census data in the past, so I'm looping him in to see if he can advise or help as well.

amitness commented 7 years ago

@ravinepal @cliftonmcintosh Tabula is the best way to go. There is a useful Python wrapper for Tabula called tabula-py. Here is also an example of using it.
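A minimal tabula-py sketch for pulling tables out of a PDF report and saving each one as CSV. `tabula.read_pdf` with the `pages`, `multiple_tables`, and `lattice` parameters is part of tabula-py; the wrapper function `extract_tables_to_csv`, the output file names, and any page range are hypothetical. tabula-py needs a Java runtime, and `lattice=True` assumes tables drawn with ruling lines (`stream=True` is the alternative for whitespace-separated tables).

```python
from pathlib import Path
from typing import List


def extract_tables_to_csv(pdf_path: str, pages: str, out_dir: str) -> List[Path]:
    """Extract every table Tabula finds on `pages` and write each one to its own CSV.

    Requires the tabula-py package and a Java runtime.
    """
    import tabula  # imported lazily so the rest of a script works without Java

    # multiple_tables=True returns a list with one DataFrame per detected table.
    tables = tabula.read_pdf(pdf_path, pages=pages, multiple_tables=True, lattice=True)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for i, df in enumerate(tables, start=1):
        path = out / f"table_{i:02d}.csv"
        df.to_csv(path, index=False)
        written.append(path)
    return written
```

For example, `extract_tables_to_csv("annual_report.pdf", pages="1-3", out_dir="safe_motherhood")` would write `table_01.csv`, `table_02.csv`, and so on; the file name and page range are placeholders. When no per-table handling is needed, `tabula.convert_into(pdf_path, "out.csv", output_format="csv", pages="all")` writes the extracted tables straight to a single file.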

cliftonmcintosh commented 7 years ago

@amitness Thanks for the tips.