OpenDataCo / awesome-datasets-colombia

List of links to datasets related to Colombia

Nice collection: data for municipalities? #1

Open Deleetdk opened 7 years ago

Deleetdk commented 7 years ago

Thanks for a nice collection of data!

I research social inequality in American countries (among other places) and have previously done a study on Colombia using department-level data. This raised issues of sample size because there are only 32 departments and a capital district.

However, there seem to be about 1,100 municipalities (municipios), which would enable a much richer study.

I'm wondering if it would be possible to collect a dataset for municipalities with the following information:

For all variables, it is best to average over a few years to obtain more reliable estimates.

My limitation is that I don't read Spanish well (I'm from Denmark!). If someone could help me put together such a dataset, I would be happy to send some money their way (something like 200 USD) and of course share the dataset publicly for free. :)

dav009 commented 7 years ago

Lots of information. I will try to address each issue:

Check in the mailing list

Other websites

dav009 commented 7 years ago

@Deleetdk for the water maps, stats, and data, please check with hyances[AT]gmail. He is working with OpenStreetMap to solve various problems regarding water access for communities away from urban centers.

dav009 commented 7 years ago

Just brainstorming here: the best use of your money might be to approach someone studying sociology or social sciences at a Colombian university :[

Deleetdk commented 7 years ago

@dav009 thanks for the replies. In my experience, the easiest way to find a lot of data fast is to use methods like these. Very briefly: use Google search with operator tricks. For instance, in this case we can use something like:

municipio site:co filetype:xls OR filetype:xlsx
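
For anyone who wants to reuse this trick with other keywords, here is a minimal Python sketch that assembles the same query string and a search URL. The helper name `build_query` is purely illustrative, and the URL is just the ordinary browser search address, not an official API.

```python
from urllib.parse import urlencode


def build_query(keyword, site, filetypes):
    """Compose a search query limited to one domain suffix and some file types."""
    # Join the file types with OR so that any of them matches.
    type_clause = " OR ".join(f"filetype:{ext}" for ext in filetypes)
    return f"{keyword} site:{site} {type_clause}"


query = build_query("municipio", "co", ["xls", "xlsx"])
print(query)

# URL-encode the query so it can be pasted into a browser address bar.
search_url = "https://www.google.com/search?" + urlencode({"q": query})
print(search_url)
```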

This quickly locates lots of files with municipio-level data. For instance, the first file

https://www.dane.gov.co/files/investigaciones/boletines/censo/DeficitViviendaCenso2005.xls

has all the links between municipios and departments, number of households and some household quality data (not sure about the exact translation).

The next file has some unsatisfied basic needs data.

https://www.dane.gov.co/files/censos/resultados/NBI_total_cab_resto_mpio_nal_31dic08.xls

This file is probably the most comprehensive:

https://colaboracion.dnp.gov.co/CDT/Desarrollo%20Social/IPM%20por%20municipio%20y%20dpto%202005%20(Incidencias%20y%20Privaciones_F).xls

So, with these, I have a bunch of socioeconomic variables plus population.

Still need SABER/ICFES:

http://www2.icfes.gov.co/docman/instituciones-educativas-y-secretarias/saber-11/resultados-saber11/670-resultados-agregados-puntajes-promedio-saber-11-2014-2-por-institucion-educativa

http://www2.icfes.gov.co/docman/instituciones-educativas-y-secretarias/saber-11/resultados-saber11/671-resultados-agregados-puntajes-promedio-saber-11-2015-1-por-institucion-educativa

The first has a lot of data for 2014. Rows ≈ 12k. Not sure what is what. The second has data for most municipios for 2015, rows ≈ 960.

It is perhaps better to use the SABER case-level datasets, which have n≈550k. The case-level SABER dataset from 2014 seems to have data from almost all municipios (n=1024), and the same goes for the 2013 data. Good, we have that covered then.
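
As a rough sketch of that aggregation step: assuming each case-level record carries a municipio identifier and a test score, the municipio-level mean and sample size could be computed as below. The identifiers and scores are made-up placeholders, not real SABER fields or results, and `municipio_means` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Placeholder case-level records: (municipio id, score).
# Real SABER files use their own column names and codes.
records = [
    ("M001", 52.0),
    ("M001", 48.0),
    ("M002", 55.0),
    ("M002", 61.0),
    ("M002", 58.0),
]


def municipio_means(rows):
    """Return {municipio id: (mean score, number of students)}."""
    scores = defaultdict(list)
    for municipio, score in rows:
        scores[municipio].append(score)
    return {m: (mean(vals), len(vals)) for m, vals in scores.items()}


result = municipio_means(records)
for municipio, (avg, n) in sorted(result.items()):
    print(municipio, avg, n)
```

Keeping the per-municipio count next to the mean makes it easy to spot municipios where the estimate rests on very few students.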

Geographical data:

http://www.ideam.gov.co/documents/21021/553571/Promedios+Climatol%C3%B3gicos++1981+-+2010.xlsx/f28d0b07-1208-4a46-8ccf-bddd70fb4128

This has: elevation, latitude, longitude, precipitation, days with rain, temperature (mean, max, min), sun hours, humidity, and a few more. Great!

I was unable to find race/ethnicity data using keywords like "raza" and "etnia". But SABER datasets have race/ethnicity. If we can assume the students are representative within each municipio, then we can aggregate within each municipio and estimate the race/ethnicity proportions.
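
A minimal sketch of that estimate, under the same representativeness assumption: count students per self-reported group within each municipio and divide by the municipio total. The group labels and identifiers below are invented for illustration and do not reflect the actual SABER coding.

```python
from collections import Counter, defaultdict

# Placeholder student records: (municipio id, self-reported group).
students = [
    ("M001", "group_a"),
    ("M001", "group_a"),
    ("M001", "group_a"),
    ("M001", "group_b"),
    ("M002", "group_b"),
    ("M002", "group_b"),
]


def group_shares(rows):
    """Return {municipio id: {group: share of students in that municipio}}."""
    counts = defaultdict(Counter)
    for municipio, group in rows:
        counts[municipio][group] += 1
    shares = {}
    for municipio, counter in counts.items():
        total = sum(counter.values())
        shares[municipio] = {g: n / total for g, n in counter.items()}
    return shares


shares = group_shares(students)
print(shares)
```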

This covers just about everything, except for some of those natural resources. It's not a requirement, just nice to have.

The primary thing I will need help with, then, is translation. I will use Google Translate, but sometimes the translation is unclear and one needs a Spanish speaker.

demorenoc commented 7 years ago

You can find raw (student-level) data since 2005 for the SABER standardized tests in this R data package: https://github.com/nebulae-co/saber. On request, ICFES provides an FTP connection for researchers to access the data, but it can take a while and can be messy, so we packaged it. Also, map polygons at the municipal level are available in https://github.com/nebulae-co/colmaps.

Deleetdk commented 7 years ago

Very nice. :)