censusreporter / census-api

The home for the API that powers the Census Reporter project.

MIT License

Bulk data access #59

Closed max-mapper closed 7 years ago

max-mapper commented 7 years ago

Hiya! Over at the Dat project we wanna feature the Census Reporter data as an example dataset. The goal is we could have all of your data exposed as a bulk data feed that analysts could easily pull into different research workflows.

I was wondering if you offer any sort of bulk data endpoint, similar to the bulk download ZIPs that CA Civic Data publishes at http://calaccess.californiacivicdata.org/downloads/latest/, which we can use to create a version-controlled history of their whole archive.

We could also potentially do it through your API, but I don't see a way through the current API to get a changes feed we could subscribe to.

iandees commented 7 years ago

Hi Max! I'm happy to hear that Dat is interested in this.

Since we only get data from the Census twice a year, we haven't considered adding the concept of a change feed.

I could see dumping a CSV with all the rows and columns from the American Community Survey (our backing dataset). It would probably look a lot like the SQL dumps we have available here: http://censusreporter.tumblr.com/post/73727555158/easier-access-to-acs-data

Would that work for your use case?

max-mapper commented 7 years ago

@iandees oh cool, those SQL files look great actually, don't worry about converting to CSV. Is there any additional cleaning of the data you do between those SQL files and what is served through the API?

iandees commented 7 years ago

Nope, the SQL comes straight from the database that backs the API.

max-mapper commented 7 years ago

@iandees excellent, one last question: would downloading the 2015 5-year data give me the same information as downloading the 2011, 2012, 2013, 2014 and 2015 1-year data?

max-mapper commented 7 years ago

weeeeeee

iandees commented 7 years ago

Amazon thanks you 😄

The 1-year releases cover different geographies and have some different columns than the 5-year releases. A 1-year release only contains data from the previous year's survey results, so it is the most recent, but it only covers geographies with a population of 65,000 or more. A 5-year release covers the previous 5 years of survey results, so it covers all geographies. The survey changes a tiny bit each year, so subsequent data releases can't always be compared.

You can read more about the release schedule in our glossary and on the Census "Which Release Should I Use" page.
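To make the coverage rule above concrete, here is a minimal Python sketch. It is not Census Reporter code: the function names and the constant are hypothetical, and it assumes the published ACS threshold of 65,000 or more residents for 1-year estimates.

```python
# Hypothetical helpers for illustration only; not part of the Census Reporter API.

# Assumption: ACS 1-year estimates are published for geographies
# with a population of 65,000 or more.
ONE_YEAR_MIN_POPULATION = 65_000


def available_releases(population: int) -> list[str]:
    """Return the ACS release types that cover a geography of this size."""
    releases = ["5-year"]  # 5-year releases cover all geographies
    if population >= ONE_YEAR_MIN_POPULATION:
        releases.insert(0, "1-year")  # larger places also get 1-year data
    return releases


def survey_years(release_year: int, span: int) -> list[int]:
    """Return the survey years pooled into a release of the given span."""
    return list(range(release_year - span + 1, release_year + 1))


print(available_releases(40_000))   # small place: 5-year only
print(available_releases(700_000))  # large place: both release types
print(survey_years(2015, 5))        # years pooled into the 2015 5-year release
```

So a 2015 5-year download pools 2011 through 2015 responses for every geography, while the five separate 1-year downloads only exist for the larger geographies and may differ in columns.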

JoeGermuska commented 7 years ago

Looks like we can close this...