associatedpress / geomancer

Open source tool to help journalists easily mash up data based on shared geography.
MIT License

research potential data sources to add #17

Open derekeder opened 10 years ago

derekeder commented 10 years ago

We should pick one of these to work on next, keeping in mind that we want to build an extensible API wrapper that plugs into Geomancer. This is similar to the approach Open Civic Data takes: http://opencivicdata.readthedocs.org/en/latest/scrape/index.html
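To make the "extensible API wrapper" idea concrete, here is a minimal sketch of one way a pluggable data source could look. Every name in it (`DataSource`, `REGISTRY`, `register`, `StaticSource`, `sources_for`) is hypothetical and the sample values are invented; it is not Geomancer's actual interface, just an illustration of adding a source without touching the core code.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical base class for a data source; not Geomancer's real API."""

    name = ""        # unique key used to look the source up
    geo_types = ()   # geography types this source can be joined on

    @abstractmethod
    def fetch(self, geo_type, geo_ids):
        """Return {geo_id: {column_name: value}} for the requested geographies."""


# Hypothetical registry mapping a source name to its class.
REGISTRY = {}


def register(cls):
    """Class decorator that adds a DataSource subclass to the registry."""
    REGISTRY[cls.name] = cls
    return cls


@register
class StaticSource(DataSource):
    """Stand-in source backed by invented, hard-coded data instead of an API."""

    name = "static_demo"
    geo_types = ("state",)

    _DATA = {"17": {"population": 100}, "06": {"population": 250}}  # invented values

    def fetch(self, geo_type, geo_ids):
        if geo_type not in self.geo_types:
            raise ValueError("unsupported geography type: %s" % geo_type)
        # Skip ids the source has no data for rather than failing.
        return {g: self._DATA[g] for g in geo_ids if g in self._DATA}


def sources_for(geo_type):
    """List the names of registered sources that support a geography type."""
    return sorted(name for name, cls in REGISTRY.items() if geo_type in cls.geo_types)
```

With this shape, a new source would only need to subclass `DataSource` and apply `@register`; the caller discovers it through `sources_for("state")` and calls `fetch` on an instance.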

derekeder commented 9 years ago

@evz let's pick the one of these that is most different from the CensusReporter API and work on incorporating it into the Geomancer data sources.

tthibo commented 9 years ago

PANDA would provide the greatest flexibility for the end user, but it may not be the best use case for developing the extensible API wrapper.

It does introduce issues the other APIs don't, though. For example, how do we identify data sets in PANDA that can be merged via Geomancer? (How can the PANDA user identify new data sets that should be made available to Geomancer without requiring a change to the API wrapper?)

derekeder commented 9 years ago

@tthibo PANDA may be the next best data source to integrate. However, we don't have a PANDA install available and it would take time to set one up. Does AP have one we could test with?

If not, the next best candidate would be USASpending, as the data comes in XML format, which we don't handle yet.
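As a rough idea of what handling XML would involve, the sketch below flattens a list of XML records into plain dictionaries using only the standard library. The document layout (`results`, `record`, `state`, `total`) and the values are made up for illustration and are not taken from the USASpending API; `xml_to_rows` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Invented sample document; real responses will have a different structure.
SAMPLE_XML = """
<results>
  <record>
    <state>IL</state>
    <total>1500.50</total>
  </record>
  <record>
    <state>CA</state>
    <total>2300</total>
  </record>
</results>
"""


def xml_to_rows(xml_text, record_tag="record"):
    """Turn each <record_tag> element into a dict of child tag -> text."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for record in root.iter(record_tag):
        # Only direct children are read; nested elements would need more work.
        rows.append({child.tag: (child.text or "").strip() for child in record})
    return rows


rows = xml_to_rows(SAMPLE_XML)
```

The values come back as strings, so numeric columns would still need to be converted before they are merged with other data.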

tthibo commented 9 years ago

The AP's PANDA install is behind the firewall. We can set one up on a world-facing server, but that would take some time, as you mention. In the meantime we could consider using their public demo: http://demo.pandaproject.net/#login

Or if it makes more sense to give USASpending a shot, I'm fine with that. I'll be honest, that was one recommended by a reporter, but I'm not quite clear on the use case for it. Does that API provide data aggregated at the geographical level, or does it only provide data at the contract level, available by state, for example?

evz commented 9 years ago

@tthibo Looks like you can get the contracts summarized by vendor location or by performance location. Locations can be looked up by Congressional District, State, Zip Code, or City. There are varying degrees of detail that you can get back, the most general being totals for whatever your search criteria are.

The same is true of the Federal Assistance and Federal Sub-awards endpoints.

That aside, I am also looking at the Bureau of Labor Statistics stuff. That might be another good case for integrating mainly because it's a multistep process to get to the numbers.

derekeder commented 9 years ago

@tthibo do you have a sense of which data sources would be the most valuable to add next?

Census: http://www.census.gov/developers/
BJS: http://www.bjs.gov/developer/ncvs/index.cfm
Dept. of Labor (especially BLS): http://developer.dol.gov/
EPA: http://developer.epa.gov/

We have a decent start on BLS and could wrap that one up pretty quick. Any others you want us to investigate?

tthibo commented 9 years ago

Let's do BLS next, then. After that, I'd do BJS. I think decennial Census will be great to have, but since we already have ACS, it's a little less pressing. I do have other ideas, but those listed here would trump any additional sources.

cathydeng commented 9 years ago

The only API that BJS lists in its data tools is the National Crime Victimization Survey (NCVS).

In the NCVS field descriptions (personal and household), the only geography in the data is region (i.e. Northeast, Midwest, South, West).

derekeder commented 9 years ago

Suggestion from the NICAR session: add a country geotype and international data from the World Bank: http://data.worldbank.org/

derekeder commented 9 years ago

Another suggestion from NICAR: OpenElections data: https://github.com/openelections

@zstumgoren would know something about this :smile_cat:

tthibo commented 9 years ago

We also had another vote for decennial Census at the NICAR session.