associatedpress / geomancer

Open source tool to help journalists easily mash up data based on shared geography.
MIT License

Bureau of Labor Statistics data #27

Closed: evz closed 9 years ago

evz commented 10 years ago

Relates to #17

The BLS datasets that seem to fit this model include:

These seem to be good for places (except for the one that I noted)

cathydeng commented 9 years ago

Some notes:

there is a DOL API (sampler) and a BLS API.

The BLS API is pretty annoying because you need to know the series ID in order to retrieve data (the series ID specifies the statistic, geography type, geography id, etc). These series IDs follow patterns (for example, here's the series ID pattern for occupational employment statistics) but there is no way to retrieve all series IDs associated with a dataset. I spent some time trying to generate series IDs (using the codes in the documentation) to retrieve data, but this was largely unfruitful b/c many IDs are invalid...this takes a lot of guess & check.
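
A rough sketch of that guess & check, in case it helps anyone following along. It assumes the OES series ID layout from the BLS docs (prefix `OE`, seasonal code `U`, then area type, area code, industry code, occupation code and datatype code, 25 characters total) and the v2 endpoint with a JSON body of `seriesid` / `startyear` / `endyear`. The field widths, the example area code and the datatype codes are assumptions to check against the documentation, and the function names are made up for illustration.

```python
import itertools
import json

# Assumed BLS API v2 endpoint; requests are POSTed as JSON.
BLS_API_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"


def oes_series_id(area_type, area_code, industry_code, occupation_code, datatype_code):
    """Assemble one candidate OES series ID from its component codes.

    The widths below are an assumption based on the documented pattern.
    """
    parts = [
        ("OE", 2),              # survey prefix
        ("U", 1),               # seasonal adjustment code (unadjusted)
        (area_type, 1),         # e.g. "S" for state
        (area_code, 7),
        (industry_code, 6),
        (occupation_code, 6),
        (datatype_code, 2),
    ]
    for value, width in parts:
        if len(value) != width:
            raise ValueError("expected %d characters, got %r" % (width, value))
    return "".join(value for value, _ in parts)


def candidate_ids(area_codes, datatype_codes):
    """Every state-level, all-industry, all-occupation combination to try."""
    return [
        oes_series_id("S", area, "000000", "000000", datatype)
        for area, datatype in itertools.product(area_codes, datatype_codes)
    ]


def build_payload(series_ids, year):
    """JSON body for one request; invalid IDs simply come back without data."""
    return json.dumps(
        {"seriesid": list(series_ids), "startyear": str(year), "endyear": str(year)}
    )


if __name__ == "__main__":
    ids = candidate_ids(["1700000"], ["12", "13", "14"])  # illustrative codes only
    print(BLS_API_URL)
    print(build_payload(ids, 2014))
```

Batching many candidate IDs into one request at least cuts down the number of round trips, though it doesn't tell you ahead of time which IDs are valid.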

For the DOL API, I'm unable to access the tables as documented (here's the DOL API documentation & the 2010 Occupational Employment Statistics documentation). I'll keep poking around. I found a python wrapper for the DOL API - looking into this.

Also, the DOL has DOL 2010 Occupational Employment Statistics Dataset & BLS Occupational Employment Statistics Survey (OES) DATASET as two distinct datasets. Uncertain what the relationship is between the two. Looks like the latter only has data for 2013, so I'm not sure if they are just the same data but different years.

cathydeng commented 9 years ago

I haven't successfully retrieved any data w/ the DOL API - not sure if the documentation is up to date. Python wrapper also didn't work.

Went back to the BLS API and after some more guess & check, was able to get data for annual wages by state (median, 25th percentile, 75th percentile) from the occupational employment statistics data.

Some notes on the OES data:

cathydeng commented 9 years ago

here are the bls datasets: http://www.bls.gov/help/hlpforma.htm

derekeder commented 9 years ago

@tthibo any word back on what datasets would be useful for this mancer?

Here's the full list: http://www.bls.gov/help/hlpforma.htm

tthibo commented 9 years ago

Here are the ones that seem most immediately useful:

There are some geographies used by BLS that don't match anything we're currently using, and some of these values are available only for a tiny number of places. So, it's possible some of these won't work well as mancers.

cathydeng commented 9 years ago

unable to grab data (even the occupational employment data that I got before) from BLS right now (503 error). I'll try again later; hoping this is temporary...

cathydeng commented 9 years ago

didn't get 503 errors in the afternoon. some updates:

  1. occupational employment statistics
    this is what I already had in the bls mancer, w/ 2013 data. ran my code again and now 2013 data is unavailable, but 2014 data has been added, so I just changed the mancer to grab 2014 data. not sure why old data went missing...reached out to David Hiles at BLS & he's looking into it
  2. state and area employment
    some of the series (for example avg weekly hours) had no data available, but others (for example avg hourly earnings) did. also, there's only monthly data, not yearly averages, which might not be best for displaying in geomancer
  3. state and county employment and wages (from quarterly census of employment & wages)
    David Hiles directed me to http://www.bls.gov/cew/doc/access/csv_data_slices.htm - there are data slices by area, w/ unemployment & wage data (the BLS API only has about 5% of QCEW data b/c of design issues w/ their oracle publication database, but this will hold 100% of data going forward). this seems like it'd be great to add to geomancer, but I'm not sure how to get data slices by year instead of by quarter (I figure yearly will be better for geomancer). waiting to hear back from David
  4. local area unemployment statistics
    I'm able to get data, but it's monthly. QCEW will be a better source of unemployment data
  5. geographic profile
    I was able to get annual data by state, but data only exists for 1996, 1997, & 1998. probably not useful
  6. consumer price index
    this only exists by metropolitan area, which isn't a geography that we have defined in geomancer
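
For the monthly-only series in 2 and 4, one possible workaround is to average the months into a yearly figure before handing it to geomancer. This is only a sketch: it assumes observations shaped like the BLS API's `data` entries (`year`, `period` such as `M01`, `value` as a string), the function name is hypothetical, and a plain mean of monthly values is not guaranteed to match any annual average BLS publishes.

```python
from collections import defaultdict


def annual_averages(observations, require_full_year=True):
    """Average monthly observations (periods M01..M12) into one value per year.

    Non-monthly periods and non-numeric values are skipped.
    """
    monthly = defaultdict(dict)
    for obs in observations:
        period = obs["period"]
        is_month = (
            period.startswith("M")
            and period[1:].isdigit()
            and 1 <= int(period[1:]) <= 12
        )
        if not is_month:
            continue
        try:
            value = float(obs["value"])
        except (TypeError, ValueError):
            continue  # missing or placeholder value
        monthly[obs["year"]][period] = value

    averages = {}
    for year, by_month in monthly.items():
        if require_full_year and len(by_month) < 12:
            continue  # don't report an average for a partial year
        averages[year] = sum(by_month.values()) / len(by_month)
    return averages
```

Dropping partial years by default avoids showing a "yearly" number that really only covers a few months.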

cathydeng commented 9 years ago

tl;dr 1 is done, 3 is promising, 2/4/5/6 not so much

cathydeng commented 9 years ago

I've added annual data from the quarterly census of employment & wages to the BLS mancer: http://www.bls.gov/cew/doc/access/csv_data_slices.htm

These csv data slices were actually quite pleasant to work with - easier and better documented than the BLS API.
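
For reference, a minimal sketch of pulling one annual area slice. The URL pattern (`/cew/data/api/{year}/{qtr}/area/{area}.csv`, with `a` in the quarter slot for annual averages) is an assumption based on the data-slice page linked above, and the function names are hypothetical; this is not the actual mancer code.

```python
import csv
import io
from urllib.request import urlopen

# Assumed URL pattern for QCEW area slices; "a" requests annual averages.
QCEW_AREA_URL = "http://data.bls.gov/cew/data/api/{year}/{qtr}/area/{area}.csv"


def qcew_area_url(year, area, qtr="a"):
    """Build the slice URL for one area code (e.g. a county FIPS) and year."""
    return QCEW_AREA_URL.format(year=year, qtr=str(qtr).lower(), area=area.upper())


def parse_qcew_csv(text):
    """Turn the CSV body into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_qcew_area(year, area, qtr="a"):
    """Download and parse one slice (performs a network request)."""
    with urlopen(qcew_area_url(year, area, qtr)) as response:
        return parse_qcew_csv(response.read().decode("utf-8"))
```

Keeping the URL builder and the CSV parsing separate from the download makes both easy to check without hitting BLS.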

A note on the OES data I added before using the BLS API: apparently, when new OES data is uploaded, data from previous years is removed. this means that the code for grabbing OES data will break when 2015 data is uploaded next March, b/c I need to specify the year when grabbing data (it would take some hacking to just grab the data w/o specifying the year, figure out the year, and then update the column info).
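
A sketch of what that hack might look like: request the series without pinning the year, read the year off whatever comes back, and build the column label from it. It assumes the BLS API's response shape (`Results` -> `series` -> `data` entries with a `year` field) and that a request without explicit years returns whatever is currently published; the function names and the label format are made up for illustration.

```python
def latest_year(response):
    """Return the most recent year present in a parsed BLS API response."""
    years = {
        int(obs["year"])
        for series in response.get("Results", {}).get("series", [])
        for obs in series.get("data", [])
    }
    if not years:
        raise LookupError("response contains no observations")
    return max(years)


def column_label(name, year):
    """Column label that carries the detected year, e.g. 'Median wage (2014)'."""
    return "{0} ({1})".format(name, year)


if __name__ == "__main__":
    # Placeholder response for illustration only, not real BLS values.
    sample = {
        "status": "REQUEST_SUCCEEDED",
        "Results": {
            "series": [
                {
                    "seriesID": "EXAMPLE",
                    "data": [{"year": "2014", "period": "A01", "value": "0"}],
                }
            ]
        },
    }
    print(column_label("Median annual wage", latest_year(sample)))
```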