covega / enviro_papers

Take datasets on the environment and slot them into candidate-specific research papers
MIT License

Pull in Clean Energy Jobs data #12

Closed: schlosser closed this issue 4 years ago

schlosser commented 4 years ago

Need to do some lightweight scraping of the HTML pages at several geographic levels:

1. Start with the list of states and get each state's geoid: https://api.kevalaanalytics.com/geography/states/

2. For each state, get the list of geoids for all of the counties, state senate districts, state house districts, and congressional districts in the state: http://assessor.keva.la/cleanenergyprogress/geographies?state=56&type=XXX

Here XXX is one of: counties, legislativedistrictsupper, legislativedistrictslower, or congressionaldistricts.

3. For each entity, scrape the HTML at http://assessor.keva.la/cleanenergyprogress/analytics?area_type=XXX&area_id=YYY, where XXX is an area type (see above) and YYY is a geoid. Example query. For each entity, pull the following data:
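Steps 1 to 3 can be sketched in Python, assuming only the URL patterns given above. The helper names (`geographies_url`, `analytics_url`, `parse_table_rows`) are hypothetical, and the table parser assumes the analytics page lays out its figures as HTML `<table>` rows, which this issue does not confirm. No requests are made here; fetching each URL (for example with `urllib.request.urlopen`) is left to the caller.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urlencode

# Step 1: list of states (response format not described in the issue).
STATES_URL = "https://api.kevalaanalytics.com/geography/states/"
BASE = "http://assessor.keva.la/cleanenergyprogress"

# The four area types named in step 2.
AREA_TYPES = (
    "counties",
    "legislativedistrictsupper",
    "legislativedistrictslower",
    "congressionaldistricts",
)


def geographies_url(state_geoid: str, area_type: str) -> str:
    """Step 2: URL listing every geoid of one area type within a state."""
    if area_type not in AREA_TYPES:
        raise ValueError(f"unknown area type: {area_type}")
    return f"{BASE}/geographies?" + urlencode({"state": state_geoid, "type": area_type})


def analytics_url(area_type: str, area_id: str) -> str:
    """Step 3: URL of the analytics page for a single entity."""
    return f"{BASE}/analytics?" + urlencode({"area_type": area_type, "area_id": area_id})


class TableCellParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, one list per <tr>.

    Assumption: the analytics page presents its figures in HTML tables.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_table_rows(html: str) -> list[list[str]]:
    """Return the cell text of every table row found in `html`."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, `geographies_url("56", "counties")` reproduces the step 2 URL with `type=counties`. Using `urlencode` rather than string concatenation keeps the query strings correctly escaped if a geoid ever contains unexpected characters.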

Output format should be a folder of CSV files (named like MA.csv, VA.csv), one per state. In each file, include the following columns:

We'll add those ~50 CSV files to data/cleaned/jobs/. Then, we'll write scripts to inject that data into the SQL database, but it will be easier to have all the data scraped and cleaned first.
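A minimal sketch of the output and loading steps, assuming Python's standard `csv` and `sqlite3` modules. The issue does not name the SQL database, so SQLite here is only a stand-in, and `write_state_csv` and `load_csv_into_sqlite` are hypothetical helpers. The column names are left to the caller because the column list is not given above.

```python
import csv
import sqlite3
from pathlib import Path


def write_state_csv(out_dir, state_abbr, rows, fieldnames):
    """Write one state's rows to <out_dir>/<STATE>.csv (e.g. MA.csv) and return the path."""
    out_path = Path(out_dir) / f"{state_abbr.upper()}.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings.
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path


def load_csv_into_sqlite(csv_path, db_path, table):
    """Append every row of a CSV to a SQLite table, creating it (all TEXT columns) if needed."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
            con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    finally:
        con.close()
```

Calling `write_state_csv("data/cleaned/jobs", "ma", rows, fieldnames)` would produce `data/cleaned/jobs/MA.csv`. Keeping every column as TEXT defers type decisions until the real schema is known.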

saswat01 commented 4 years ago

Can you specify what to scrape?

schlosser commented 4 years ago

Ahh, realizing this is a bit different than I thought. Will update the description.

schlosser commented 4 years ago

Updated, please see above!

schlosser commented 4 years ago

Accidental close!

schlosser commented 4 years ago

This is done!