covega / enviro_papers

Take datasets on the environment and slot them into candidate specific research papers
MIT License

Pull in Clean Energy Jobs data #12

Closed schlosser closed 4 years ago

schlosser commented 4 years ago

We need to do some lightweight HTML scraping at several geographic levels:

1. Start with the list of states, get the geoid.

2. For each state, get the list of geoids for all of the counties, senate districts, house districts, and congressional districts in the state:

Where XXX is one of: counties, legislativedistrictsupper, legislativedistrictslower, or congressionaldistricts.

3. For each entity, scrape the HTML and pull data:

Where XXX is a type (see above) and YYY is a geoid. Example query.

For each entity, pull the following data:

Output format should be a folder of CSV files (named like MA.csv, VA.csv), one per state. In each file, include the following columns:

We'll add those ~50 CSV files to data/cleaned/jobs/. Then, we'll write scripts to inject that data into the SQL database, but it will be easier to have all the data scraped and cleaned first.
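For anyone picking this up, here is a rough Python sketch of the three steps plus the per-state CSV output. It is only an outline under stated assumptions: the page URLs, HTML structure, and column list are not reproduced here, so `list_states`, `list_entities`, and `scrape_entity` are hypothetical callables you would implement against the real pages, and `columns` is whatever column list we settle on. It also assumes `list_states` can return a state abbreviation alongside each geoid, since the files are named by abbreviation.

```python
import csv
from collections import defaultdict
from pathlib import Path

# The four entity types named in step 2.
ENTITY_TYPES = (
    "counties",
    "legislativedistrictsupper",
    "legislativedistrictslower",
    "congressionaldistricts",
)


def crawl(list_states, list_entities, scrape_entity):
    """Walk states -> entity types -> entities and yield one row per entity.

    All three arguments are hypothetical callables (assumptions, not real APIs):
      list_states()                        -> iterable of (state_abbr, state_geoid)
      list_entities(state_geoid, type)     -> iterable of entity geoids
      scrape_entity(type, geoid)           -> dict of scraped fields
    """
    for state_abbr, state_geoid in list_states():
        for entity_type in ENTITY_TYPES:
            for geoid in list_entities(state_geoid, entity_type):
                row = {"state": state_abbr, "type": entity_type, "geoid": geoid}
                row.update(scrape_entity(entity_type, geoid))
                yield row


def write_state_csvs(rows, out_dir, columns):
    """Group rows by their "state" key and write one <STATE>.csv per state.

    `columns` is a placeholder for the final column list; keys not listed
    in it are dropped, and missing keys are written as empty cells.
    """
    by_state = defaultdict(list)
    for row in rows:
        by_state[row["state"]].append(row)

    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for state, state_rows in sorted(by_state.items()):
        target = out_path / f"{state}.csv"
        with target.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(
                handle, fieldnames=columns, extrasaction="ignore"
            )
            writer.writeheader()
            writer.writerows(state_rows)
        written.append(target)
    return written
```

With real implementations of the three callables, something like `write_state_csvs(crawl(list_states, list_entities, scrape_entity), "data/cleaned/jobs", columns)` would produce the folder of per-state files described above. Keeping the fetching logic behind callables means the traversal can be tested with stub data before any scraping code exists.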

saswat01 commented 4 years ago

Can you specify what to scrape?

schlosser commented 4 years ago

Ahh, realizing this is a bit different than I thought. Will update the description.

schlosser commented 4 years ago

Updated, please see above!

schlosser commented 4 years ago

Accidental close!

schlosser commented 4 years ago

This is done!