What's new in this PR
Description
Created api/webscraper/utils module:
Contains a scraper_utils.py file with helper methods for parsing and filtering the data from our data sources
check_status checks the status of our projects
Projects with the status "Completed" or "Operational" are considered "Operational"
Projects that have the status "Cancelled" are not included in our json files
Any other projects are classified as "Proposed" for now
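The status rules above can be sketched roughly as follows. This is a minimal illustration and not necessarily the code in scraper_utils.py: the signature, the case-insensitive comparison, and returning None for cancelled projects are all assumptions.

```python
def check_status(status):
    """Map a raw project status to the category we store.

    Returns None for cancelled projects so the caller can skip them
    (the None convention is an assumption made for this sketch).
    """
    normalized = (status or "").strip().lower()
    if normalized == "cancelled":
        return None  # cancelled projects are left out of the json files
    if normalized in ("completed", "operational"):
        return "Operational"
    return "Proposed"  # every other status, for now
```

A caller would then drop any project for which check_status returns None before writing the json files.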
geocode_lat_long uses the Google Maps Geocoding API to get an approximate latitude and longitude from a city and state. This function is used for the small solar project data from NYSERDA, which contains a city_town field but no latitude/longitude
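For reviewers unfamiliar with the Geocoding API, here is a rough sketch of what a city/state lookup can look like. It uses urllib from the standard library; the helper names build_geocode_url and parse_lat_long, the api_key parameter, and the None return on failure are assumptions for illustration and may not match scraper_utils.py.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public endpoint of the Google Maps Geocoding API (JSON output).
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(city, state, api_key):
    """Build the request URL for a "city, state" address lookup."""
    query = urlencode({"address": f"{city}, {state}", "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def parse_lat_long(payload):
    """Return (lat, lng) from a Geocoding response, or None if nothing matched."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode_lat_long(city, state, api_key):
    """Hypothetical wrapper: one billable API call per city/state pair."""
    with urlopen(build_geocode_url(city, state, api_key)) as response:
        return parse_lat_long(json.load(response))
```

Since each call hits the Google Maps API, caching results per city/state pair would be one way to keep usage down.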
Created api/webscraper/nyserda_scraper.py file:
Fetches data from the NYSERDA Large-scale Renewable Projects database and the NYSERDA Statewide Distributed Solar Projects database
Filters for specific fields in the data, excluding any projects that have a "Cancelled" status
Dumps the data into nyserda_large.json and nyserda_small.json files
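The filter-and-dump steps could look roughly like the sketch below. The field names in KEEP_FIELDS and the project_status key are hypothetical placeholders, not the real NYSERDA column names, and the fetch step is left out.

```python
import json

# Hypothetical field names; the real NYSERDA columns may differ.
KEEP_FIELDS = ("project_name", "project_status", "city_town")


def filter_projects(records, status_field="project_status"):
    """Keep only the fields we care about and drop cancelled projects."""
    kept = []
    for record in records:
        status = str(record.get(status_field, "")).strip().lower()
        if status == "cancelled":
            continue  # cancelled projects are not written to the json files
        kept.append({field: record.get(field) for field in KEEP_FIELDS})
    return kept


def dump_projects(projects, path):
    """Write the filtered projects to a json file such as nyserda_small.json."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(projects, f, indent=2)
```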
Created api/webscraper/scraper.py file:
Makes a GET request to the NYISO URL to download an xlsx file
For now, we simply load the fetched bytes into a Pandas dataframe
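As a rough illustration of the two steps above, the sketch below downloads raw bytes and hands them to pandas through an in-memory buffer. It uses urllib from the standard library rather than whichever HTTP client scraper.py actually uses, and the function names and the opener parameter are assumptions made for this example.

```python
import io
from urllib.request import urlopen


def fetch_bytes(url, opener=urlopen):
    """Download the raw bytes at url (the NYISO spreadsheet link in our case)."""
    with opener(url) as response:
        return response.read()


def bytes_to_dataframe(raw):
    """Load xlsx bytes into a DataFrame without writing a file to disk."""
    # pandas is imported here so fetch_bytes stays usable without it;
    # reading xlsx files also requires an engine such as openpyxl.
    import pandas as pd

    return pd.read_excel(io.BytesIO(raw))
```

The opener parameter exists only so the download step can be exercised with a fake response instead of a real network call.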
How to review
Standard procedure
NOTE: The function that parses and filters the NYSERDA small solar projects data set makes calls to our Google Maps API! Don't run this file more often than needed; you can also check the data already inside nyserda_large.json and nyserda_small.json.
To check if the scrapers for the NYSERDA data are working, you can run this command in your terminal:
python api/webscraper/nyserda_scraper.py
This will dump data into the nyserda_large.json and nyserda_small.json files inside the api/webscraper directory! If you want to see it in action, you can delete the data in those files and run the command above again; the json files should be repopulated.
If you run into any issues with dependencies, you may need to install these Python packages:
requests
python-dotenv
pandas
The json, urllib, and io modules are part of the Python standard library and do not need to be installed.
Run this command in your terminal:
pip install requests python-dotenv pandas
Next steps
Use a web scraper to find and download the most up-to-date NYISO xlsx spreadsheet
Relevant links
Online sources
Related PRs
CC: @itsliterallymonique