This sprint, you will be connecting the data from NYISO and NYSERDA to populate the database! For this, please make a separate Python file, and keep in mind that we will later add the data we scraped from ORES to this function.
Here is what you will need to do:
1. Create a file that will hold the function(s) you will use to connect the NYISO and NYSERDA data by project name and/or interconnection_queue_number
2. Adjust your NYSERDA web scraper to also collect interconnection_queue_number
3. Create a function that connects all the NYISO and NYSERDA data by project name and/or interconnection_queue_number and writes the result to a JSON or CSV file (whichever you prefer!)
   a. When you do this, check if NYISO has any data that can help populate any NULL columns in the data we scraped from NYSERDA
   b. If there is a project that is in NYSERDA but not in NYISO (or vice versa), still add it to the dataset.
   c. We will need to assign each project its State Senate and Assembly Districts. Here’s how you can find the districts:
      - Find a free API that takes an address (you can get this from Google Maps reverse geocoding) and returns the state senate and assembly districts
      - We can add the counties in NY and their corresponding state senate and assembly districts to our database, then use that data to assign the districts based on the project’s county
4. Make a new test table on Supabase based on our “Projects” table and try to write all the data into it (to check that your web scraper works so far!)
5. For now, keep key_development_milestones and image NULL (I’m still sorting some details out with these, so you will add them in a later sprint!)
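
The merge step above (joining NYISO and NYSERDA records, filling NULL columns, and keeping unmatched projects) could be sketched roughly as follows. This is only a minimal sketch using plain dicts, not the required implementation. It assumes each scraper returns a list of dicts with `project_name` and `interconnection_queue_number` keys; `merge_projects`, `normalize_name`, and any other field names used when calling it (for example `size_mw`) are hypothetical and should be swapped for whatever the scrapers actually produce.

```python
from typing import Optional


def normalize_name(name: Optional[str]) -> str:
    """Lowercase and collapse whitespace so near-identical names compare equal."""
    return " ".join((name or "").lower().split())


def merge_projects(nyserda: list, nyiso: list) -> list:
    """Outer-join NYSERDA and NYISO project dicts (hypothetical helper).

    Matching is tried on interconnection_queue_number first, then on the
    normalized project name. NYSERDA values win; NYISO only fills fields
    that are missing or None. Unmatched rows from either side are kept.
    """
    by_queue = {}
    by_name = {}
    for row in nyiso:
        queue = row.get("interconnection_queue_number")
        if queue is not None:
            by_queue[queue] = row
        name = normalize_name(row.get("project_name"))
        if name:
            by_name[name] = row

    merged = []
    matched_ids = set()  # id() of every NYISO row that found a partner
    for row in nyserda:
        match = by_queue.get(row.get("interconnection_queue_number"))
        if match is None:
            match = by_name.get(normalize_name(row.get("project_name")))
        combined = dict(row)
        if match is not None:
            matched_ids.add(id(match))
            for key, value in match.items():
                if combined.get(key) is None:  # only fill NULL columns
                    combined[key] = value
        merged.append(combined)

    # Projects that only exist in NYISO still go into the dataset.
    for row in nyiso:
        if id(row) not in matched_ids:
            merged.append(dict(row))
    return merged
```

The returned list can be written out with `json.dump` or `csv.DictWriter` from the standard library. Matching on the queue number before the name is a design choice in this sketch, since names are more likely to be spelled differently between the two sources.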
Based on the table from "what the frack is a webscraper," should we just leave out renewable energy fields that are not one of the highlighted ones?
How should we deal with projects that are missing a zipcode and therefore do not have a viable latitude/longitude to use?
For NYISO:
NYISO spreadsheet doesn't have a field for project status --> are we just assuming it's all "Proposed"?
Other NYISO missing info: region, zipcode --> might be able to use Points of Interconnection column to estimate lat/long?
I think we need to define which fields we want to merge/override from the NYISO data --> we can't just override all the existing fields with the NYISO data since it's missing so much. Should I check the database's existing project (if there is one) for missing fields and only replace those? Or do we want to pre-define certain fields we always take from the NYISO set?
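
To make the two options concrete, here is a rough sketch that supports both at once: NYISO fills any field that is currently missing, and a small pre-defined set of fields is always taken from NYISO. The function name `apply_nyiso`, the `ALWAYS_FROM_NYISO` set, and its contents are assumptions for illustration only; which fields belong in that set is exactly the open question.

```python
# Hypothetical choice: fields where NYISO is treated as the source of truth.
ALWAYS_FROM_NYISO = frozenset({"interconnection_queue_number"})


def apply_nyiso(existing: dict, nyiso_row: dict,
                always_from_nyiso=ALWAYS_FROM_NYISO) -> dict:
    """Return a copy of `existing` updated with NYISO data.

    - A None value in the NYISO row never overwrites anything.
    - Fields listed in `always_from_nyiso` are overridden by NYISO.
    - Every other field is only filled when the existing value is None.
    """
    updated = dict(existing)
    for field, value in nyiso_row.items():
        if value is None:
            continue  # NYISO has nothing to offer for this field
        if field in always_from_nyiso or updated.get(field) is None:
            updated[field] = value
    return updated
```

Passing an empty set as `always_from_nyiso` gives the pure "only replace missing fields" behavior, so the same function covers either decision.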
Some resources to help!
PR Reviewer: @itsliterallymonique and @ethan-tam33
approval field set to false by default