deployment-gap-model-education-fund / deployment-gap-model

ETL code for the Deployment Gap Model Education Fund
https://www.deploymentgap.fund/
MIT License

GS and LBNL comparison #311

Open bendnorman opened 4 months ago

bendnorman commented 4 months ago

This PR

  1. Adjusts the ETL to work with our oldest GS archives. This should probably not be merged into dev or main, but we should keep the branch around in case we need to compare older versions.
  2. Does a simple comparison of total clean capacity between LBNL and GS data using the data warehouse tables.

TrentonBush commented 3 months ago

I like the idea of being able to look at different vintages of data. It doesn't look like too much had to change; just the file IDs and some validation checks. Do you think it would be a big lift to add functionality to select gridstatus data by (nearest) date?
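For illustration, selecting an archive by nearest date could look roughly like the sketch below. The `ARCHIVES` mapping, its file IDs, and the `nearest_archive` helper are all hypothetical; the real file IDs live in the ETL and are not reproduced here.

```python
from datetime import date

# Hypothetical mapping of archive snapshot dates to file IDs.
# These IDs are placeholders, not the real GS archive IDs.
ARCHIVES = {
    date(2023, 4, 1): "archive-a",
    date(2023, 7, 1): "archive-b",
    date(2023, 10, 1): "archive-c",
}


def nearest_archive(target: date, archives: dict[date, str]) -> tuple[date, str]:
    """Return the (snapshot_date, file_id) pair closest to `target`."""
    if not archives:
        raise ValueError("no archives available")
    # Rank snapshots by absolute distance in days; ties go to the earlier one.
    snapshot = min(archives, key=lambda d: (abs((d - target).days), d))
    return snapshot, archives[snapshot]


snapshot, file_id = nearest_archive(date(2023, 6, 1), ARCHIVES)
print(snapshot, file_id)
```

Breaking ties toward the earlier snapshot is an arbitrary choice here; preferring the later one would be just as defensible.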

TrentonBush commented 3 months ago

As for the comparison itself, this makes me really wish GS had integrated withdrawn/completed projects. Then the time difference wouldn't matter. Alas.

I can't make inline comments in a notebook, so here are a few notes:

TrentonBush commented 3 months ago

FYI, Ella did something similar for comparing GS data across time. You don't need to read all the code; I'm just sending the visuals as examples.

bendnorman commented 3 months ago

Ella's notebook is helpful, thank you!

Gridstatus does track withdrawn and operational projects but we are filtering for active projects only. Why would including withdrawn and operational projects resolve the time difference?

I'll work on a map/analysis that compares county-level information. I feel like we need this LBNL - GS comparison at multiple points (raw data, data warehouse, and data mart tables) to understand where the differences between the two datasets emerge.
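A rough sketch of what the county-level comparison could compute at any one of those stages is below. It assumes each project is a dict with hypothetical `county` and `capacity_mw` keys; the actual warehouse column names may differ, and the real analysis would more likely use the existing tables than plain dicts.

```python
from collections import defaultdict


def capacity_by_county(projects):
    """Sum capacity (MW) per county for a list of project dicts."""
    totals = defaultdict(float)
    for project in projects:
        totals[project["county"]] += project["capacity_mw"]
    return dict(totals)


def compare_by_county(lbnl_projects, gs_projects):
    """Return per-county LBNL and GS totals plus the GS - LBNL difference."""
    lbnl_totals = capacity_by_county(lbnl_projects)
    gs_totals = capacity_by_county(gs_projects)
    rows = {}
    # Use the union of counties so one-sided counties show up as 0 MW.
    for county in sorted(set(lbnl_totals) | set(gs_totals)):
        lbnl_mw = lbnl_totals.get(county, 0.0)
        gs_mw = gs_totals.get(county, 0.0)
        rows[county] = {"lbnl_mw": lbnl_mw, "gs_mw": gs_mw, "diff_mw": gs_mw - lbnl_mw}
    return rows
```

Running the same function on the raw, warehouse, and mart versions of each dataset would show at which stage a county's totals start to diverge.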

TrentonBush commented 3 months ago

If GS included withdrawn/operational projects, then we could still compare the projects across fuel type, capacity, and location: every field except status. Most ISOs have a field that says when an item entered the queue, so we could remove projects that entered after the LBNL time window.
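As a minimal sketch of that idea: the field names (`queue_date`, `fuel_type`, `capacity_mw`, `county`) and both helpers are hypothetical, and setting aside projects with no queue date is just one possible choice.

```python
from datetime import date

# Hypothetical fields to compare; status is deliberately left out.
COMPARE_FIELDS = ("fuel_type", "capacity_mw", "county")


def split_by_queue_date(projects, cutoff):
    """Split projects into (kept, dropped) by queue entry date.

    Projects that entered the queue on or before `cutoff` are kept.
    Projects with a later or missing queue date are set aside.
    """
    kept, dropped = [], []
    for project in projects:
        queue_date = project.get("queue_date")
        if queue_date is not None and queue_date <= cutoff:
            kept.append(project)
        else:
            dropped.append(project)
    return kept, dropped


def comparable_view(project):
    """Return only the fields that should match regardless of status."""
    return {field: project.get(field) for field in COMPARE_FIELDS}
```

Returning the dropped projects as well makes it easy to check how much capacity the cutoff removes.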

> Gridstatus does track withdrawn and operational projects but we are filtering for active projects only.

If I remember correctly, GS only has it for a couple of ISOs, and withdrawn projects weren't always in their selection of raw data. I believe all ISOs make that available, but we'd have to find the primary sources and ETL them.

I agree it would help to compare LBNL - GS at multiple points to ensure the processing is ~equivalent.