deployment-gap-model-education-fund / deployment-gap-model

ETL code for the Deployment Gap Model Education Fund
https://www.deploymentgap.fund/
MIT License

Integrate gridstatus for near realtime ISO Queue data #267

Open bendnorman opened 1 year ago

bendnorman commented 1 year ago

Tasks

Questions

Scope

Minimum viable scope:

Things to consider:

Integration

Validation

TrentonBush commented 1 year ago

The scope and work breakdown sound right to me!

Archiving

I think if Catalyst has a bigger interest in archiving then it makes sense for Catalyst to host the archiving code and runner. That way Catalyst can extend the archive to data beyond the ISO queues or update at higher frequencies if desired. The DBCP code can just pull from Catalyst archives as if they were any other public source.
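To make "any other public source" concrete, the DBCP side would only need a stable URL layout to fetch from. A minimal sketch, where the bucket name and path scheme are purely hypothetical placeholders, not the real Catalyst archive:

```python
def archive_url(iso: str, vintage: str) -> str:
    """Build the download URL for one ISO queue vintage.

    The bucket name and directory layout here are assumptions for
    illustration; the real archive layout is up to Catalyst.
    """
    return (
        "https://storage.googleapis.com/hypothetical-catalyst-archive/"
        f"iso_queues/{iso}/{vintage}.parquet"
    )

print(archive_url("nyiso", "2023-05-01"))
```

The point is just that DBCP's extract step stays a plain HTTP fetch, with no coupling to how the archiver runs.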

I imagine we'd want to archive different datasets separately for the availability reasons you outlined above (like if NYISO fails one particular day). I don't think there is anything tying the ISO vintages together, right? Like we could pull today's CAISO data and last week's NYISO data if we thought there was something wrong with the latest NYISO release.
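Since nothing ties the ISO vintages together, vintage selection can be fully independent per ISO. A sketch of that idea, with a made-up archive index (the statuses and dates are invented for illustration):

```python
from datetime import date

# Hypothetical archive index: each ISO's vintages are archived and
# validated independently, so a bad NYISO release doesn't block CAISO.
ARCHIVE = {
    "caiso": {date(2023, 5, 1): "ok", date(2023, 5, 8): "ok"},
    "nyiso": {date(2023, 5, 1): "ok", date(2023, 5, 8): "failed_validation"},
}

def latest_valid_vintage(iso: str) -> date:
    """Return the most recent vintage of an ISO that passed validation."""
    valid = [d for d, status in ARCHIVE[iso].items() if status == "ok"]
    return max(valid)

# Mixing vintages across ISOs is fine: today's CAISO, last week's NYISO.
print(latest_valid_vintage("caiso"))  # 2023-05-08
print(latest_valid_vintage("nyiso"))  # 2023-05-01 (latest release flagged)
```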

Processing

I think the ETL code should be in the DBCP repo so it can focus on the specific needs of this project. At first the DBCP code can pull from pinned vintages of ISO data that is manually validated. Then we can add auto-update logic to fetch the latest and greatest versions.
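The "pinned vintages first, auto-update later" progression could look something like this sketch (the vintage strings and mode names are assumptions):

```python
# Hypothetical pinning: the ETL reads fixed, manually validated vintages
# to start; a "latest" mode can be wired up once auto-update logic exists.
PINNED_VINTAGES = {"caiso": "2023-05-08", "nyiso": "2023-05-01"}

def resolve_vintage(iso: str, mode: str = "pinned") -> str:
    """Return which archive vintage the ETL should read for an ISO."""
    if mode == "pinned":
        return PINNED_VINTAGES[iso]
    if mode == "latest":
        raise NotImplementedError("auto-update logic comes later")
    raise ValueError(f"unknown mode: {mode}")
```

Switching an ISO to auto-updates then becomes a one-line config change rather than an ETL rewrite.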

Update Frequency

Considering some (most?) ISO queues only update monthly, I would guess that this repo doesn't need daily updates. But if we can safely update daily, then sure, let's do it. My only concern is that the higher the update frequency, the more automated our data validation needs to be, or we risk breaking things downstream. I'm not sure yet how big a lift that is. We'd have to make very strict checks and auto-update only if they pass, else revert and flag for manual review. I'd guess we could start with manual updates ~monthly until we build up a library of automated checks, then transition to higher frequency.
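The "auto-update only if strict checks pass, else revert and flag" gate could be sketched like this. The specific checks and thresholds here are invented placeholders; the real check library is exactly the open question above:

```python
def passes_checks(new_rows: int, old_rows: int, new_null_frac: float) -> bool:
    """Example of deliberately strict checks; real thresholds TBD."""
    row_change = abs(new_rows - old_rows) / max(old_rows, 1)
    return row_change <= 0.10 and new_null_frac <= 0.05

def maybe_update(new, current, *, new_null_frac):
    """Accept the new vintage only if checks pass; otherwise keep the
    current vintage and flag the new one for manual review."""
    if passes_checks(len(new), len(current), new_null_frac):
        return new, "updated"
    return current, "flagged_for_manual_review"

current = list(range(100))
_, status = maybe_update(list(range(105)), current, new_null_frac=0.01)
print(status)  # updated (5% row change, low null fraction)
_, status = maybe_update(list(range(40)), current, new_null_frac=0.01)
print(status)  # flagged_for_manual_review (60% row drop)
```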

TrentonBush commented 1 year ago

Validation

I agree that we need to compare the Grid Status data to the existing LBNL version. I think the ISO queues include both a queue entry and exit date, so hopefully we can filter based on those and reproduce the LBNL version. If not, we'll have to do some error analysis and figure out how/why they differ and whether we can live with it.
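If the entry and exit dates are there, reproducing an LBNL snapshot is a point-in-time filter. A sketch, where the column names (`queue_date`, `exit_date`) are guesses at the Grid Status schema, not confirmed names:

```python
from datetime import date

def active_as_of(rows, snapshot):
    """Keep projects that had entered the queue but not yet exited as of
    the snapshot date. Column names are assumed, not verified."""
    return [
        r for r in rows
        if r["queue_date"] <= snapshot
        and (r["exit_date"] is None or r["exit_date"] > snapshot)
    ]

rows = [
    {"queue_date": date(2020, 1, 1), "exit_date": None},            # active
    {"queue_date": date(2020, 1, 1), "exit_date": date(2021, 6, 1)},# exited
    {"queue_date": date(2022, 3, 1), "exit_date": None},            # not yet entered
]
print(len(active_as_of(rows, date(2021, 12, 31))))  # 1
```

Comparing this filtered view against the LBNL snapshot row-by-row would be the first error-analysis pass.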

Joining with LBNL

The LBNL data includes a column with the source ISO (or non-ISO), so I expect we can simply filter for only non-ISO data and combine it with the latest ISO data from Grid Status.
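If that holds, the join is really just a filter-and-append. A minimal sketch, where the `region_type` column name and its `"non-ISO"` value are assumptions about the LBNL schema:

```python
def combine_queues(lbnl_rows, grid_status_rows):
    """Keep only LBNL's non-ISO rows and append the latest ISO rows
    from Grid Status. Column name and values are assumed."""
    non_iso = [r for r in lbnl_rows if r["region_type"] == "non-ISO"]
    return non_iso + grid_status_rows

lbnl = [
    {"region_type": "non-ISO", "project": "utility_a"},
    {"region_type": "ISO", "project": "stale_caiso_row"},  # superseded
]
grid_status = [{"region_type": "ISO", "project": "fresh_caiso_row"}]
print([r["project"] for r in combine_queues(lbnl, grid_status)])
```

The main risk is double-counting projects that appear on both sides, which is another thing the validation step should check.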

bendnorman commented 1 year ago

Archiving

Processing

Frequency

TrentonBush commented 1 year ago

Oh are the Catalyst archives not public or "requester pays"? I'd expect this data to be < 5MB uncompressed.

bendnorman commented 1 year ago

I think we'll want to keep these archives private for now.