Gridstatus iso queues extraction and exploration

deployment-gap-model-education-fund / deployment-gap-model

ETL code for the Deployment Gap Model Education Fund

https://www.deploymentgap.fund/

MIT License

6 stars 2 forks source link

Gridstatus iso queues extraction and exploration #296

Closed bendnorman closed 8 months ago

bendnorman commented 1 year ago

This PR creates a function for caching and extracting grid status interconnection queue data.

I also upgraded:

pandas-gbq~=0.19.1
pydata-google-auth~=1.7.0

bendnorman commented 1 year ago

@TrentonBush, notebooks/37-bdn-explore-gridstatus-data.ipynb is a good starting place for interacting with the archived gridstatus data.

The archives are somewhat up to date. The combination of flaky ISO APIs and gridstatus basic pandas cleaning becoming out of date make the archiving a little flaky :/

bendnorman commented 10 months ago

@TrentonBush I did some basic cleaning of the grid status data. A few of thoughts:

Most of the columns in the LBNL data are shared amongst all of the ISOs in grid status so my cleaning focused on those columns. The notable exception is the detailed status information. it seems like each ISO reports this information in its own way so we'll probably need create separate cleaning functions for each iso instead of cleaning a concatenated version like I'm doing now. Would love your assistance for this part of the cleaning since you're more familiar with this! Map straight to synapse column instead of going to LBNL?
I mapped the technology types and would love for you to double check I made sensible decisions.
Any idea why LBNL created one to many relationships between projects, capacity and location?
To fully integrate GS with LBNL I'm imagining we'll create GS data warehouse tables, then adjust the county_wide / long data mart tables to pull non iso data from LBNL and ISO data from GS. What do you think?
It looks like our LBNL code does some manual cleaning and deduplication. How can we determine if this need to be applied to the GS data?

TrentonBush commented 9 months ago

While trying to write transformed results to a .parquet file, I ran into an issue of mixed types in the output.:

d = df.convert_dtypes().dtypes
obj_cols = d[d.eq('object')].index
for c in obj_cols:
    types = gs_dfs_dec17[c].apply(type).unique()
    print(f"{c}:\t\t{types}\n")

output:

proposed_completion_date:       [<class 'datetime.datetime'> <class 'pandas._libs.missing.NAType'>
 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
 <class 'pandas._libs.tslibs.nattype.NaTType'>]

queue_date:     [<class 'datetime.datetime'>
 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
 <class 'pandas._libs.tslibs.nattype.NaTType'>]

queue_id:       [<class 'str'> <class 'int'>]

summer_capacity_mw:     [<class 'float'> <class 'pandas._libs.missing.NAType'> <class 'str'>]

winter_capacity_mw:     [<class 'float'> <class 'pandas._libs.missing.NAType'> <class 'str'>]

withdrawal_comment:     [<class 'pandas._libs.missing.NAType'>]

bendnorman commented 9 months ago

Just pushed a change that should correct the mixed dtype issue.

bendnorman commented 9 months ago

Ok! I realized by filtering by "Status" == "Active" I was removing a lot of projects that LBNL considered "active" and "nearly certain". Now I have some logic in each ISO cleaning function to filter out in service and withdrawn projects. I also normalized the data into projects and capacities. @TrentonBush your normalize_multicolumns_to_rows function was life saver thank you!

With these changes the overall capacity differences look pretty good for each ISO. I think the variability can be explained by the 12 month delay for SPP and ISONE and the 4 month delay for all other ISOs. There are a couple of remaining issues that I uncovered using notebook 42:

Overall capacities look good for all the ISOs
Same number of ISONE projects are marked actionable in old and new data but the MW distributions don’t line up.
Significantly more projects marked actionable in new CAISO and MISO data
Significantly less number of projects marked nearly certain in new MISO data.

TrentonBush commented 8 months ago

OK I finally added the deduplication and fixed the output tests. I think this is ready to go?

bendnorman commented 8 months ago

Yes! Let's merge it in.