deployment-gap-model-education-fund / deployment-gap-model

ETL code for the Deployment Gap Model Education Fund
https://www.deploymentgap.fund/
MIT License
6 stars 2 forks source link

Gridstatus iso queues extraction and exploration #296

Closed bendnorman closed 8 months ago

bendnorman commented 1 year ago

This PR creates a function for caching and extracting grid status interconnection queue data.

I also upgraded:

bendnorman commented 1 year ago

@TrentonBush, notebooks/37-bdn-explore-gridstatus-data.ipynb is a good starting place for interacting with the archived gridstatus data.

The archives are somewhat up to date. The combination of flaky ISO APIs and gridstatus basic pandas cleaning becoming out of date make the archiving a little flaky :/

bendnorman commented 10 months ago

@TrentonBush I did some basic cleaning of the grid status data. A few of thoughts:

TrentonBush commented 9 months ago

While trying to write transformed results to a .parquet file, I ran into an issue of mixed types in the output.:

d = df.convert_dtypes().dtypes
obj_cols = d[d.eq('object')].index
for c in obj_cols:
    types = gs_dfs_dec17[c].apply(type).unique()
    print(f"{c}:\t\t{types}\n")

output:

proposed_completion_date:       [<class 'datetime.datetime'> <class 'pandas._libs.missing.NAType'>
 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
 <class 'pandas._libs.tslibs.nattype.NaTType'>]

queue_date:     [<class 'datetime.datetime'>
 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
 <class 'pandas._libs.tslibs.nattype.NaTType'>]

queue_id:       [<class 'str'> <class 'int'>]

summer_capacity_mw:     [<class 'float'> <class 'pandas._libs.missing.NAType'> <class 'str'>]

winter_capacity_mw:     [<class 'float'> <class 'pandas._libs.missing.NAType'> <class 'str'>]

withdrawal_comment:     [<class 'pandas._libs.missing.NAType'>]
bendnorman commented 9 months ago

Just pushed a change that should correct the mixed dtype issue.

bendnorman commented 9 months ago

Ok! I realized by filtering by "Status" == "Active" I was removing a lot of projects that LBNL considered "active" and "nearly certain". Now I have some logic in each ISO cleaning function to filter out in service and withdrawn projects. I also normalized the data into projects and capacities. @TrentonBush your normalize_multicolumns_to_rows function was life saver thank you!

With these changes the overall capacity differences look pretty good for each ISO. I think the variability can be explained by the 12 month delay for SPP and ISONE and the 4 month delay for all other ISOs. There are a couple of remaining issues that I uncovered using notebook 42:

TrentonBush commented 8 months ago

OK I finally added the deduplication and fixed the output tests. I think this is ready to go?

bendnorman commented 8 months ago

Yes! Let's merge it in.