catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
456 stars 105 forks source link

Add the FK relationship to the new EIA860 storage direct support plant/generators #3695

Open cmgosnell opened 6 days ago

cmgosnell commented 6 days ago

There are new sets of plant_id's and generator id's in the core_eia860__scd_generators_energy_storage table which are all about whether a storage generator supports another generator. Goal of this issue is to add these new id's into special FK relationships. It is possible that we will need to harvest these new ID's in order to actually have all of the ID's in the harvested generators table. Another possible barrier is that there definitely nulls in these columns. Is it okay to have nulls in a column with a foreign key relationship to a table that doesn't have any nulls?

These are the columns at question:

"plant_id_eia_direct_support_1",
"generator_id_direct_support_1",
"plant_id_eia_direct_support_2",
"generator_id_direct_support_2",
"plant_id_eia_direct_support_3",
"generator_id_direct_support_3",

Running pudl_check_fks without the new alembic schema produces the following errors:

pudl.etl.check_foreign_keys.ForeignKeyErrors: Foreign key error for table: core_eia860__scd_generators_energy_storage -- core_eia__entity_generators (plant_id_eia, generator_id, plant_id_eia, generator_id, plant_id_eia, generator_id) -- on rows [1523 1548 1576 1589 1596 1727 1728 1795 1796 1827 1864 1865 1879 1892
 1895 1909 1920 1922 1934 1943 1946 1977 2028 2046 2059 2069 2088 2101
 2102 2105 2108 2109 2110 2111 2112 2116 2125 2126 2133 2143 2224 2236
 2315 2316 2317 2408 2448]

Which have the following values:

plant_id_eia_direct_support_1 generator_id_direct_support_1 plant_id_eia_direct_support_2 generator_id_direct_support_2 plant_id_eia_direct_support_3 generator_id_direct_support_3
1523 NaN None NaN None NaN None
1548 NaN None NaN None NaN None
1576 NaN None NaN None NaN None
1589 NaN None NaN None NaN None
1596 NaN None NaN None NaN None
1727 62975.0 SYNLB NaN None NaN None
1728 NaN None NaN None NaN None
1795 63490.0 WIL NaN None NaN None
1796 NaN None NaN None NaN None
1827 64185.0 DRAPV NaN None NaN None
1864 64436.0 WLB NaN None NaN None
1865 NaN None NaN None NaN None
1879 64520.0 SOL1 NaN None NaN None
1892 NaN None NaN None NaN None
1895 NaN None NaN None NaN None
1909 NaN None NaN None NaN None
1920 64852.0 DAGGB 64852.0 DAGGP NaN None
1922 NaN None NaN None NaN None
1934 NaN None NaN None NaN None
1943 65055.0 77FPV NaN None NaN None
1946 65058.0 WAIBA 65058.0 WAIPV NaN None
1977 NaN None NaN None NaN None
2028 66076.0 20409 NaN None NaN None
2046 NaN None NaN None NaN None
2059 NaN None NaN None NaN None
2069 NaN None NaN None NaN None
2088 67457.0 SHPV1 NaN None NaN None
2101 61169.0 60798 NaN None NaN None
2102 NaN None NaN None NaN None
2105 NaN None NaN None NaN None
2108 61720.0 30 NaN None NaN None
2109 61722.0 32 NaN None NaN None
2110 61752.0 49 NaN None NaN None
2111 61807.0 66 NaN None NaN None
2112 62494.0 SCSOL NaN None NaN None
2116 NaN None NaN None NaN None
2125 63541.0 63257 NaN None NaN None
2126 NaN None NaN None NaN None
2133 NaN None NaN None NaN None
2143 NaN None NaN None NaN None
2224 NaN None NaN None NaN None
2236 NaN None NaN None NaN None
2315 65550.0 64921 NaN None NaN None
2316 65550.0 65789 NaN None NaN None
2317 65550.0 None NaN None NaN None
2408 NaN None NaN None NaN None
2448 67282.0 5654 NaN None NaN None
### Tasks
- [x] Add new `foreign_key_rules` into `core_eia__entity_generators` for each of  these direct support column names for the plant/gen ids. (examples of other tables that do this are: `core_eia__entity_utilities`, `core_eia860__scd_utilities`)
- [x] Regenerate the alembic schema
- [x] test the relationships w/ `pudl_check_fks`
- [x] fix plant ID dtypes
- [x] Rule out incorrectly written string nulls in SQLite DB by directly querying in DB Browser
- [ ] debug NaNs / Nones failing the FK check
e-belfer commented 4 days ago

When running just on the direct_support_1 and with the field type updated for the plant IDs:

[1523 1548 1576 1589 1596 1727 1728 1795 1796 1827 1864 1865 1879 1892 1895 1909 1920 1922 1934 1943 1946 1977 2028 2046 2059 2069 2088 2101 2102 2105 2108 2109 2110 2111 2112 2116 2125 2126 2133 2143 2224 2236 2315 2316 2317 2408 2448]

e-belfer commented 4 days ago

Adding these IDs to the harvesting process seems to have fixed the issue, and pudl_check_fks is now passing on #3699.