Open cmgosnell opened 1 year ago
I figured out why this was happening: in pudl.glue.ferc1_eia.get_plants_ferc1_raw, I was grabbing only the most recent year of data, as opposed to sorting the data by year and then dropping duplicates. I already fixed this in the EIA plant-getter, so I'm going to implement the same solution.
Current code:
# grab the most recent plant record
most_recent_year = max(all_plants.report_year)
all_plants = (
    all_plants
    .loc[
        (all_plants.report_year == most_recent_year),
        [
            "utility_id_ferc1",
            "utility_name_ferc1",
            "plant_name_ferc1",
            "utility_id_ferc1_dbf",
            "utility_id_ferc1_xbrl",
            "capacity_mw",
            "report_year",
            "plant_table",
        ],
    ]
    .drop_duplicates(["utility_id_ferc1", "plant_name_ferc1"])
    .sort_values(["utility_id_ferc1", "plant_name_ferc1"])
)
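For comparison, here is a minimal sketch of the sort-then-drop-duplicates approach described above. It is not the actual fix that landed in PUDL: the toy dataframe, the reduced column set, and the variable names `pk`, `latest_year_only`, and `latest_per_plant` are assumptions made for illustration. It shows how filtering on the single most recent year drops any plant that did not report in that year, while sorting by year and keeping the first record per plant retains it.

```python
import pandas as pd

# Hypothetical toy data standing in for the real all_plants dataframe.
all_plants = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1, 1, 2],
        "plant_name_ferc1": ["alpha", "alpha", "beta", "gamma"],
        "capacity_mw": [10.0, 12.0, 5.0, 7.0],
        "report_year": [2020, 2021, 2019, 2021],
    }
)

# Columns that identify a plant in this sketch.
pk = ["utility_id_ferc1", "plant_name_ferc1"]

# Behavior of the current code: only plants reporting in the latest year survive.
most_recent_year = all_plants.report_year.max()
latest_year_only = all_plants.loc[
    all_plants.report_year == most_recent_year
].drop_duplicates(pk)

# Sketch of the alternative: newest record first, then one row per plant.
latest_per_plant = (
    all_plants.sort_values("report_year", ascending=False)
    .drop_duplicates(subset=pk, keep="first")
    .sort_values(pk)
    .reset_index(drop=True)
)

print(len(latest_year_only), len(latest_per_plant))
```

In this toy data, plant "beta" last reported in 2019, so it is missing from `latest_year_only` but present in `latest_per_plant`.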
@cmgosnell this was successfully fixed, right? Should we close this issue?
Describe the bug
A handful of plants from the hydro table slipped through the cracks of the glue tests and cause an FK failure when loading the full DB.
Bug Severity
How badly is this bug affecting you?
"" into the invalid values for plant_name_ferc1). But this means the glue tests aren't actually getting all of the plants.
To Reproduce
Remove the last three lines from pudl_id_mapping.xlsx and run pytest test/integration/glue_test.py --live-dbs --save-unmapped-ids on dbf-xbrl-mapping-dupes (or xbrl_integration once #2067 is merged). This should result in the failure of that test, but right now it passes.