Closed zaneselvans closed 5 months ago
I'll note here that the pudl_codes_datasources table does include a pudl_version column which appears to point at the right git hash:
I'm surprised that the pytest ... && touch $PUDL_OUTPUT/success led to touching if pytest failed... going to see if the successful build from 2024-01-23 also had this behavior.
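For reference, the shell semantics in question can be checked in isolation: with &&, the right-hand command runs only when the left-hand command exits with status 0. The sketch below is not taken from the build script. It uses false and true as stand-ins for a failing and a passing pytest run, and a throwaway temporary directory as a stand-in for $PUDL_OUTPUT.

```shell
#!/usr/bin/env bash
# Throwaway directory standing in for $PUDL_OUTPUT (an assumption for this sketch).
out_dir="$(mktemp -d)"

# `false` stands in for a failing pytest run: touch should NOT run.
false && touch "$out_dir/success_after_failure"

# `true` stands in for a passing pytest run: touch should run.
true && touch "$out_dir/success_after_pass"

# Record whether each marker file exists.
if [ -e "$out_dir/success_after_failure" ]; then
    failure_marker="created"
else
    failure_marker="absent"
fi

if [ -e "$out_dir/success_after_pass" ]; then
    pass_marker="created"
else
    pass_marker="absent"
fi

echo "after failing command: $failure_marker"
echo "after passing command: $pass_marker"
```

If a success file shows up after a failing left-hand command, something other than this one line is probably creating it, or the command that failed was not the one directly to the left of &&.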
I think the logs you're looking at are for the wrong day, actually! 😌 See the timestamp:
2023-08-15 14:53:52 [ INFO] catalystcoop.pudl.validate:194 compiled_geometry_utility_eia861: found 247705 rows, expected 237872. Off by 4.134%, allowed margin of 0.000%
I can't find any test failures in the 2024-01-24 logs - instead I see test passes (albeit weirdly intertwined with some other tests bc of our multi-worker situation):
test/validate/service_territory_test.py::test_minmax_rows[compiled_geometry_balancing_authority_eia861-112853]
[gw3] [ 93%] PASSED test/validate/service_territory_test.py::test_minmax_rows[compiled_geometry_balancing_authority_eia861-112853]
test/validate/service_territory_test.py::test_minmax_rows[compiled_geometry_utility_eia861-248987]
[gw2] [ 93%] PASSED test/validate/plant_parts_eia_test.py::test_run_aggregations[eia_annual]
test/validate/bf_eia923_test.py::test_vs_bounds[eia_monthly-coal_heat_content]
[gw2] [ 93%] SKIPPED test/validate/bf_eia923_test.py::test_vs_bounds[eia_monthly-coal_heat_content]
test/validate/eia_test.py::test_unique_rows_eia[eia_monthly-bga_eia860-unique_subset1]
[gw3] [ 93%] PASSED test/validate/service_territory_test.py::test_minmax_rows[compiled_geometry_utility_eia861-248987]
In addition, it seems like 112853 is the new expected row count, which matches up with Datasette. So that's good too!
🤦🏼
Okay, so this was just a network hiccup in attempting to publish the data release to the Zenodo Sandbox, and actually everything is materially fine.
Data validation failure
The utility and balancing authority service territories derived from EIA-861 have significantly more records than expected.
The changes between nightly-2024-01-21 and nightly-2024-01-24 seem to be related to the PHMSA extraction.
The conda environment was also re-locked on Monday. Maybe something there changed the behavior of the service territory assets?
Build script success criteria:
nightly branch was updated.
success file was created in the output bucket.
/nightly outputs were updated with last night's outputs.
run_pudl_etl() function. Maybe we should break them up into individual commands each with their own success variable, and have the function return the AND of all of them?
Datasette doesn't match?
Curious whether the Datasette deployment had taken place given the above combination of failure and perceived success, I counted the rows in the two affected tables, and to my surprise, the counts matched neither the expected nor the observed row counts above:
ETL Logs:
pudl-etl.log
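As a rough illustration of the suggestion above about giving each command in run_pudl_etl() its own success variable, here is a minimal sketch. The step functions (run_etl_step, run_tests_step, publish_step) and run_all_steps are hypothetical placeholders, not the real build script's commands. Each step's exit status is recorded separately, and the function succeeds only if all of them are 0.

```shell
#!/usr/bin/env bash
# Hypothetical steps; the real commands in the build script are not reproduced here.
run_etl_step() { return 0; }    # stand-in for the ETL command, succeeds
run_tests_step() { return 1; }  # stand-in for the pytest run, fails
publish_step() { return 0; }    # stand-in for publishing outputs, succeeds

run_all_steps() {
    local etl_status tests_status publish_status

    # Record each step's exit status separately instead of chaining with &&.
    run_etl_step;   etl_status=$?
    run_tests_step; tests_status=$?
    publish_step;   publish_status=$?

    # Report every status so a failure can be traced to a specific step.
    echo "etl=$etl_status tests=$tests_status publish=$publish_status"

    # Return success only if every step exited 0 (the AND of all steps).
    [ "$etl_status" -eq 0 ] && [ "$tests_status" -eq 0 ] && [ "$publish_status" -eq 0 ]
}

run_all_steps
overall_status=$?
echo "overall=$overall_status"
```

One design choice to note: in this form later steps still run after an earlier one fails. If publishing should be skipped when tests fail, the later steps would need to be guarded on the earlier statuses.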