catalyst-cooperative / ferc-xbrl-extractor

A tool for converting FERC filings published in XBRL into SQLite databases
MIT License

Force tables to have all columns that are defined in schema #147

Closed: jdangerx closed this 9 months ago

jdangerx commented 9 months ago

In https://github.com/catalyst-cooperative/pudl/issues/2897 I found that we were missing some columns. The `.unstack()` in `construct_dataframe` doesn't create columns for values that never show up in the data, even if they're defined in the metadata. Reindexing the columns against the schema makes sure every defined column is present.
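A minimal sketch of the failure mode and the fix, assuming `construct_dataframe` pivots long-format facts with `.unstack()` roughly like this. The column names (`entity_id`, `date`, `revenue`, `expenses`, `taxes`) and the `INDEX_COLUMNS` / `VALUE_COLUMNS` lists are hypothetical stand-ins for the schema the extractor derives from the taxonomy, not its actual code:

```python
import pandas as pd

# Hypothetical schema for one table: the index columns plus every value column
# the taxonomy metadata defines, whether or not anyone reported it.
INDEX_COLUMNS = ["entity_id", "date"]
VALUE_COLUMNS = ["revenue", "expenses", "taxes"]

# Long-format facts: one row per (entity, date, concept). No filer reported "taxes".
facts = pd.DataFrame(
    {
        "entity_id": ["A", "A", "B"],
        "date": ["2021-12-31"] * 3,
        "concept": ["revenue", "expenses", "revenue"],
        "value": [100.0, 40.0, 250.0],
    }
)

# Pivot concepts into columns. unstack() only creates a column for each concept
# that actually appears, so "taxes" is silently missing.
unstacked = facts.set_index(INDEX_COLUMNS + ["concept"])["value"].unstack("concept")

# Reindex against the schema so every defined column exists; columns nobody
# reported are filled with NaN, and the column order follows the schema.
wide = unstacked.reindex(columns=VALUE_COLUMNS).reset_index()

print(list(wide.columns))  # prints ['entity_id', 'date', 'revenue', 'expenses', 'taxes']
```

Reindexing rather than adding missing columns one by one also pins the column order to the schema, so every year's dataframe comes out with the same layout.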

This was also causing some integration test failures. When running the ETL in-process, we would:

  1. write 2021 data for a table
  2. construct the dataframe for 2022 data, which has a slightly different column set because different values were reported
  3. fail when trying to write 2022 data to SQLite
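To illustrate the in-process sequence above, here's a hypothetical reproduction using `pandas.DataFrame.to_sql` against an in-memory SQLite database. The table name `income`, the column set, and the assumption that writes go through `to_sql(..., if_exists="append")` are illustrative only, not necessarily how the extractor writes its tables:

```python
import sqlite3

import pandas as pd

# Hypothetical schema for one table (columns defined in the taxonomy).
SCHEMA_COLUMNS = ["entity_id", "revenue", "expenses", "taxes"]

# 2021 filers reported revenue and expenses; 2022 filers reported revenue and taxes.
df_2021 = pd.DataFrame({"entity_id": ["A"], "revenue": [100.0], "expenses": [40.0]})
df_2022 = pd.DataFrame({"entity_id": ["B"], "revenue": [250.0], "taxes": [12.0]})

# Without reindexing: the first write creates the table from 2021's columns,
# so appending 2022 data fails because the table has no "taxes" column.
broken = sqlite3.connect(":memory:")
df_2021.to_sql("income", broken, index=False)
try:
    df_2022.to_sql("income", broken, index=False, if_exists="append")
    append_failed = False
except sqlite3.OperationalError as err:
    print(f"append failed: {err}")
    append_failed = True

# With reindexing: both years are conformed to the schema before writing,
# so the table is created with every column and later appends line up.
fixed = sqlite3.connect(":memory:")
for df in (df_2021, df_2022):
    df.reindex(columns=SCHEMA_COLUMNS).to_sql(
        "income", fixed, index=False, if_exists="append"
    )

result = pd.read_sql("SELECT * FROM income", fixed)
```

Columns that a given year didn't report end up as NULL in SQLite instead of breaking the write.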

Lastly, I wonder if there's a way we could keep our extracted tables tidy: our PUDL transforms promptly re-stack these wide tables in `wide_to_tidy`, so maybe we could skip the unstack/re-stack round trip entirely. But that's definitely out of scope for this PR.
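For context on the tidy-tables idea, here's a rough sketch of why the round trip looks redundant: pivoting tidy facts wide and melting them back (dropping the NaN placeholders) recovers the original rows. This uses a generic `pandas.melt`, not PUDL's actual `wide_to_tidy`, and the data are hypothetical:

```python
import pandas as pd

# Hypothetical long-format ("tidy") facts, one row per reported value.
facts = pd.DataFrame(
    {
        "entity_id": ["A", "A", "B"],
        "date": ["2021-12-31"] * 3,
        "concept": ["revenue", "expenses", "revenue"],
        "value": [100.0, 40.0, 250.0],
    }
)

# Wide table: one column per concept, NaN where nothing was reported.
wide = (
    facts.pivot(index=["entity_id", "date"], columns="concept", values="value")
    .reset_index()
    .rename_axis(columns=None)
)

# Re-stack to tidy form with a generic melt, dropping the NaN placeholders
# that the wide layout introduced for unreported values.
tidy = (
    wide.melt(id_vars=["entity_id", "date"], var_name="concept", value_name="value")
    .dropna(subset=["value"])
    .sort_values(["entity_id", "concept"])
    .reset_index(drop=True)
)

# The round trip reproduces the original facts (up to row order).
expected = facts.sort_values(["entity_id", "concept"]).reset_index(drop=True)
assert tidy.to_dict("records") == expected.to_dict("records")
```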

codecov[bot] commented 9 months ago

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (a29bee2) 93.09% compared to head (c987bd6) 93.09%. Report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #147   +/-   ##
=======================================
  Coverage   93.09%   93.09%
=======================================
  Files           8        8
  Lines         594      594
=======================================
  Hits          553      553
  Misses         41       41
```

| Files | Coverage Δ |
|---|---|
| `src/ferc_xbrl_extractor/datapackage.py` | `98.70% <ø> (ø)` |


jdangerx commented 9 months ago

Yeah, I think we plant our mental seeds now and then reap them when we have to integrate 2023 data 🌱