Track publication time in primary key & stop deduping rows

jdangerx commented 9 months ago

For each table, we'll now have a row per context per filing, which includes all the relevant facts reported in that context.

In 0.8.3 and below, we relied on the caller to only request one filing per entity.

In 1.1.1 and below, we read in all the filings for an entity, but then deduplicated the table so that each fact would only be reported once. We did this by sorting by the report date, and then picking the last reported value for each fact.
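For illustration, the old "sort by report date, keep the last value" behavior can be sketched roughly as follows. This is a minimal stand-alone sketch, not the extractor's actual code: the record layout and the `dedupe_last_reported` helper are hypothetical.

```python
from datetime import date

# Hypothetical fact records; the real extractor works on tables, not dicts.
facts = [
    {"entity_id": "E1", "fact": "revenue", "report_date": date(2022, 3, 31), "value": 100},
    {"entity_id": "E1", "fact": "revenue", "report_date": date(2022, 6, 30), "value": 120},
    {"entity_id": "E1", "fact": "expenses", "report_date": date(2022, 3, 31), "value": 80},
]


def dedupe_last_reported(records):
    """Keep one record per (entity, fact): the one with the latest report date."""
    latest = {}
    # Sorting ascending means later report dates overwrite earlier ones.
    for rec in sorted(records, key=lambda r: r["report_date"]):
        latest[(rec["entity_id"], rec["fact"])] = rec
    return list(latest.values())


deduped = dedupe_last_reported(facts)
```

Here `deduped` holds one `revenue` record (the one reported on 2022-06-30) and one `expenses` record.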

Unfortunately, the report date is not granular enough, so this led to an ambiguous sort order. That caused issues matching data between tables, since the rows could end up associated with different filing names.
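A small sketch of the ambiguity, assuming two hypothetical filings that share a report date. Because Python's sort is stable, the "last" record under a date-only key depends on input order, while adding a finer-grained publication timestamp to the key makes the result deterministic. The field names and the `pick_last` helper are made up for this example.

```python
from datetime import date, datetime

# Two hypothetical filings with the same report date but different publication times.
filing_a = {
    "filing_name": "filing_a",
    "report_date": date(2022, 12, 31),
    "published": datetime(2023, 4, 1, 9, 0),
    "value": 100,
}
filing_b = {
    "filing_name": "filing_b",
    "report_date": date(2022, 12, 31),
    "published": datetime(2023, 4, 15, 17, 30),
    "value": 105,
}


def pick_last(records, key):
    """Return the record that sorts last under the given key."""
    return sorted(records, key=key)[-1]


# Date-only key: ties are broken by input order, so the winner is not stable.
by_date_1 = pick_last([filing_a, filing_b], key=lambda r: r["report_date"])
by_date_2 = pick_last([filing_b, filing_a], key=lambda r: r["report_date"])

# Date plus publication time: the winner no longer depends on input order.
by_pub_1 = pick_last([filing_a, filing_b], key=lambda r: (r["report_date"], r["published"]))
by_pub_2 = pick_last([filing_b, filing_a], key=lambda r: (r["report_date"], r["published"]))
```

If two tables are built from the same filings in different orders, the date-only version can pick `filing_a` for one table and `filing_b` for the other, which is the kind of mismatch described above.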

We will stop trying to tinker with the data here, beyond bringing it into SQLite form, and do the deduplication within PUDL. That means we will include duplicate facts whenever there are multiple filings reporting the same fact.
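As a rough sketch of what this looks like in SQLite: with the filing name and publication time in the primary key, the same fact from two filings lands in two rows, and a downstream consumer chooses how to deduplicate. The table and column names below are hypothetical, not the extractor's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: filing_name and publication_time are part of the primary
# key, so the same context reported by two filings yields two distinct rows.
conn.execute(
    """
    CREATE TABLE example_facts (
        entity_id TEXT NOT NULL,
        filing_name TEXT NOT NULL,
        publication_time TEXT NOT NULL,
        context_date TEXT NOT NULL,
        revenue REAL,
        PRIMARY KEY (entity_id, filing_name, publication_time, context_date)
    )
    """
)

rows = [
    ("E1", "filing_a", "2023-04-01T13:00:00+00:00", "2022-12-31", 100.0),
    ("E1", "filing_b", "2023-04-15T21:30:00+00:00", "2022-12-31", 105.0),
]
conn.executemany("INSERT INTO example_facts VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Both filings are kept.
row_count = conn.execute("SELECT COUNT(*) FROM example_facts").fetchone()[0]

# A downstream consumer can pick the most recently published value.
# ISO 8601 strings with the same UTC offset sort chronologically as text.
latest = conn.execute(
    """
    SELECT filing_name, revenue FROM example_facts
    WHERE entity_id = ? AND context_date = ?
    ORDER BY publication_time DESC
    LIMIT 1
    """,
    ("E1", "2022-12-31"),
).fetchone()
```

Keeping both rows pushes the "which filing wins" decision out of the extraction step and into the consumer, where the full publication timestamp is available to break ties.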

jdangerx commented 9 months ago

Nooo the published_parsed field is in EDT and the runners run on UTC.

Someday we need to update the archiver to put the published_parsed in UTC, but until then we will explicitly use UTC-4.

jdangerx commented 9 months ago

Nooo the published_parsed field is actually in Eastern Time, which may or may not be Eastern Daylight Time. Updating to force America/New_York.
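A minimal sketch of why a fixed UTC-4 offset is not enough, assuming `published_parsed` arrives as a naive `time.struct_time` in Eastern wall-clock time. The `eastern_struct_to_utc` helper is hypothetical and only illustrates the idea using the standard-library `zoneinfo` module.

```python
import time
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def eastern_struct_to_utc(published_parsed: time.struct_time) -> datetime:
    """Interpret a naive struct_time as Eastern wall-clock time and convert to UTC."""
    naive = datetime(*published_parsed[:6])
    # zoneinfo picks EST or EDT based on the date, unlike a fixed UTC-4 offset.
    return naive.replace(tzinfo=EASTERN).astimezone(timezone.utc)


# Noon Eastern in summer (EDT, UTC-4) and in winter (EST, UTC-5).
summer = eastern_struct_to_utc(time.strptime("2023-07-01 12:00:00", "%Y-%m-%d %H:%M:%S"))
winter = eastern_struct_to_utc(time.strptime("2023-01-15 12:00:00", "%Y-%m-%d %H:%M:%S"))
```

With these inputs, `summer` is 16:00 UTC and `winter` is 17:00 UTC, so a hard-coded UTC-4 would be off by an hour for filings published during standard time.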

codecov[bot] commented 9 months ago

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (2ddcc8c) 93.29% compared to head (bdd2b4b) 93.36%. Report is 3 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #151      +/-   ##
==========================================
+ Coverage   93.29%   93.36%   +0.06%
==========================================
  Files           8        8
  Lines         597      603       +6
==========================================
+ Hits          557      563       +6
  Misses         40       40
```

| [Files](https://app.codecov.io/gh/catalyst-cooperative/ferc-xbrl-extractor/pull/151?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalyst-cooperative) | Coverage Δ | |
|---|---|---|
| [src/ferc\_xbrl\_extractor/cli.py](https://app.codecov.io/gh/catalyst-cooperative/ferc-xbrl-extractor/pull/151?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalyst-cooperative#diff-c3JjL2ZlcmNfeGJybF9leHRyYWN0b3IvY2xpLnB5) | `94.54% <ø> (ø)` | |
| [src/ferc\_xbrl\_extractor/instance.py](https://app.codecov.io/gh/catalyst-cooperative/ferc-xbrl-extractor/pull/151?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalyst-cooperative#diff-c3JjL2ZlcmNfeGJybF9leHRyYWN0b3IvaW5zdGFuY2UucHk=) | `94.11% <100.00%> (+0.41%)` | :arrow_up: |
| [src/ferc\_xbrl\_extractor/xbrl.py](https://app.codecov.io/gh/catalyst-cooperative/ferc-xbrl-extractor/pull/151?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalyst-cooperative#diff-c3JjL2ZlcmNfeGJybF9leHRyYWN0b3IveGJybC5weQ==) | `90.90% <100.00%> (-0.66%)` | :arrow_down: |
| [src/ferc\_xbrl\_extractor/datapackage.py](https://app.codecov.io/gh/catalyst-cooperative/ferc-xbrl-extractor/pull/151?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalyst-cooperative#diff-c3JjL2ZlcmNfeGJybF9leHRyYWN0b3IvZGF0YXBhY2thZ2UucHk=) | `99.35% <83.33%> (+<0.01%)` | :arrow_up: |


catalyst-cooperative / ferc-xbrl-extractor

Track publication time in primary key & stop deduping rows #151
