catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Some FERC filings share a ReportDate or CertifyingOfficialDate, but were published at different times. #2822

Closed: jdangerx closed this issue 8 months ago

jdangerx commented 1 year ago

For FERC Forms 1, 2, 6, and 60, we expect each filing to include a ReportDate fact, and for FERC Form 714 we expect each filing to include a CertifyingOfficialDate fact.

We use these in the ferc-xbrl-extractor to order the filings by recency, so we can merge all the filings and use the most recent data we have for any given fact.

However, these date facts only have day-level granularity, and filings often share the same date fact even though they were published at different times. We know this because the RSS feed metadata, which we already track, records each filing's publish time at much finer granularity.

To avoid ambiguity, we should use that RSS feed metadata instead of the report's self-reported date to determine which report should take precedence.
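To illustrate the ambiguity, here is a minimal sketch in plain Python. It is not the ferc-xbrl-extractor code: the record layout, the `merge_latest` helper, and the sample values are all hypothetical. It only shows that when two filings tie on the report date, which one "wins" depends on input order, while sorting on a finer-grained publish time gives the same answer every time.

```python
from datetime import date, datetime

# Hypothetical filings: same self-reported date, published hours apart.
filings = [
    {
        "filing_name": "filing_a",
        "report_date": date(2022, 4, 18),
        "publish_time": datetime(2022, 4, 18, 9, 30),
        "facts": {"net_income": 100},
    },
    {
        "filing_name": "filing_b",
        "report_date": date(2022, 4, 18),
        "publish_time": datetime(2022, 4, 18, 16, 5),
        "facts": {"net_income": 120},
    },
]


def merge_latest(filings, key):
    """Merge facts across filings; filings that sort later by ``key`` win."""
    merged = {}
    # sorted() is stable, so filings that tie on ``key`` keep their input order.
    for filing in sorted(filings, key=key):
        for fact, value in filing["facts"].items():
            merged[fact] = (value, filing["filing_name"])
    return merged


def by_report_date(filing):
    return filing["report_date"]


def by_publish_time(filing):
    return filing["publish_time"]


# Tie on report_date: the winner flips with input order.
print(merge_latest(filings, by_report_date)["net_income"])        # (120, 'filing_b')
print(merge_latest(filings[::-1], by_report_date)["net_income"])  # (100, 'filing_a')

# Publish time breaks the tie: the winner is the same either way.
print(merge_latest(filings, by_publish_time)["net_income"])        # (120, 'filing_b')
print(merge_latest(filings[::-1], by_publish_time)["net_income"])  # (120, 'filing_b')
```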

jdangerx commented 12 months ago

This causes data to get dropped when we read from the FercXbrlSqliteExtractor, since it uses filing_name to join the data table with an ID table. If the ID table and the data table pick different filings for the same report date, the join finds no matching filing_name and the data is silently lost.
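A rough sketch of that failure mode, again in plain Python rather than the actual extractor code. Only `filing_name` comes from the description above; the other column names, the `inner_join` helper, and the row values are hypothetical.

```python
# Hypothetical rows: each table was already reduced to one filing per
# report date, but the two tables broke the tie differently.
id_table = [
    {"filing_name": "filing_a", "entity_id": "C000001", "report_date": "2022-04-18"},
]
data_table = [
    {"filing_name": "filing_b", "net_income": 120},
]


def inner_join(left, right, on):
    """Inner join of two lists of dict rows on a shared column."""
    index = {}
    for row in right:
        index.setdefault(row[on], []).append(row)
    return [{**l_row, **r_row} for l_row in left for r_row in index.get(l_row[on], [])]


# Mismatched tie-breaks: no filing_name in common, so the data row vanishes.
dropped = inner_join(id_table, data_table, on="filing_name")
print(len(dropped))  # 0

# If both tables had picked the same filing, the row would survive the join.
consistent_ids = [
    {"filing_name": "filing_b", "entity_id": "C000001", "report_date": "2022-04-18"},
]
kept = inner_join(consistent_ids, data_table, on="filing_name")
print(kept[0]["net_income"])  # 120
```

Nothing raises an error in the first case, which is why the loss is easy to miss: the join simply returns fewer rows.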

jdangerx commented 8 months ago

Closed by https://github.com/catalyst-cooperative/ferc-xbrl-extractor/pull/151