catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Integrate the new FERC 1 XBRL archive into the PUDL Datastore. #1667

Closed: zschira closed this issue 1 year ago

zschira commented 2 years ago

Background

In order to ingest XBRL data into PUDL, we need a datastore that can interpret XBRL archives (#1593). The archives consist of a set of XBRL filings plus some metadata pulled from the RSS feed and stored in a JSON file. The metadata lists the filings submitted by an individual filer for a specified year and period, with additional information such as the date-time each filing was submitted. This is required because filers can resubmit filings at any point in time, so there may be multiple filings from one filer for a specific year/period, and PUDL must know which filing to use.
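To illustrate why the metadata matters, here is a minimal sketch that groups filings by filer/year/period and reports the combinations that have been resubmitted. The metadata layout (a dict mapping each filer name to a list of records with `year`, `period`, and `filename` keys) and the function name `find_resubmissions` are hypothetical assumptions, not the actual archive schema.

```python
from collections import defaultdict


def find_resubmissions(metadata: dict) -> dict:
    """Return {(filer, year, period): [filenames]} for combos with >1 filing.

    ASSUMPTION: metadata maps filer name -> list of dicts containing
    'year', 'period', and 'filename'. The real RSS-feed JSON may differ.
    """
    groups = defaultdict(list)
    for filer, filings in metadata.items():
        for filing in filings:
            # One key per filer/year/period; resubmissions share a key.
            groups[(filer, filing["year"], filing["period"])].append(filing["filename"])
    # Keep only the keys where a filer submitted more than once.
    return {key: names for key, names in groups.items() if len(names) > 1}


metadata = {
    "Utility A": [
        {"year": 2021, "period": "Q4", "filename": "a1.xbrl"},
        {"year": 2021, "period": "Q4", "filename": "a2.xbrl"},
    ],
    "Utility B": [{"year": 2021, "period": "Q4", "filename": "b1.xbrl"}],
}
print(find_resubmissions(metadata))
# {('Utility A', 2021, 'Q4'): ['a1.xbrl', 'a2.xbrl']}
```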

Design

The datastore will open the metadata file and find the most recent filing for every filer/year/period combination. We will assume that the most recent filing is the best one to process. It will then read these files into in-memory buffers, which will be passed to the XBRL extractor.
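The design above could be sketched roughly as follows. This is a hypothetical illustration, not the PUDL implementation: it assumes the archive is a zip file containing a metadata file named `rssfeed.json`, that each filing record carries `year`, `period`, `filename`, and an ISO-8601 `published` timestamp, and the function names `latest_filings` and `read_filings` are invented for this example.

```python
import io
import json
import zipfile
from datetime import datetime


def latest_filings(metadata: dict) -> dict:
    """Map (filer, year, period) -> filename of the most recent filing.

    ASSUMPTION: metadata maps filer name -> list of dicts with 'year',
    'period', 'filename', and an ISO-8601 'published' timestamp.
    """
    latest = {}
    for filer, filings in metadata.items():
        for filing in filings:
            key = (filer, filing["year"], filing["period"])
            published = datetime.fromisoformat(filing["published"])
            # Keep whichever filing for this key was submitted last.
            if key not in latest or published > latest[key][0]:
                latest[key] = (published, filing["filename"])
    return {key: name for key, (_, name) in latest.items()}


def read_filings(archive, metadata_name: str = "rssfeed.json") -> dict:
    """Read the most recent filing per key into in-memory BytesIO buffers.

    `archive` may be a path or a file-like object; `metadata_name` is a
    hypothetical default, not the real archive layout.
    """
    with zipfile.ZipFile(archive) as zf:
        metadata = json.loads(zf.read(metadata_name))
        selected = latest_filings(metadata)
        # Buffers hold copies of the bytes, so they outlive the open zip.
        return {key: io.BytesIO(zf.read(name)) for key, name in selected.items()}
```

Each returned buffer could then be handed to the XBRL extractor. Picking the latest `published` timestamp encodes the assumption from the design that the most recent submission is the one to process.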

cmgosnell commented 2 years ago

Is this finished? Or is it finished enough in the xbrl_integration branch?

zaneselvans commented 1 year ago

This is so finished.