Integrate the new FERC 1 XBRL archive into the PUDL Datastore.

zschira commented 2 years ago

Background

To ingest XBRL data into PUDL, we need a datastore that can interpret XBRL archives (#1593). Each archive consists of a set of XBRL filings plus metadata pulled from the RSS feed and stored in a JSON file. The metadata lists the filings submitted by an individual filer for a specified year and period, along with additional information such as the date-time each filing was submitted. This is required because filers can resubmit filings at any time, so there may be multiple filings from one filer for a specific year/period, and PUDL must know which filing to use.

Design

The datastore will open the metadata file and find the most recent filing for every filer/year/period combination. We will assume that the most recent filing is the best one to process. It will then read these filings into in-memory buffers, which will be passed to the XBRL extractor.
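A minimal sketch of the selection step described above, not the actual PUDL implementation. The metadata layout (a top-level `filings` list with `filer_id`, `year`, `period`, `published`, and `filename` keys), the ISO 8601 timestamp format, and the function names `latest_filings` and `read_filings` are all assumptions made for illustration. The real archive schema may differ.

```python
import io
import zipfile
from datetime import datetime


def latest_filings(metadata: dict) -> dict:
    """Map each (filer, year, period) to its most recently submitted filing.

    Assumes a hypothetical schema: metadata["filings"] is a list of dicts
    with "filer_id", "year", "period", "published" (ISO 8601 string), and
    "filename" keys.
    """
    latest = {}
    for entry in metadata["filings"]:
        key = (entry["filer_id"], entry["year"], entry["period"])
        published = datetime.fromisoformat(entry["published"])
        # Keep the entry only if it is newer than what we have for this key.
        if key not in latest or published > latest[key][0]:
            latest[key] = (published, entry["filename"])
    return {key: filename for key, (_, filename) in latest.items()}


def read_filings(archive: zipfile.ZipFile, filenames) -> dict:
    """Read the selected filings into in-memory buffers, keyed by filename."""
    return {name: io.BytesIO(archive.read(name)) for name in filenames}
```

The buffers returned by `read_filings` are what would be handed to the XBRL extractor. Keying the selection on the submission timestamp encodes the assumption stated above that the most recent filing is the best one to process.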

cmgosnell commented 2 years ago

Is this finished? Or is it finished enough in the xbrl_integration branch?

zaneselvans commented 1 year ago

This is so finished.

catalyst-cooperative / pudl

Integrate the new FERC 1 XBRL archive into the PUDL Datastore. #1667
