I'm far from an expert on XBRL at this point, but I think I understand the basic concepts well enough to work towards extracting relevant data. Importantly, an XBRL instance is composed of facts. A fact is considered to be an atomic piece of data. It contains a value and all information needed to interpret that value (concept, unit, time period). A taxonomy then describes relationships between facts and provides some structure to the data.
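To make the idea of a fact concrete, here is a minimal sketch that pulls facts out of a tiny hand-written XBRL instance using only the Python standard library. The sample document, the `ferc` namespace URI, the `OperatingRevenues` concept name, and the `Fact` class are all made up for illustration. It also assumes simple units with a single measure and ignores dimensions and nested tuples, so it is not a substitute for a real XBRL processor.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

XBRLI = "http://www.xbrl.org/2003/instance"

# Hypothetical instance document: the ferc namespace URI, the concept name,
# and the entity identifier are invented for this sketch.
SAMPLE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:ferc="http://example.com/hypothetical-ferc-taxonomy">
  <xbrli:context id="c1">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/ids">C000001</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2021-01-01</xbrli:startDate>
      <xbrli:endDate>2021-12-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:unit id="usd"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ferc:OperatingRevenues contextRef="c1" unitRef="usd" decimals="0">1234567</ferc:OperatingRevenues>
</xbrli:xbrl>
"""


@dataclass
class Fact:
    concept: str          # local name of the reported concept
    value: str            # raw text value of the fact
    unit: Optional[str]   # measure of the referenced unit, if any
    period: str           # "start/end" for durations, a single date for instants


def parse_facts(xml_text: str) -> List[Fact]:
    root = ET.fromstring(xml_text)

    # Index contexts by id so each fact can look up its reporting period.
    periods = {}
    for ctx in root.findall(f"{{{XBRLI}}}context"):
        period = ctx.find(f"{{{XBRLI}}}period")
        instant = period.findtext(f"{{{XBRLI}}}instant")
        if instant is not None:
            periods[ctx.get("id")] = instant
        else:
            start = period.findtext(f"{{{XBRLI}}}startDate")
            end = period.findtext(f"{{{XBRLI}}}endDate")
            periods[ctx.get("id")] = f"{start}/{end}"

    # Index units by id (simple single-measure units only).
    units = {
        unit.get("id"): unit.findtext(f"{{{XBRLI}}}measure")
        for unit in root.findall(f"{{{XBRLI}}}unit")
    }

    # Any top-level element carrying a contextRef attribute is treated as a fact.
    facts = []
    for el in root:
        if "contextRef" not in el.attrib:
            continue
        facts.append(
            Fact(
                concept=el.tag.rsplit("}", 1)[-1],
                value=(el.text or "").strip(),
                unit=units.get(el.get("unitRef")),
                period=periods[el.get("contextRef")],
            )
        )
    return facts


facts = parse_facts(SAMPLE)
for fact in facts:
    print(fact)
```

The point is just that the value on its own is meaningless: the concept comes from the element name, while the unit and the time period have to be resolved through the `unitRef` and `contextRef` attributes.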
Arelle seems to be the only particularly mature open source solution for interacting with XBRL. It was recently acquired by Workiva, and they claim it won't become proprietary. If this is true it could mean more support and consistent financial backing. Arelle provides a CLI, a GUI, and direct access through an API. No matter what method we use, it will most likely need to be scripted as Arelle is really not made for interacting with more than one filing at a time. For this reason, I think the API will probably be the most direct way to do this.
The API doesn't have much documentation, but digging around I think I've figured out enough to make use of it. I've figured out how to directly access the taxonomy, and the fact lists of individual filings. With access to both of these, I should be able to move forward with integrating the XBRL filings into the ETL.
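As a rough sketch of what I mean by getting at the taxonomy and the fact list, something like the following seems to work. Given how thin the documentation is, treat the Arelle attribute names (`Cntlr.Cntlr`, `modelManager.load`, `facts`, `qnameConcepts`, `qname`, `contextID`, `unitID`, `value`) as my best understanding from reading the source rather than a documented contract. `load_filing`, `fact_rows`, and `concept_names` are hypothetical helper names, and the stub object at the bottom only stands in for a loaded filing so the sketch runs without Arelle installed.

```python
from types import SimpleNamespace


def load_filing(path):
    """Load one filing with Arelle (requires the arelle package)."""
    # Imported lazily so the rest of this sketch runs without Arelle.
    from arelle import Cntlr

    cntlr = Cntlr.Cntlr(logFileName="logToPrint")
    return cntlr.modelManager.load(path)


def fact_rows(model_xbrl):
    """Flatten the facts of a loaded filing into plain dictionaries."""
    return [
        {
            "concept": str(fact.qname),
            "context": fact.contextID,
            "unit": fact.unitID,
            "value": fact.value,
        }
        for fact in model_xbrl.facts
    ]


def concept_names(model_xbrl):
    """List the concepts defined by the taxonomy the filing references."""
    return sorted(str(qname) for qname in model_xbrl.qnameConcepts)


# Stand-in for a loaded filing, with made-up values, for demonstration only.
stub = SimpleNamespace(
    facts=[
        SimpleNamespace(
            qname="ferc:OperatingRevenues",
            contextID="c1",
            unitID="usd",
            value="1234567",
        )
    ],
    qnameConcepts={"ferc:OperatingRevenues": object()},
)

rows = fact_rows(stub)
names = concept_names(stub)
print(rows)
print(names)
```

With a real filing, the idea would be to call `fact_rows(load_filing(path))` once per filing from a script, which is why the API looks like a better fit than the CLI or GUI.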
There are many other tools for working with XBRL, but most of them are some combination of proprietary, aimed at helping companies prepare their filings, and focused on SEC data. XBRL US seems to be the biggest player in the XBRL ecosystem, and they do provide several options for accessing FERC data, but only with a paid membership.
It seems that the easiest way to integrate the new XBRL-based data with the old FoxPro-based data would be to develop a method for extracting the data and mapping it to the SQLite db created by ferc1_to_sqlite. The taxonomy released for FERC Form 1 maps its tables very closely to the pages of the raw form, so it should be possible to map this data to said SQLite db. As a starting place, I plan to try to implement this mapping for the tables currently being used by PUDL. From there we can attempt some data verification before working on the mapping for the rest of the tables.
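A minimal sketch of the kind of mapping I have in mind, using an in-memory SQLite database. The table name `f1_example`, the column names, the concept names, and the shape of the fact dictionaries are all hypothetical. The real mapping would have to be derived from the FERC taxonomy and the existing schema, and this ignores details like duplicate facts and instant versus duration periods.

```python
import sqlite3
from collections import defaultdict

# Hypothetical target table and concept-to-column mapping.
TABLE = "f1_example"
CONCEPT_TO_COLUMN = {
    "OperatingRevenues": "operating_revenues",
    "OperatingExpenses": "operating_expenses",
}


def facts_to_rows(facts):
    """Group facts into one row per (respondent, year), one column per concept."""
    rows = defaultdict(dict)
    for fact in facts:
        column = CONCEPT_TO_COLUMN.get(fact["concept"])
        if column is None:
            continue  # concept is not part of this table
        key = (fact["respondent_id"], fact["report_year"])
        rows[key][column] = fact["value"]
    return rows


def load_rows(conn, rows):
    """Create the table if needed and insert the pivoted rows."""
    columns = sorted(set(CONCEPT_TO_COLUMN.values()))
    column_defs = ", ".join(f"{col} REAL" for col in columns)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {TABLE} ("
        f"respondent_id INTEGER, report_year INTEGER, {column_defs}, "
        "PRIMARY KEY (respondent_id, report_year))"
    )
    placeholders = ", ".join("?" for _ in range(len(columns) + 2))
    for (respondent_id, report_year), values in rows.items():
        conn.execute(
            f"INSERT INTO {TABLE} (respondent_id, report_year, {', '.join(columns)}) "
            f"VALUES ({placeholders})",
            [respondent_id, report_year] + [values.get(col) for col in columns],
        )
    conn.commit()


# Made-up facts standing in for what would be extracted from a filing.
example_facts = [
    {"concept": "OperatingRevenues", "respondent_id": 1, "report_year": 2021, "value": 100.0},
    {"concept": "OperatingExpenses", "respondent_id": 1, "report_year": 2021, "value": 50.0},
    {"concept": "SomethingUnmapped", "respondent_id": 1, "report_year": 2021, "value": 7.0},
]

conn = sqlite3.connect(":memory:")
load_rows(conn, facts_to_rows(example_facts))
result = conn.execute(
    f"SELECT respondent_id, report_year, operating_revenues, operating_expenses FROM {TABLE}"
).fetchall()
print(result)
```

Keeping the concept-to-column mapping as plain data should make it straightforward to start with the tables PUDL already uses and extend to the rest later.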
Moving from scoping to integration
Research XBRL ecosystem, FERC's usage of XBRL, and develop a forward plan for integrating XBRL based FERC filings into PUDL.