Closed: aesharpe closed this issue 2 years ago
"I am certain that this new XBRL data will make that instantaneous" 🤣
I find it hard to imagine that they are going to retroactively apply any kind of structure onto 10 years of incredibly messy data that is impossible to parse programmatically, so I imagine that at best the new data going forward will be clean, the last 10 years of messy data will be available in XBRL, and the 17 years of data before that will only be available through Visual FoxPro. So I suspect that whatever cleaning we're doing for the years up to 2020 will remain relevant.
Ya, I actually responded asking what he meant by "instantaneous" and he said:
"Because our data is now in XBRL – a standard for data – with the right tools and knowledge, it can easily be linked to just about any other dataset."
And then I asked him whether they had an EIA crosswalk and he said no....
Some other useful XML / XBRL links I've come across so we don't lose them (XBRL is a particular flavor of XML):
Having spent a few weeks looking at the SEC DERA data (which originates in XBRL), the big caveat I would offer is that there seems to be very little foreign key enforcement. I've had some exchanges with the Structured Data Office (of the SEC) and they informed me about the public channel they use to comment on data quality issues they observe: https://www.sec.gov/structureddata/osdstaffobsandguide
It's all well and good to have a syntactic validator that ensures files are parseable. But what will be very important is keeping an appropriately tight leash on how XBRL submissions stay within the guidelines of the data model and taxonomies, so that we don't see a flowering of 1000 different descriptions of the same fundamental data type (which, though discouraged, is permissible in the SEC's world, and readily observed). For example, in the first quarter of each calendar year, over 4000 companies report their market cap (public float), and another 1000+ disclose it across the other three quarters:
```
bash-3.2$ grep -c EntityPublic 2020q?/num.txt
2020q1/num.txt:4171
2020q2/num.txt:875
2020q3/num.txt:475
2020q4/num.txt:457
```
But one company reports EntitysPublicFloat:
```
bash-3.2$ grep EntitysPublic 20??q?/num.txt
2020q4/num.txt:0001213900-20-034148 EntitysPublicFloat 0001213900-20-034148 20200630 0 BRL 59342000.0000
2020q4/num.txt:0001213900-20-034148 EntitysPublicFloat 0001213900-20-034148 20190630 0 BRL 53802000.0000
```
This, though permitted, is actually erroneous, and would be caught with proper validation.
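One cheap way to surface these near-duplicate tags without a full validator is to scan the tag column for concepts that are close string matches to a standard taxonomy element. A minimal sketch in Python, using `difflib` from the standard library — the `STANDARD_TAGS` set and the sample rows below are illustrative stand-ins, not the actual DEI taxonomy or real DERA `num.txt` contents:

```python
import difflib

# Illustrative subset of standard taxonomy concepts (not the full DEI taxonomy).
STANDARD_TAGS = ["EntityPublicFloat", "EntityCommonStockSharesOutstanding"]

# Rows shaped roughly like DERA num.txt: (adsh, tag, ddate, uom, value).
# The second row is made up for illustration.
rows = [
    ("0001213900-20-034148", "EntitysPublicFloat", "20200630", "BRL", 59342000.0),
    ("0000000000-20-000001", "EntityPublicFloat", "20200327", "USD", 1000000.0),
]

def suspicious_tags(rows, standard_tags, cutoff=0.85):
    """Flag tags that are not in the standard set but closely resemble one."""
    flagged = []
    for adsh, tag, ddate, uom, value in rows:
        if tag in standard_tags:
            continue
        # get_close_matches returns the best fuzzy matches above the cutoff.
        close = difflib.get_close_matches(tag, standard_tags, n=1, cutoff=cutoff)
        if close:
            flagged.append((adsh, tag, close[0]))
    return flagged

print(suspicious_tags(rows, STANDARD_TAGS))
# flags EntitysPublicFloat as a near-duplicate of EntityPublicFloat
```

This obviously can't replace semantic validation against the taxonomy, but it's enough to catch the `EntitysPublicFloat` case above.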
Arelle seems to be the de facto open source standard (as much as there is one) for working with XBRL. FERC also provides plugins for rendering and validation using Arelle.
Arelle also provides a plugin for conversion to a SQL database. It does the conversion in a highly generalized way that seems difficult to work with, but this may be a route we could take. Arelle also has a Python API that could be useful, but the documentation is sparse.
FERC has updated its filing practices! Now, instead of using Visual FoxPro databases, they are publishing XBRL files. They've also dumped a bunch of the old filings into this format, so I think it makes sense to explore those and figure out how to read them, so we're ready for next year when there is no more FoxPro.
Contacts from email exchanges:
- Robb Hudson (FERC): Robert.Hudson@ferc.gov
- David Tauriello (XBRL US): david.tauriello@xbrl.us