catalyst-cooperative / ferc-xbrl-extractor

A tool for converting FERC filings published in XBRL into SQLite databases
MIT License

XBRL Extractor badzipfile error #251

Open broscious4peg opened 3 months ago

broscious4peg commented 3 months ago

I am encountering a BadZipFile error when loading the taxonomy file for FERC Form 1. I am using the taxonomy file downloaded from the FERC website: https://ecollection.ferc.gov/taxonomyHistory

Please let me know ASAP if you have any comments or ideas, and we can get to talking!

Error:

C:\Users\PEG Intern>xbrl_extract "C:\Users\PEG Intern\downloads\Puget Sound Files" --db-path "ferc1-2021-sample.sqlite" --taxonomy "C:\Users\PEG Intern\Downloads\Form 1_2023-04-01_976 (1).zip"
2024-08-01 15:18:27 [ INFO] catalystcoop.ferc_xbrl_extractor.xbrl:247 Parsing taxonomy from Form 1_2023-04-01_976/
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Scripts\xbrl_extract.exe\__main__.py", line 7, in <module>
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\ferc_xbrl_extractor\cli.py", line 156, in main
    return run_main(**vars(parse()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\ferc_xbrl_extractor\cli.py", line 134, in run_main
    extracted = xbrl.extract(
                ^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\ferc_xbrl_extractor\xbrl.py", line 58, in extract
    table_defs = get_fact_tables(
                 ^^^^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\ferc_xbrl_extractor\xbrl.py", line 254, in get_fact_tables
    taxonomy = Taxonomy.from_source(f, entry_point=taxonomy_entry_point)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\ferc_xbrl_extractor\taxonomy.py", line 251, in from_source
    taxonomy, view = load_taxonomy_from_archive(taxonomy_source, entry_point)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\ferc_xbrl_extractor\arelle_interface.py", line 57, in load_taxonomy_from_archive
    file_source = FileSource.openFileSource(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\arelle\FileSource.py", line 44, in openFileSource
    filesource.openZipStream(sourceZipStream)
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\site-packages\arelle\FileSource.py", line 351, in openZipStream
    self.fs = zipfile.ZipFile(sourceZipStream, mode="r")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\zipfile\__init__.py", line 1349, in __init__
    self._RealGetContents()
  File "C:\Users\PEG Intern\AppData\Local\Programs\Python\Python312\Lib\zipfile\__init__.py", line 1416, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
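One quick way to tell whether the file passed to `--taxonomy` is actually a readable zip archive is to test it directly with Python's standard-library `zipfile` module, the same module that raised the error above. The `check_taxonomy_archive` helper below is a hypothetical sketch, not part of ferc-xbrl-extractor or Arelle:

```python
import zipfile
from pathlib import Path


def check_taxonomy_archive(path: str) -> str:
    """Describe whether ``path`` is a readable zip archive.

    Hypothetical diagnostic helper; it is not part of ferc-xbrl-extractor
    or Arelle and only uses the standard-library ``zipfile`` module.
    """
    p = Path(path)
    if not p.is_file():
        return f"missing: {p} is not a file"
    if not zipfile.is_zipfile(p):
        # is_zipfile looks for the end-of-central-directory record, roughly
        # the same check whose failure makes zipfile.ZipFile raise BadZipFile.
        with p.open("rb") as f:
            head = f.read(16)  # a real zip normally starts with b"PK"
        return f"not a zip: first bytes are {head!r}"
    with zipfile.ZipFile(p) as zf:
        bad_member = zf.testzip()  # None when every member's CRC checks out
        names = zf.namelist()
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return f"ok: {len(names)} members, e.g. {names[:3]}"
```

Calling `print(check_taxonomy_archive(r"C:\Users\PEG Intern\Downloads\Form 1_2023-04-01_976 (1).zip"))` on the file from the command above would show whether the archive on disk is itself damaged. A result starting with `not a zip` means the file is not a valid archive; an `ok` result suggests the problem lies elsewhere.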

zaneselvans commented 3 months ago

Is your error reproducible? We've gotten this error very sporadically, both with this data source and with some others, and it seems to be random: the same step works 99.9% of the time.

If it is reproducible:

Also, are you just trying to access the FERC Form 1 data more generally? We publish complete extracted versions of the FERC Forms 1, 2, 6, 60, and 714. See the Nightly Builds section of the PUDL Data Access docs. The 2023 FERC Form 1 data was added 2 weeks ago.

If you'd like to take the data for a spin without needing to set anything up, you can also go play with our example notebooks on Kaggle. The data there is updated once a week, and will also have the 2023 FERC data.

Also, depending on what data you are trying to access in the Form 1, you may want to look at the tables which we've cleaned up and integrated into our main PUDL Database. It's only a few dozen out of the many that are available in the XBRL derived SQLite database, but they're way easier to work with, and are also integrated with the older DBF data going back to 1994.

broscious4peg commented 3 months ago

Thanks for responding to this. I am looking for the FERC Form 1 data for 2020 through 2023. Where could I find a database covering all of these years?

zaneselvans commented 3 months ago

Download links can be found in the nightly builds section of the Data Access documentation.

I would recommend first looking at the FERC Form 1 tables which have been integrated into our main PUDL database, since it covers all years of data (1994-2023) and is much cleaner and more usable than the original DBF and XBRL data. However, there are only a couple dozen tables in there, so the data you need may not be included. Any table whose name contains ferc1 will be derived from the FERC Form 1.
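To see which FERC Form 1 tables a downloaded copy of the PUDL database contains, you can list every table whose name includes `ferc1` using Python's built-in `sqlite3` module. This is a minimal sketch: `list_tables` is a hypothetical helper (not part of any PUDL package), and the file name `pudl.sqlite` used below is an assumption about where you saved the download.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: str, substring: str = "ferc1") -> list[str]:
    """Return the names of tables in a SQLite file that contain ``substring``.

    Hypothetical helper for browsing a downloaded database; views are not
    included, only objects of type 'table'.
    """
    # Open read-only via a file: URI so browsing can't modify the download.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND instr(name, ?) > 0 "
            "ORDER BY name",
            (substring,),
        ).fetchall()
    return [name for (name,) in rows]
```

For example, `for name in list_tables("pudl.sqlite"): print(name)` prints the matching table names. `instr` is used instead of `LIKE` so that the underscores common in table names are matched literally rather than treated as wildcards.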

If the table(s) you need have not been fully integrated into PUDL, then you will need to access the SQLite DBs that we produce, which are direct conversions of the older DBF and newer XBRL data into a modern relational database format:

You can also browse these databases online first if you want: