catalyst-cooperative / ferc-xbrl-extractor

A tool for converting FERC filings published in XBRL into SQLite databases
MIT License

Add retries to taxonomy reading. #205

Closed jdangerx closed 3 months ago

jdangerx commented 3 months ago

Overview

Addresses catalyst-cooperative/pudl#3449

What problem does this address?

When we load the same taxonomy several times concurrently, we run into `FileExistsError`s as the various threads try to cache files at the same location.

What did you change in this PR?

The taxonomy read is now retried when it fails because concurrent cache writes collided.
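For readers who want a picture of the retry idea, here is a minimal sketch. It is not the code from this PR: `read_with_retries`, its parameters, and the backoff schedule are all hypothetical, and `read_fn` stands in for whatever callable actually reads the taxonomy.

```python
import time


def read_with_retries(read_fn, retries=5, base_delay=0.1, sleep=time.sleep):
    """Call read_fn, retrying if a concurrent cache write raises FileExistsError.

    Hypothetical helper: the names and the exponential backoff are
    illustrative assumptions, not the extractor's actual implementation.
    """
    for attempt in range(retries):
        try:
            return read_fn()
        except FileExistsError:
            # Another thread got to the cache location first. Give up only
            # once the retry budget is spent; otherwise wait and try again.
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2**attempt)
```

Only `FileExistsError` is caught here, so any other failure in the read still surfaces immediately instead of being retried.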

I also added a surprisingly hard-to-write test that runs these taxonomy reads in parallel.
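A rough sketch of what such a parallel test can look like follows. It is an illustration under assumptions, not the test added in this PR: `load_taxonomy` is a hypothetical stand-in that "caches" by creating a directory, and the barrier is just one way to make the threads collide on purpose.

```python
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_taxonomy(cache_dir, retries=3):
    """Hypothetical stand-in for a cached taxonomy read (not the real API)."""
    entry = cache_dir / "taxonomy"
    for attempt in range(retries):
        try:
            if not entry.exists():
                # Racy on purpose: another thread may create the entry
                # between the check and mkdir, raising FileExistsError.
                entry.mkdir()
            return "taxonomy-loaded"
        except FileExistsError:
            if attempt == retries - 1:
                raise


def load_in_parallel(n_threads=8):
    """Start n_threads loads at the same moment and collect their results."""
    barrier = threading.Barrier(n_threads, timeout=10)
    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp)

        def worker(_):
            barrier.wait()  # release all threads together to provoke the race
            return load_taxonomy(cache_dir)

        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            return list(pool.map(worker, range(n_threads)))
```

A test would then assert that every call in `load_in_parallel()` returned a result rather than raising. Without the retry loop, the same setup can fail intermittently, which is part of why this kind of test is awkward to write.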

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

- [ ] after merge: need to push a `v1.3.3` tag
- [ ] depend on new version in `pudl`

zaneselvans commented 3 months ago

I know you didn't ask for my review, but it's quick, and I'd love for this to get fully in before the 1st of the month so we can get the new FERC archives nailed down.

jdangerx commented 3 months ago

Thanks for the review! I had just tagged @catalyst-cooperative/inframundo for review & the round-robin picked @zschira - I had no strong preference for review here :)

Fortunately, the FERC monthly archives should already be fixed by this PR. But this is part of getting the PUDL nightly builds to be more reliable, so still worth getting it in sooner rather than later.