dgunning / edgartools

Python library for working with SEC Edgar
MIT License

Using ProcessPoolExecutor with get_filings #42

Closed: AnthonyTremblayy closed this issue 2 months ago

AnthonyTremblayy commented 2 months ago

I am trying to run Company(ticker).get_filings(form=["10-K","10-Q"]).filter(date=f'{start_date}:{end_date}') for around 1000 stocks using ProcessPoolExecutor. I keep getting HTTP error 429 (Too Many Requests), even with max_workers = 8 (the SEC says it allows 10 requests per second). I was wondering:
1. Does form=["10-K","10-Q"] count as 1 request or 2?
2. Is there a better way to send the requests, e.g. downloading all of the SEC filings at once (one request) and then looking up each company's filings from that list?
Any help would be appreciated!

dgunning commented 2 months ago

Company(ticker) gets the company JSON with the first 1000 filings for that company. If the company has more than 1000 filings, the code issues a call for each page of 1000 filings until all are retrieved, so for a given company you could be making several calls. Company(ticker, include_old_filings=False) makes just the first call, but that's moot since you need all the 10-Ks.
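Following the paging behaviour described above, the number of calls per company can be estimated roughly as below. This is a sketch: `estimate_requests` and `PAGE_SIZE` are hypothetical names, not edgartools APIs, and it assumes the first call already returns the first page of up to 1000 filings. If get_filings only filters data that has already been downloaded (the "it's just a filter" side note suggests so), then form=["10-K","10-Q"] should not add requests; the cost is driven by the number of pages.

```python
import math

PAGE_SIZE = 1000  # filings per page, as described in the comment above (assumption)

def estimate_requests(total_filings: int, include_old_filings: bool = True) -> int:
    """Rough count of HTTP calls one Company(ticker) lookup makes (hypothetical helper)."""
    if not include_old_filings or total_filings <= PAGE_SIZE:
        return 1  # only the initial company JSON
    # The first call covers the first page; every further page of up to 1000 filings is one more call.
    return math.ceil(total_filings / PAGE_SIZE)

print(estimate_requests(800))                              # 1
print(estimate_requests(3500))                             # 4
print(estimate_requests(3500, include_old_filings=False))  # 1
```

Summed over ~1000 tickers, this means the job makes at least 1000 requests and possibly several thousand, so at 10 requests per second it cannot finish in much under 100 seconds, however the work is parallelised.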

Setting max_workers above 2 will likely exceed the SEC's requests-per-second limit at the moment.
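Why a handful of workers can already overload the limit is easiest to see with a little arithmetic: a worker that fires its next request as soon as the previous one returns issues about 1 / latency requests per second. The sketch below assumes a 0.25 s average round trip, which is an illustrative guess, not a measured figure; the helper names are hypothetical.

```python
import math

def requests_per_second(workers: int, avg_latency_s: float) -> float:
    """Peak rate if every worker sends its next request as soon as the last one returns."""
    return workers / avg_latency_s

def max_safe_workers(rate_limit: float, avg_latency_s: float) -> int:
    """Largest worker count whose peak rate stays within rate_limit.

    Returns 1 if even a single worker would exceed the limit; in that case
    extra throttling is needed on top of limiting the pool size.
    """
    return max(1, math.floor(rate_limit * avg_latency_s))

# Assumed 0.25 s round trip; 10 req/s is the limit quoted in the question.
print(requests_per_second(8, 0.25))  # 32.0
print(max_safe_workers(10, 0.25))    # 2
```

Under that assumed latency, 8 workers could peak at roughly three times the allowed rate. The real ceiling depends on actual network latency, and on how many paging calls each Company(ticker) triggers in quick succession.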

There is a more efficient way to do this... I need some time to work out the details, but

I'm also working on a set of batch improvements, including a Throttler inside the request framework, but that's a couple of releases out (a week or two).
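The Throttler mentioned above had not been released at this point, so the following is only a hedged sketch of the general idea, not the edgartools implementation: a thread-safe sliding-window limiter that allows at most `max_calls` calls in any rolling `period` seconds. The class name and its `wait`/decorator interface are assumptions for illustration.

```python
import functools
import threading
import time
from collections import deque

class Throttler:
    """Allow at most `max_calls` calls in any rolling `period`-second window (hypothetical)."""

    def __init__(self, max_calls: int = 10, period: float = 1.0):
        self.max_calls = max_calls
        self.period = period
        self._calls = deque()          # monotonic timestamps of recently admitted calls
        self._lock = threading.Lock()  # makes the limiter safe to share between threads

    def wait(self) -> None:
        """Block until another call fits in the window, then record it."""
        while True:
            with self._lock:
                now = time.monotonic()
                # Forget calls that have slid out of the window.
                while self._calls and now - self._calls[0] >= self.period:
                    self._calls.popleft()
                if len(self._calls) < self.max_calls:
                    self._calls.append(now)
                    return
                # Sleep until the oldest call in the window expires.
                sleep_for = self.period - (now - self._calls[0])
            time.sleep(sleep_for)

    def __call__(self, func):
        """Decorator form: throttle every call to `func`."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            self.wait()
            return func(*args, **kwargs)
        return wrapper
```

Each ProcessPoolExecutor worker is a separate process with its own copy of a limiter like this, so the budget has to be split per process, e.g. Throttler(max_calls=10 // max_workers). Enforcing one shared limit across processes would need something like a multiprocessing lock or manager, which this sketch does not attempt.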

(Side note: you've given me an idea for an optimization, namely getting old filings only when get_filings is called. Right now it's just a filter.)
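The side note describes deferring the download of older filing pages until get_filings is actually called. Below is a minimal sketch of that idea. It assumes a hypothetical `LazyCompany` class and an injected `fetch_page` callable that stands for one HTTP call per page; neither is part of edgartools. The form filter runs locally on data already in memory, so it costs no extra requests.

```python
from typing import Callable, Dict, Iterable, List, Optional, Union

Filing = Dict[str, str]

class LazyCompany:
    """Hypothetical: recent filings loaded up front, older pages fetched on first use."""

    def __init__(self, recent: List[Filing], old_page_ids: List[str],
                 fetch_page: Callable[[str], List[Filing]]):
        self.recent = recent                  # came with the initial company JSON
        self._old_page_ids = old_page_ids     # pages not downloaded yet
        self._fetch_page = fetch_page         # one HTTP request per page (assumed)
        self._old: Optional[List[Filing]] = None

    def _old_filings(self) -> List[Filing]:
        if self._old is None:                 # download older pages once, then cache them
            self._old = [f for page in self._old_page_ids for f in self._fetch_page(page)]
        return self._old

    def get_filings(self, form: Union[str, Iterable[str], None] = None) -> List[Filing]:
        filings = self.recent + self._old_filings()
        if form is None:
            return filings
        forms = {form} if isinstance(form, str) else set(form)
        return [f for f in filings if f["form"] in forms]  # local filter, no requests

# Usage with a fake fetcher that records each simulated request.
calls = []

def fake_fetch(page_id: str) -> List[Filing]:
    calls.append(page_id)
    return [{"form": "10-K", "page": page_id}, {"form": "8-K", "page": page_id}]

company = LazyCompany([{"form": "10-Q", "page": "recent"}], ["p1", "p2"], fake_fetch)
print(len(calls))                                        # 0
print(len(company.get_filings(form=["10-K", "10-Q"])))   # 3
print(len(calls))                                        # 2
```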