dgunning / edgartools

Navigate SEC Edgar data in Python
MIT License

How to search for many company CIKs and download all the main documents at once? #71

Closed wwchicago closed 2 weeks ago

wwchicago commented 1 month ago

I want to download all the main documents (without downloading attachments) for 65,000 company CIKs. Currently, I can only read them into Python as text and then save them as individual txt files. I tried downloading 100 files using this method, but it is very slow. I want to know if there is a function in edgartools that allows me to download all the main documents I want at once without using a for loop.

Additionally, I want to search for all the company CIKs I want at once (just like in the form section, by entering a list), but I failed. Is there any way to search for many company CIKs at once?

dgunning commented 1 month ago

Can you clarify your requirements? What form types do you need: all forms, or just 497K and 497J? For what time period? Depending on what you need, it might be easier to start with the company list and filter forms, or to start with forms and filter companies.

Tips to speed up company resolution.

  1. Download company data (the filing and fact JSON files, but not HTML) to your local machine:

     from edgar import *

     download_edgar_data()
     use_local_storage()

  2. Use the parameter include_old_filings=False in Company(cik).
  3. Use Python multiprocessing to get up to the SEC-mandated limit of 10 requests per second. I can help with this, but it is really easy to ask an AI how to do multiprocessing.
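
The parallel-download tip can be sketched roughly as follows. This is a minimal illustration and not part of edgartools: it uses a thread pool rather than multiprocessing (a swap made here because the work is network-bound), and `RateLimiter`, `fetch_main_document`, and `fetch_all` are hypothetical names. The placeholder `fetch_main_document` stands in for whatever edgartools call actually retrieves a document. The limiter only spaces out calls to that function, so if one call issues several HTTP requests, `rate` should be lowered accordingly.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Space out calls so that at most `rate` of them start per second."""

    def __init__(self, rate: float):
        self.interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self.lock:
            slot = max(self.next_slot, time.monotonic())
            self.next_slot = slot + self.interval
        delay = slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)


def fetch_main_document(cik: int) -> str:
    # Hypothetical placeholder: replace with the real lookup for one CIK.
    return f"document for CIK {cik}"


def fetch_all(ciks, rate=10.0, workers=8, fetch=fetch_main_document):
    """Fetch every CIK through a thread pool, throttled to `rate` calls per second."""
    limiter = RateLimiter(rate)

    def task(cik):
        limiter.wait()
        return cik, fetch(cik)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(task, ciks))
```

Calling `fetch_all(list_of_ciks)` returns a dict keyed by CIK. A single shared limiter is used, instead of one per worker, so that adding workers never pushes the combined rate above the limit.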

If you are approaching it from the list of filings and filtering by CIK, then the next release, maybe out by tomorrow, will have filtering by CIK.

Regardless of what you do, 65,000 is a large number and will take a while. There are 86,400 seconds in a day.
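
To put rough numbers on that, here is a back-of-the-envelope estimate. The helper `hours_needed` is hypothetical, and the count of requests per company is an assumption for illustration only: the real figure depends on how many filings and documents each company needs.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def hours_needed(companies: int, requests_per_company: int, rate_per_second: float) -> float:
    """Lower bound on wall-clock hours when throttled to rate_per_second."""
    total_requests = companies * requests_per_company
    return total_requests / rate_per_second / 3600


# Ceiling on requests in one day at 10 requests per second.
print(10 * SECONDS_PER_DAY)                    # 864000

# 65,000 companies with one request each (best case).
print(round(hours_needed(65_000, 1, 10), 2))   # 1.81

# 65,000 companies with an assumed five requests each.
print(round(hours_needed(65_000, 5, 10), 2))   # 9.03
```

These are lower bounds: they ignore network latency, parsing time, and retries.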

wwchicago commented 1 month ago

Hi Dwight,

I understand what you mean. I will try your method later to speed up the process.

By the way, when I use company CIK to search a file, I can see the corresponding series CIK and contract CIK in the homepage. Is it possible to extract the information of series CIK and contract CIK using edgartools? Thank you very much!


wwchicago commented 1 month ago

I am sorry for being a bit verbose. Regarding edgartools, is there any documentation that summarizes all the functions and their usage? For example, I haven't seen these two functions: download_edgar_data() and use_local_storage(). I also don't know what parameters they require or how to specify which folder to download the files to.
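
Until the documentation catches up, one general Python approach is to ask the function itself for its parameters. The sketch below uses only the standard-library `inspect` module; `describe` is a hypothetical helper, not an edgartools function, and what it prints for any edgartools function depends on the installed version.

```python
import inspect


def describe(func) -> str:
    """Return the call signature and the first docstring line of a callable."""
    doc = inspect.getdoc(func) or ""
    first_line = doc.splitlines()[0] if doc else "(no docstring)"
    return f"{func.__name__}{inspect.signature(func)}: {first_line}"


# After `from edgar import *`, this could be applied to the functions
# mentioned in this thread, for example:
#     print(describe(download_edgar_data))
#     print(describe(use_local_storage))
```

The built-in `help(download_edgar_data)` shows the same information along with the full docstring.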

dgunning commented 1 month ago

I will create a couple tickets to

  1. Parse and create the data objects for 497 and 497K. Shouldn't be too hard but it's a matter of priority
  2. Fill in missing documentation

Note that the API surface of edgartools is very large and the documentation needs to catch up.

Even so, there are several features that are deliberately undocumented, at least for a while. Local storage didn't fit in the original vision of edgartools but was later added to fill a gap, and some thinking needs to be done before making it generally available. The other reason for less documentation is that, with one active developer, some features are held back from general availability since they would require support.

dgunning commented 2 weeks ago

Added documentation on downloading to the wiki. Closing.