dgunning / edgartools

Navigate SEC Edgar data in Python
MIT License

(Edge case) library is pulling Q4 Index before it's available #116

Closed 5fff closed 1 month ago

5fff commented 1 month ago

Calling get_filings with a year list that includes the current year, at around 1 AM on the first day of a quarter, makes the library try to access that quarter's index file before it is available.

>>> filings_effect = get_filings(year=year_list, form="EFFECT")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/site-packages/edgar/_filings.py", line 763, in get_filings
    filing_index = get_filings_for_quarters(year_and_quarters, index=index)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/site-packages/edgar/_filings.py", line 392, in get_filings_for_quarters
    quarters_and_indexes = parallel(fetch_filing_index,
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/site-packages/fastcore/parallel.py", line 134, in parallel
    return L(r)
           ^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/site-packages/fastcore/foundation.py", line 100, in __call__
    return super().__call__(x, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/site-packages/fastcore/foundation.py", line 108, in __init__
    items = listify(items, *rest, use_list=use_list, match=match)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/site-packages/fastcore/basics.py", line 70, in listify
    elif is_iter(o): res = list(o)
                           ^^^^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/concurrent/futures/_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/concurrent/futures/_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/site-packages/fastcore/parallel.py", line 63, in _call
    return g(item)
           ^^^^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/site-packages/edgar/_filings.py", line 333, in fetch_filing_index
    index_table = fetch_filing_index_at_url(url, index)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/site-packages/edgar/_filings.py", line 348, in fetch_filing_index_at_url
    index_text = download_text(url=url)
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/site-packages/edgar/httprequests.py", line 527, in download_text
    return download_file(url, as_text=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/site-packages/edgar/httprequests.py", line 384, in download_file
    inspect_response(response)
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/site-packages/edgar/httprequests.py", line 319, in inspect_response
    response.raise_for_status()
  File "/home/redactedusername/.asdf/installs/python/3.12.4/lib/python3.12/site-packages/httpx/_models.py", line 761, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '403 Forbidden' for url 'https://www.sec.gov/Archives/edgar/full-index/2024/QTR4/form.gz'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403

5fff commented 1 month ago

[updated original comment] Oh, actually, I think this is very much an edge case: technically it's Oct 1st on the East Coast right now, and the Q4 index has not been published yet at 1 AM on the first day of Q4. I think it's just bad timing. Perhaps add a time check?

dgunning commented 1 month ago

Thanks. This tends to happen every quarter, and it's tricky to handle, and especially to test, since it goes away once the data shows up. I will try to put in a permanent fix.

5fff commented 1 month ago

It seems that find() and get_by_accession_number() also run into this issue when you try to get a new filing (published the same day) by accession number.

I found a workaround: I can find an older version of the document and then use .related_filings() to get to the newly published files.

Is there a quicker way to get a filing by accession number? It seems like edgartools does a lookup/search through the quarterly index instead of going straight to the filing index page, e.g. https://www.sec.gov/Archives/edgar/data/{CIK}/{ACCESSION_NUMBER_NO_SPACE}/{ACCESSION_NUMBER}-index.htm
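For illustration, the URL pattern above can be built from a CIK and an accession number as sketched below. The helper name filing_index_url is hypothetical and is not part of edgartools; the sketch assumes the folder segment is the accession number with its dashes removed.

```python
# Hypothetical helper, not an edgartools API: builds the filing index page URL
# from a CIK and a dashed accession number.
SEC_ARCHIVES = "https://www.sec.gov/Archives/edgar/data"


def filing_index_url(cik: int, accession_no: str) -> str:
    # The folder segment is the accession number without dashes (assumption
    # based on the URL pattern quoted above).
    folder = accession_no.replace("-", "")
    return f"{SEC_ARCHIVES}/{int(cik)}/{folder}/{accession_no}-index.htm"


print(filing_index_url(1999710, "0001493152-24-022961"))
# https://www.sec.gov/Archives/edgar/data/1999710/000149315224022961/0001493152-24-022961-index.htm
```

This only works when the filer's CIK is already known, which is exactly the piece of information the quarterly index lookup supplies.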

dgunning commented 1 month ago

I added a fix so that it returns empty filings at the beginning of a quarter, when the index files have not yet been published.

This should be fixed in the latest release (2.33.0) and it should prevent the issue when the 2025 1st quarter begins.
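A minimal sketch of the kind of time check discussed in this thread follows. It is not the code that went into the release; the function names and the one-day grace period are assumptions chosen for illustration, and the caller is assumed to pass the current US Eastern time.

```python
from datetime import datetime, timedelta


def quarter_start(year: int, quarter: int) -> datetime:
    """First instant of the given calendar quarter (1..4)."""
    return datetime(year, 3 * (quarter - 1) + 1, 1)


def index_probably_unpublished(year: int, quarter: int, now_eastern: datetime,
                               grace: timedelta = timedelta(days=1)) -> bool:
    """Guess whether the quarterly index for (year, quarter) is not online yet.

    Hypothetical helper: returns True for future quarters and for a quarter
    that started less than `grace` ago. The one-day grace is an assumption,
    not a documented SEC publication schedule.
    """
    now = now_eastern.replace(tzinfo=None)  # compare as naive Eastern time
    return now - quarter_start(year, quarter) < grace


# 1 AM Eastern on Oct 1st, 2024: Q4 is treated as not yet published, Q3 is fine.
one_am = datetime(2024, 10, 1, 1, 0)
print(index_probably_unpublished(2024, 4, one_am))
print(index_probably_unpublished(2024, 3, one_am))
```

A caller could skip the download and return an empty result for any quarter where this check is true, instead of letting the 403 propagate.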

dgunning commented 1 month ago

For the second part of your question: if you know the CIK of the filer, that would be the fastest way to find a filing. However, for a lot of companies, especially the smaller ones, the CIK of the company is not the same as the CIK of the filer. Since the CIK is not known, edgartools goes to the index to find the filing.

This adds overhead, but if you are only looking for a few filings, you can probably tolerate the cost of fetching the indexes and searching through them. If you are looking for a lot of filings, the quarterly indexes are cached, which speeds up subsequent lookups.

Currently one can get a filing directly by using the Filing constructor as follows:

from edgar import Filing

filing = Filing(form='1-U', filing_date='2024-06-06', company='Masterworks Vault 5, LLC', cik=1999710,
                accession_no='0001493152-24-022961')

I think we can possibly add a function that allows you to specify the CIK and accession number and find the filing.

5fff commented 1 month ago

Awesome! The constructor is exactly what I'm looking for :) And thanks for all the great work on this!

dgunning commented 1 month ago

Fixed