dgunning / edgartools

Python library for working with SEC Edgar
MIT License

Parsing error for text surrounded by tags ##TABLE_START &#9679 AND ##TABLE_END #67

Closed: fdejax90 closed this issue 3 days ago

fdejax90 commented 6 days ago
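from edgar import *

# Note: this result is immediately overwritten by the next assignment.
filings = get_filings(index="xbrl")
# Despite the name, this holds the data object of the latest 10-K for ticker WOW.
filings = Company('WOW').get_filings(form="10-K").latest(1).obj()

# Print every chunk of the "Item 1A" section, separated by a rule.
for elt in filings.chunked_document.chunks_for_item('Item 1A'):
    print(elt)
    print('-'*100)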

>>> ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[156], line 3
      1 for num_elt in my_object['chunk_rf_1']:
      2     for elt in num_elt:
----> 3         print(elt)
      4         print('-'*10)

TypeError: __repr__ returned non-string (type NoneType)

This is the extract that causes the error above:

repr(elt[0].get_text())
>>> "'\\n● | subject us to sensitivity to increases in prevailing interest rates;\\n--+---------------------------------------------------------------------\\n\\n'"

Using sec-api, I noticed that the text is surrounded by the tags ##TABLE_START &#9679 and ##TABLE_END.

dgunning commented 6 days ago

Found the issue. Testing a fix for it now.

dgunning commented 3 days ago

Fixed in 2.26.0
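
To check whether an environment already has this release, a small sketch like the following may help. It assumes the package is installed under the distribution name `edgartools`; `parse_version` and `has_fix` are hypothetical helpers written for this example, and the simple numeric comparison ignores pre-release suffixes.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (2, 26, 0)  # release named in this thread


def parse_version(text):
    """Turn '2.26.0' into (2, 26, 0), keeping only leading digits of each part."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:  # pad '2.26' to (2, 26, 0)
        parts.append(0)
    return tuple(parts)


def has_fix(installed):
    """True if the given version string is at or after the fixed release."""
    return parse_version(installed) >= FIXED_IN


try:
    # Assumption: the distribution is published as "edgartools".
    installed = version("edgartools")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("edgartools is not installed in this environment")
elif has_fix(installed):
    print(f"edgartools {installed} includes the fix")
else:
    print(f"edgartools {installed} predates 2.26.0; consider upgrading")
```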