dgunning / edgartools

Python library for working with SEC Edgar
MIT License

strip_ixbrl_tags fails if 'style' is None #29

Closed hspak closed 5 months ago

hspak commented 5 months ago

Hello, I'm back 😅

I seem to have found another edge case for strip_ixbrl_tags.

edgartools version: 2.8.1

import edgar

edgar.set_identity("id")
company = edgar.Company("ABNB")  # The ticker matters: not all filings run into this issue
filings = company.get_filings(form=["10-K"])
if filings:
    print(filings[0].text())

Stack trace:

Traceback (most recent call last):
  File "/workspace/test.py", line 10, in <module>
    print(filings[0].text())
          ^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.11/site-packages/edgar/_filings.py", line 1512, in text
    return html_to_text(html_content, ignore_tables=ignore_tables, sep=sep)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.11/site-packages/edgar/htmltools.py", line 87, in html_to_text
    html_str = try_to_strip_ixbrl_tags(html_str)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.11/site-packages/edgar/htmltools.py", line 176, in try_to_strip_ixbrl_tags
    return strip_ixbrl_tags(html_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/.venv/lib/python3.11/site-packages/edgar/htmltools.py", line 191, in strip_ixbrl_tags
    if parent.tag == '{http://www.w3.org/1999/xhtml}div' and 'display:inline' in parent.get('style'):
                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'NoneType' is not iterable

dgunning commented 5 months ago

Thanks for reporting, I'm taking a look.

dgunning commented 5 months ago

Patched in 2.8.2.

Planning to do a more permanent fix in the coming weeks.
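
For reference, the traceback above points at the expression `'display:inline' in parent.get('style')`, which raises the TypeError whenever a div has no `style` attribute and `get` returns None. Below is a minimal sketch of a None-safe version of that check. It is not necessarily what 2.8.2 does. It uses the standard library's `xml.etree.ElementTree` as a stand-in for whatever parser edgartools uses internally, and the helper name `is_inline_div` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaced tag name exactly as it appears in the traceback.
XHTML_DIV = "{http://www.w3.org/1999/xhtml}div"


def is_inline_div(element):
    """Return True for an XHTML div whose style contains 'display:inline'.

    Hypothetical helper for illustration; not part of edgartools.
    """
    # get() returns None when the attribute is absent, so default to ""
    # to keep the `in` test from raising TypeError.
    style = element.get("style", "")
    return element.tag == XHTML_DIV and "display:inline" in style


# Small made-up document: one div with an inline style, one with no style.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<div style="display:inline">a</div>'
    '<div>b</div>'
    '</html>'
)

results = [is_inline_div(child) for child in doc]
print(results)  # [True, False]
```

The second div is the case that triggered this issue: with the default of `""`, it is simply reported as not inline instead of raising.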