dgunning / edgartools

Python library for working with SEC Edgar
MIT License

Bug: Filing item text is cut off after '$' character #21

Closed: davelacy closed this issue 4 months ago

davelacy commented 6 months ago

I was looking at the items in a particular filing and noticed that the text is cut off after the dollar sign. This example is an 8-K.

# edgartools is imported as the `edgar` package
from edgar import Company

# get the company
c = Company('0001588272')

# get the filing
f = c.get_filings().filter(form='8-K')[0]

# display the item text
print(f.text())

Notice the cut-off at:

which the Company issues shares under its distribution reinvestment plan (the “DRP”) at $

dgunning commented 6 months ago

Will take a look

dgunning commented 6 months ago

Sorry for the delay.

edgartools uses the unstructured library for parsing HTML, and unstructured currently has an issue handling nested div tags. I'll take a look at how to fix this.
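
To illustrate the kind of structure involved, here is a minimal sketch that collects text from nested div tags using only the standard library's `html.parser`. It is not how edgartools or unstructured parse filings. The class name `NestedDivText` and the sample HTML are made up for illustration, and the dollar amount is a placeholder rather than the value in the filing.

```python
from html.parser import HTMLParser


class NestedDivText(HTMLParser):
    """Collect text at any nesting depth (hypothetical helper, not an edgartools API)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every text node, including those inside nested <div> tags.
        if data.strip():
            self.parts.append(data.strip())

    def text(self):
        # Join the collected fragments with single spaces.
        return " ".join(self.parts)


# Made-up sample: the amount sits in an inner <div>, right after the '$'.
sample = "<div>shares under the DRP at $<div>12.34</div> per share</div>"

parser = NestedDivText()
parser.feed(sample)
parser.close()
print(parser.text())
```

Because `handle_data` fires for every text node, the text after the '$' is kept even though it lives in an inner div. A real fix would also need to handle whitespace and block boundaries more carefully.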

davelacy commented 6 months ago

No worries at all... these things aren't the easiest to parse. Thanks for taking a look.

dgunning commented 4 months ago

Sorry for the long wait. This is fixed in 2.10.1
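
For anyone checking whether their environment has the fix, a rough sketch follows. It assumes that 2.10.1 refers to the `edgartools` release on PyPI; `parse_version` and `has_fix` are hypothetical helpers written for this example, not part of the library.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(v):
    """Turn '2.10.1' into (2, 10, 1), keeping only the leading digits of each part."""
    parts = []
    for p in v.split("."):
        m = re.match(r"\d+", p)
        parts.append(int(m.group()) if m else 0)
    return tuple(parts)


def has_fix(installed, fixed="2.10.1"):
    # Compare numerically: as plain strings, '2.9.0' would sort after '2.10.1'.
    return parse_version(installed) >= parse_version(fixed)


try:
    installed = version("edgartools")
    print(installed, "includes fix:", has_fix(installed))
except PackageNotFoundError:
    print("edgartools is not installed")
```

If the check reports an older version, upgrading with `pip install -U edgartools` should pick up the fix.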