Su-Ku-2000 closed this issue 5 months ago
Some examples of filings where we are getting this error:
2024-04-14 16:23:43,658 - root - INFO - a bytes-like object is required, not 'str' occurred for the filing --> 0000912057-00-023442
2024-04-14 16:23:43,821 - root - INFO - a bytes-like object is required, not 'str' occurred for the filing --> 0000912057-00-003201
2024-04-14 16:23:45,927 - root - INFO - a bytes-like object is required, not 'str' occurred for the filing --> 0000320193-97-000002
2024-04-14 16:23:46,135 - root - INFO - a bytes-like object is required, not 'str' occurred for the filing --> 0000320193-96-000018
2024-04-14 16:23:45,325 - root - INFO - a bytes-like object is required, not 'str' occurred for the filing --> 0000320193-98-000003
Thanks for reporting. I've looked into this and we have to add special handling for old filings.
Hey @dgunning, thanks for checking this. Will this fix be implemented in the near future? I have one more request: could you also expose a method that takes the HTML and returns its text content, so that we can get the text content of the linked exhibits as well?
Fixed in 2.18.0.
Also see: from edgar.documents import html_to_text
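For readers who want to see what "takes the HTML and returns the text content" amounts to, here is a minimal standard-library sketch. It is not the implementation of `html_to_text` in edgartools, whose signature and behaviour are not shown in this thread; `extract_text` and `_TextExtractor` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html):
    """Return the text content of an HTML string (illustrative only)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Exhibit 99.1</p><p>Press release</p></body></html>"
)
exhibit_text = extract_text(sample)
```

A real converter would also need to handle tables, whitespace, and the plain-text layout of older filings, which this sketch does not attempt.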
Fixed
2024-04-14 15:25:01,042 - root - INFO - Attachment for 0001047469-02-007674.txt -> EX-99.1.txt downloaded.
Traceback (most recent call last):
  File "/Users/test.py", line 94, in <module>
    download_filings_and_attachments(fillings10K, dir_path_10K)
  File "/Users/test.py", line 57, in download_filings_and_attachments
    f.write(filing.text())
  File "/Users/sumithkumars/Library/Python/3.9/lib/python/site-packages/edgar/_filings.py", line 1671, in text
    return HtmlDocument.from_html(html_content).text
  File "/Users/sumithkumars/Library/Python/3.9/lib/python/site-packages/edgar/documents.py", line 422, in from_html
    root: Tag = cls.get_root(html)
  File "/Users/sumithkumars/Library/Python/3.9/lib/python/site-packages/edgar/documents.py", line 412, in get_root
    if "" in html[:500]:
TypeError: a bytes-like object is required, not 'str'
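The message in the traceback is what Python raises when a `str` is tested for membership in a `bytes` object, which suggests the content of these older filings reached `get_root` as bytes rather than text. The sketch below reproduces the error and shows one possible workaround on versions before the fix. `ensure_text` is a hypothetical helper and not part of edgartools, and UTF-8 with replacement is an assumed encoding, not one confirmed in this thread.

```python
def ensure_text(content, encoding="utf-8"):
    """Hypothetical helper: return content as str, decoding bytes if needed."""
    if isinstance(content, (bytes, bytearray)):
        # errors="replace" avoids a UnicodeDecodeError on unexpected bytes.
        return bytes(content).decode(encoding, errors="replace")
    return content


raw = b"<html><body>Annual report</body></html>"

# Reproduce the failure: a str cannot be searched for inside bytes.
try:
    "<html>" in raw[:500]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# After decoding, the same membership test works.
html = ensure_text(raw)
found = "<html>" in html[:500]
```

Upgrading to 2.18.0 or later, where the maintainer reports the issue as fixed, is the simpler route.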