Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0

Unable to extract tables from SEC filings 10-k/10-q #958

Closed: fernandobeast closed this issue 11 months ago

fernandobeast commented 1 year ago

There is no function under `from unstructured.partition.auto import partition` that can extract table content. I've looped over and searched the elements, but no luck!

Can someone help me?
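For reference, the loop-and-search approach described above can be sketched roughly as follows. This is a minimal sketch, not a confirmed fix for this issue: it assumes the installed version of unstructured labels table elements with `category == "Table"` and, where available, stores an HTML rendering at `metadata.text_as_html`. The helper names `find_tables` and `tables_from_file` are hypothetical.

```python
from typing import Any, Iterable, List, Optional, Tuple


def find_tables(elements: Iterable[Any]) -> List[Tuple[str, Optional[str]]]:
    """Return (text, html) pairs for every element whose category is "Table"."""
    tables = []
    for element in elements:
        # Assumption: table elements report category == "Table".
        if getattr(element, "category", None) != "Table":
            continue
        metadata = getattr(element, "metadata", None)
        # Assumption: an HTML rendering, when present, is at metadata.text_as_html.
        html = getattr(metadata, "text_as_html", None)
        tables.append((str(getattr(element, "text", "")), html))
    return tables


def tables_from_file(path: str) -> List[Tuple[str, Optional[str]]]:
    """Partition a file and keep only its table elements (hypothetical helper)."""
    # Imported lazily so find_tables can be used without unstructured installed.
    from unstructured.partition.auto import partition

    return find_tables(partition(filename=path))
```

If `tables_from_file` returns an empty list for a filing, that suggests the partitioner did not emit any table elements for that document, which matches the behavior reported here.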

MthwRobinson commented 1 year ago

Hi @fernandobeast, do you have a particular filing / table you're looking at that we could test against? By the way, if you haven't seen it before, we do have an SEC filings extraction repo, though it predates the partition function. cc: @qued

fernandobeast commented 1 year ago

Thanks for the reply @MthwRobinson. I am trying to extract data from a 10-Q, such as the numbers in the balance sheet, cash flow statement, etc. I've followed the instructions in the SEC filings extraction repo, for example looping over the text from `sec_document.get_table_of_contents()`; however, it returns an empty list.
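As a point of comparison, when the filing is available as HTML, the rows of each `<table>` can be pulled out with the Python standard library alone. The sketch below is not the method used by unstructured or by the SEC filings extraction repo; `TableExtractor` and `extract_tables` are hypothetical names, and nested tables are not handled.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[List[List[str]]] = []
        self._table: Optional[List[List[str]]] = None  # rows of the open table
        self._row: Optional[List[str]] = None          # cells of the open row
        self._cell: Optional[List[str]] = None         # text chunks of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace (including non-breaking spaces).
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None and self._table is not None:
            self._table.append(self._row)
            self._row, self._cell = None, None
        elif tag == "table" and self._table is not None:
            self.tables.append(self._table)
            self._table, self._row, self._cell = None, None, None


def extract_tables(html: str) -> List[List[List[str]]]:
    """Parse an HTML string and return all of its tables."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Calling `extract_tables(open("filing.htm").read())` (the file name is a placeholder) would return one list of rows per table; picking out the balance sheet or cash flow table from that list still requires inspecting the header cells.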