aryn-ai / sycamore

🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
https://sycamore.readthedocs.io
Apache License 2.0

Add HTML table operations #549

Closed HenryL27 closed 1 month ago

HenryL27 commented 1 month ago

Parse HTML tables into sycamore.data.Table objects, and use that in HtmlPartitioner. Add a TEDS (tree-edit-distance-based similarity) implementation for Table objects.
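
For anyone reading along, here is a rough sketch of the kind of parsing involved. It is not the code in this PR and does not use sycamore.data.Table: the Cell dataclass, TableCellParser, and parse_table are hypothetical names built on the standard library's html.parser, and the sketch assumes a single well-formed table with no nesting.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Cell:
    """Hypothetical stand-in for a table cell (not a Sycamore class)."""
    text: str
    row: int
    col: int
    rowspan: int = 1
    colspan: int = 1
    is_header: bool = False


class TableCellParser(HTMLParser):
    """Collects <td>/<th> cells and assigns grid positions, honoring spans."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._row = -1
        self._col = 0
        self._occupied = set()  # grid slots already claimed by a cell or span
        self._current = None    # cell whose text is currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row += 1
            self._col = 0
        elif tag in ("td", "th"):
            attr = dict(attrs)
            # Skip slots covered by a rowspan from an earlier row.
            while (self._row, self._col) in self._occupied:
                self._col += 1
            rowspan = int(attr.get("rowspan") or 1)
            colspan = int(attr.get("colspan") or 1)
            self._current = Cell("", self._row, self._col,
                                 rowspan, colspan, tag == "th")
            for r in range(rowspan):
                for c in range(colspan):
                    self._occupied.add((self._row + r, self._col + c))
            self._col += colspan

    def handle_data(self, data):
        if self._current is not None:
            self._current.text += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._current is not None:
            self._current.text = self._current.text.strip()
            self.cells.append(self._current)
            self._current = None


def parse_table(html):
    """Return the list of cells found in an HTML table string."""
    parser = TableCellParser()
    parser.feed(html)
    return parser.cells


EXAMPLE_HTML = """
<table>
  <tr><th rowspan="2">Name</th><th colspan="2">Score</th></tr>
  <tr><td>Math</td><td>Art</td></tr>
  <tr><td>Ana</td><td>9</td><td>7</td></tr>
</table>
"""
example_cells = parse_table(EXAMPLE_HTML)
```

In the example, "Name" spans two rows, so "Math" in the second row lands in column 1 rather than column 0. Tracking occupied slots is one simple way to handle that; the real implementation may do it differently.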

Next thing is to squish TEDS into the eval framework and run PubTabNet or a similar dataset through it, but that's a different PR.
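
For context on what the eval would report: TEDS scores a predicted table tree against a ground-truth tree as 1 - EditDist(Ta, Tb) / max(|Ta|, |Tb|), where |T| is the number of nodes. Below is a minimal sketch of that formula, not the implementation in this PR. It assumes trees are nested (label, children) tuples and charges a flat cost of 1 for every insert, delete, or relabel, whereas the published metric scores cell-text substitutions by normalized string edit distance. The names node, forest_distance, and teds are made up for illustration, and the memoized recursion is only practical for small tables.

```python
from functools import lru_cache


def node(label, *children):
    """Build a tree node as a hashable (label, children) tuple."""
    return (label, tuple(children))


def size(forest):
    """Count all nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)  # insert everything in f2
    if not f2:
        return size(f1)  # delete everything in f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    return min(
        # Delete the rightmost root of f1; its children move up.
        forest_distance(f1[:-1] + kids1, f2) + 1,
        # Insert the rightmost root of f2; its children move up.
        forest_distance(f1, f2[:-1] + kids2) + 1,
        # Match the two rightmost roots, relabeling if they differ.
        forest_distance(kids1, kids2)
        + forest_distance(f1[:-1], f2[:-1])
        + (label1 != label2),
    )


def teds(tree_a, tree_b):
    """Simplified TEDS: 1 - edit distance / size of the larger tree."""
    largest = max(size((tree_a,)), size((tree_b,)))
    return 1.0 - forest_distance((tree_a,), (tree_b,)) / largest


# Two one-row tables that differ in a single cell.
gold = node("table", node("tr", node("td:a"), node("td:b")))
pred = node("table", node("tr", node("td:a"), node("td:c")))
score = teds(pred, gold)
```

Here both trees have 4 nodes and differ by one relabel, so the score is 1 - 1/4 = 0.75. Identical trees score 1.0.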