Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
8.65k stars 704 forks

rfctr(html): organize and improve HTMLDocument tests #3161

Closed: scanny closed this 3 months ago

scanny commented 3 months ago

Summary: In preparation for further work on HTMLDocument, organize the organic growth in documents/tests_html.py and improve typing and expression.

Reviewers: Commits are groomed, so review is probably easiest going commit-by-commit.