Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.49k stars, 584 forks

rfctr(html): promote HTMLDoc candidate methods #3177

Closed: scanny closed 3 weeks ago

scanny commented 3 weeks ago

Summary: Convert ._find_articles() and ._find_main() into the ._articles and ._main properties on HTMLDocument, respectively.

Additional Context: After prior refactorings, these two functions each require only self, so they can become @lazyproperty members of HTMLDocument. This ensures each is computed at most once. Their close relationship to HTMLDocument is also made explicit by making them members of the class rather than "loose" functions.
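
To illustrate the "computed at most once" behavior, here is a minimal sketch. It is not the code from this PR: it uses the standard library's functools.cached_property as a stand-in for the project's @lazyproperty decorator, and the class name, the list-of-tuples input, and the call counter are all hypothetical, chosen only to make the caching visible.

```python
from functools import cached_property


class HTMLDocumentSketch:
    """Hypothetical stand-in for HTMLDocument; not the library's class."""

    def __init__(self, tags):
        # Assumption for illustration: the document is a list of
        # (tag_name, text) tuples rather than a parsed HTML tree.
        self._tags = tags
        self.articles_calls = 0  # counts how often the property body runs

    @cached_property
    def _articles(self):
        """Texts of all "article" entries, computed on first access only."""
        self.articles_calls += 1
        return [text for name, text in self._tags if name == "article"]

    @cached_property
    def _main(self):
        """Text of the first "main" entry, or None when there is none."""
        return next((text for name, text in self._tags if name == "main"), None)


doc = HTMLDocumentSketch([("article", "a1"), ("main", "m"), ("article", "a2")])
first = doc._articles   # runs the property body
second = doc._articles  # served from the cache; the body does not run again
```

Because each property needs only self, callers read doc._articles like an attribute, and repeated reads return the same cached object instead of rescanning the document.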