Summary
Remove HTML-specific element types and return "regular" elements like Title and NarrativeText from partition_html().
Additional Context
The legacy HTML partitioner used HTML-specific element types to track metadata during partitioning.
That role is no longer necessary or desirable.
HTML-specific elements like HTMLTitle and HTMLNarrativeText were returned not only when partitioning HTML but also when partitioning the seven other file formats that broker partitioning to HTML (convert to HTML, then call partition_html()). This does not cause immediate breakage because these are still Text element subtypes, but it produces a confusing developer experience.
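To illustrate why this is confusing rather than breaking, here is a minimal sketch using simplified, hypothetical stand-in classes (not the real unstructured element classes). It assumes HTMLTitle subclasses Title, so an isinstance() check still passes, while an exact-type check or the reported class name exposes the HTML-specific type.

```python
# Simplified, hypothetical stand-ins for the element hierarchy.
# These are NOT the real unstructured classes; they only model the subtyping.
class Text:
    def __init__(self, text: str) -> None:
        self.text = text


class Title(Text):
    pass


class NarrativeText(Text):
    pass


# Assumption: the legacy HTML-specific type subclasses its "regular" counterpart.
class HTMLTitle(Title):
    pass


def describe(element: Text) -> tuple:
    """Return (isinstance check, exact-type check, reported class name)."""
    return (
        isinstance(element, Title),  # passes for subclasses too
        type(element) is Title,      # fails for the HTML-specific subtype
        type(element).__name__,      # what a developer sees when inspecting output
    )


legacy = HTMLTitle("Introduction")  # what HTML-brokered partitioning used to return
current = Title("Introduction")     # what partition_html() returns after this change

print(describe(legacy))   # (True, False, 'HTMLTitle')
print(describe(current))  # (True, True, 'Title')
```

Code that only uses isinstance() keeps working either way; code that compares exact types or class names sees "HTMLTitle" for, say, a converted document, which is the inconsistency this change removes.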
Remove the prior metadata roles from HTML-specific elements and remove those element types entirely.