Open scanny opened 1 week ago
Summary
`ListItem` element `category_depth` metadata is mis-reported.
To Reproduce
    from unstructured.partition.html import partition_html
    from unstructured.staging.base import elements_to_json

    html_text = """
    <ul>
      <li>foo</li>
      <li>
        <ol>
          <li>first</li>
          <li>second</li>
      </li>
    </ul>
    """

    elements = partition_html(text=html_text)
    print(f"{elements_to_json(elements, indent=2)}")
Expected:
    [
      {
        "element_id": "0b460a31b167710ce27995abb2dc4cbd",
        "metadata": {
          "category_depth": 1,
          "filetype": "text/html",
          "languages": [
            "eng"
          ]
        },
        "text": "foo",
        "type": "ListItem"
      },
      {
        "element_id": "2a3077c93b2a754629ee52b0e4e8ff11",
        "metadata": {
          "category_depth": 2,
          "filetype": "text/html",
          "languages": [
            "eng"
          ],
          "parent_id": "0b460a31b167710ce27995abb2dc4cbd"
        },
        "text": "first",
        "type": "ListItem"
      },
      {
        "element_id": "1b1e0e9be12f02351e2085308766a44f",
        "metadata": {
          "category_depth": 2,
          "filetype": "text/html",
          "languages": [
            "eng"
          ],
          "parent_id": "0b460a31b167710ce27995abb2dc4cbd"
        },
        "text": "second",
        "type": "ListItem"
      }
    ]
Actual:
    [
      {
        "element_id": "0b460a31b167710ce27995abb2dc4cbd",
        "metadata": {
          "category_depth": 1,
          "filetype": "text/html",
          "languages": [
            "eng"
          ]
        },
        "text": "foo",
        "type": "ListItem"
      },
      {
        "element_id": "9076282f2333a371a3f2889f789e6641",
        "metadata": {
          "category_depth": 1,
          "filetype": "text/html",
          "languages": [
            "eng"
          ]
        },
        "text": "first\n second",
        "type": "ListItem"
      }
    ]
Additional context
Fixed by #3218. Recorded here to explain ingest test output changes and to inform CHANGELOG.
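For reviewers comparing ingest test output, the depths listed under Expected can be cross-checked without `unstructured` by counting how many `<ul>`/`<ol>` containers enclose each piece of list text. The following is a minimal standard-library sketch under that assumption. `ListDepthParser` and `list_item_depths` are hypothetical helper names, and this is not how `partition_html` or the fix in #3218 computes `category_depth`.

```python
from html.parser import HTMLParser


class ListDepthParser(HTMLParser):
    """Collect (text, depth) for text found inside <ul>/<ol> containers.

    Hypothetical helper for illustration only; depth is the number of
    list containers currently open when the text is encountered.
    """

    def __init__(self):
        super().__init__()
        self.list_depth = 0  # count of open <ul>/<ol> containers
        self.items = []      # (text, depth) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.list_depth += 1

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            # Clamp at zero so stray closing tags cannot go negative.
            self.list_depth = max(0, self.list_depth - 1)

    def handle_data(self, data):
        text = data.strip()
        if text and self.list_depth > 0:
            self.items.append((text, self.list_depth))


def list_item_depths(html):
    """Return [(text, depth), ...] for list text in the given HTML."""
    parser = ListDepthParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Markup copied from the reproduction steps in this report.
html_text = """
<ul>
  <li>foo</li>
  <li>
    <ol>
      <li>first</li>
      <li>second</li>
  </li>
</ul>
"""

print(list_item_depths(html_text))
# [('foo', 1), ('first', 2), ('second', 2)]
```

The three pairs line up with the `text` and `category_depth` values in the Expected output, whereas the Actual output merges `first` and `second` into one depth-1 element.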