Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
fix: set `resolve_entities=False` in `partition_xml` #3088
Closed
MthwRobinson closed 2 months ago
Summary
Closes #3078. Sets `resolve_entities=False` when parsing XML with `lxml` in `partition_xml`, so that entity references cannot dynamically inject text into the document.
Testing
`pytest test_unstructured/partition/test_xml.py` continues to pass with the update.
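For reviewers who want to see the effect of the flag in isolation, here is a minimal sketch. It is not the code in `partition_xml`: the sample document, the `injected` entity, and the `body_text` helper are hypothetical and exist only for illustration. The only library calls it assumes are `lxml.etree.XMLParser(resolve_entities=...)` and `lxml.etree.fromstring`.

```python
from lxml import etree

# Hypothetical sample document: the DTD declares an internal entity whose
# replacement text does not appear literally in the element body.
XML = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY injected "injected text">
]>
<doc><body>Hello &injected;</body></doc>"""


def body_text(resolve_entities: bool) -> str:
    """Parse XML and return the leading text of <body> (illustrative helper)."""
    parser = etree.XMLParser(resolve_entities=resolve_entities)
    root = etree.fromstring(XML, parser)
    # .text holds only the text before the first child node; with
    # resolve_entities=False the entity reference stays an unexpanded child.
    return root.find("body").text or ""


if __name__ == "__main__":
    print("resolve_entities=True: ", repr(body_text(True)))
    print("resolve_entities=False:", repr(body_text(False)))
```

With `resolve_entities=True` the entity's replacement text becomes part of the element text, whereas with `resolve_entities=False` the reference is left unexpanded, which is the behavior this change relies on.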