databrickslabs / tika-ocr

Add `enableXMLOutput` to `TikaExtractor.extract` #46

Closed arcaputo3 closed 1 month ago

arcaputo3 commented 2 months ago

@aamend FYI: this also seems to improve LLM use cases, for example when working with tabular PDFs.

aamend commented 1 month ago

Nice job! Thanks for the contribution.