Closed arcaputo3 closed 7 months ago
All committers have signed the CLA.
- `TikaFileFormat` syntax based on `spark` 3.5.0 changes
- `IOUtils` to `ByteStreams` (based on `spark` 3.5.0 `binaryFiles` implementation)
- `scala` to version 2.12.15
- `tika-parsers-standard-package` to 2.9.1
- `maven-shade-plugin` to 3.5.1
- `ServicesResourceTransformer` to help Databricks discover `tika` parsers
- `tika.parser.ocr.timeout` option to allow for longer Tesseract timeouts
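
For readers unfamiliar with the `ServicesResourceTransformer` change, here is a minimal sketch of how that transformer is typically wired into `maven-shade-plugin`. The PR's actual `pom.xml` is not shown here, so the `package` phase binding and the overall layout are assumptions. Only the plugin version (3.5.1) and the transformer name come from the description above.

```xml
<!-- Sketch only: layout and phase binding are assumed, not copied from this PR. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.5.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Merges META-INF/services files from all shaded dependencies
               instead of letting one jar's copy overwrite the others. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Tika locates its parsers through `META-INF/services` entries, so merging those files in the shaded jar is probably what lets a Databricks cluster discover the parsers from a single uploaded artifact.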