Open lukasgraf opened 10 years ago
It seems the Apache Tika command line interface doesn't support passing in the MIME type of the document (or any additional metadata for that matter).
Tika's Detector interface would consider such metadata, but the metadata argument seems to be exposed only in the Tika API, not the command line interface.
So this leaves us with one option: Set the file extension of the temporary file, and let Tika's MIME type detection do its work.
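A minimal sketch of that idea, using only the Python standard library: derive a file extension from the MIME type and use it as the suffix of the temporary file. The function name `write_temp_with_extension` is hypothetical and not part of the existing code; unknown MIME types fall back to no extension, which matches the current behaviour.

```python
import mimetypes
import os
import tempfile


def write_temp_with_extension(data, mimetype):
    """Write data to a temp file whose suffix is guessed from the MIME type.

    Hypothetical helper for illustration; returns the path of the new file.
    The caller is responsible for removing the file afterwards.
    """
    # guess_extension() returns None for unknown types, so fall back to
    # an empty suffix (i.e. a temp file without extension, as before).
    suffix = mimetypes.guess_extension(mimetype or "") or ""
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path
```

The resulting path could then be handed to the Tika command line call in place of the extension-less temporary file.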
The Tika Content Detection docs say that Tika
The command line interface help describes a switch:
-d or --detect Detect document type
This seems to be enabled by default (otherwise, converting a temporary file with no extension wouldn't have worked). Still, we should probably pass this switch explicitly to be sure content type detection is always performed.
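As a rough sketch, the switch could be passed explicitly when building the command line. The jar location `tika-app.jar` and the function name `build_detect_command` are assumptions for illustration. Whether `--detect` can be combined with the switches used for the actual conversion is not established here, so this only builds a standalone detection call.

```python
def build_detect_command(document_path, tika_jar="tika-app.jar"):
    """Build the argument list for a Tika CLI call with --detect.

    tika_jar is an assumed location of the Tika app jar; adjust as needed.
    """
    return ["java", "-jar", tika_jar, "--detect", document_path]


# Possible usage (requires Java and the Tika app jar to be available):
#   import subprocess
#   output = subprocess.check_output(build_detect_command("/tmp/doc.pdf"))
```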
Currently, a temporary file without a file extension is used to store the original document passed to Tika.
We probably should pass the MIME type from TransformEngine's convertTo method on to Tika.