This PR adds detection of Tika errors caused by password-protected files. For these files it logs a single INFO line instead of a WARNING with the full Java traceback:
2014-03-14 11:41:28 INFO ftw.tika Could not convert password protected document.
Limitations:
Detection works by string matching on the Java exception traceback, combined with a mimetype check. I did not find a better solution. This limits detection to PDF and MS Office documents for now.
After implementing this I found out that we cannot detect any errors at all when running Tika in server mode, because the socket result is simply empty in that case and there is no error stream. As far as I can see, the tika-server "protocol" (dumping the document into the socket) does not provide any kind of error detection.
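To illustrate the server-mode limitation, here is a rough sketch of what a socket client sees. It is not the code from this PR: `convert_via_socket`, `describe_reply`, and the `host`/`port` arguments are hypothetical names, and the exchange (send the document, close the write side, read until EOF) is only my understanding of the "dump the document into the socket" protocol.

```python
import socket


def convert_via_socket(data, host, port, timeout=30.0):
    """Send raw document bytes to a socket-mode converter and return the reply.

    host and port are placeholders for wherever the server is listening.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(data)
        # Close the write side so the server knows the document is complete.
        conn.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


def describe_reply(reply):
    """Classify a reply; an empty reply carries no information about the cause."""
    if not reply.strip():
        # No error stream and no status code: a password-protected file,
        # a corrupt file, and a genuinely empty document all look the same.
        return "empty"
    return "converted"
```

With only the reply bytes available, the client has nothing to match against, which is why the INFO message can only be produced when the traceback is available.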
Java tracebacks:
The Java tracebacks differ for each mimetype. Therefore I've explicitly implemented PDF and MS Office documents.
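The approach described above can be sketched roughly as follows. This is a simplified illustration, not the exact code in the commit: the function names are made up, and the marker strings and mimetype keys in `PASSWORD_MARKERS` are placeholders that would have to be taken from the real tracebacks. Only the logger name and the INFO message come from the log line shown at the top.

```python
import logging

LOG = logging.getLogger("ftw.tika")

# Placeholder markers per mimetype (assumption for illustration only).
# The real substrings must be copied from actual Java tracebacks.
PASSWORD_MARKERS = {
    "application/pdf": ("CryptographyException",),
    "application/msword": ("EncryptedDocumentException",),
}


def is_password_protected(mimetype, traceback_text):
    """Return True if the traceback contains a known marker for this mimetype."""
    markers = PASSWORD_MARKERS.get(mimetype, ())
    return any(marker in traceback_text for marker in markers)


def log_conversion_error(mimetype, traceback_text):
    """Log a short INFO line for password-protected files, else a WARNING.

    Returns True when the error was recognized as a password problem.
    """
    if is_password_protected(mimetype, traceback_text):
        LOG.info("Could not convert password protected document.")
        return True
    LOG.warning("Conversion failed:\n%s", traceback_text)
    return False
```

Keying the markers by mimetype keeps a marker for one format from accidentally matching a traceback for another format, at the cost of having to add an entry for every new format.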
Fixes #10
I've gisted example tracebacks for comparison: https://gist.github.com/jone/9545556
@lukasgraf do you have any input on this one? The last commit is the interesting one.