matthiasg closed 10 years ago
Thanks for this :) Just one question in the line comments.
Ok, pulled in commit 8b00b4c4e2c59c156b8a3f35d166221771c09a33. I've added your name to the contributors file.
Thanks. Even better that you updated to the next Tika. I wanted to do that myself, but got stuck on node-java not working on SmartOS.
Yeah, Tika 1.6 fixes a lot of issues I had with parsing PDFs :smile:
BOOM this is awesome!
When parsing some PDF files, the text output was not in UTF-8 (German umlauts were wrong, for example). I added an explicit default UTF-8 encoding for the OutputStreamWriter that is used, and I get UTF-8 output now.
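For anyone reading along, here is a rough sketch of the idea, not the actual code from this PR. An OutputStreamWriter created without a charset uses the platform default, which is probably why umlauts broke on some machines; passing StandardCharsets.UTF_8 pins the encoding. The class name Utf8WriterSketch and the helper writeUtf8 are made up for illustration, and a ByteArrayOutputStream stands in for whatever stream the parser output really goes to.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical class for illustration only; not part of the project.
public class Utf8WriterSketch {

    // Writes text through an OutputStreamWriter with an explicit UTF-8 charset
    // instead of relying on the platform default encoding.
    public static byte[] writeUtf8(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // try-with-resources closes the writer, which also flushes it
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "Gruesse" with umlaut and sharp s, written as Unicode escapes so the
        // example does not depend on the source file encoding.
        byte[] bytes = writeUtf8("Gr\u00fc\u00dfe");
        // In UTF-8 the two non-ASCII characters take two bytes each.
        System.out.println(bytes.length); // prints 7
    }
}
```

With a single-byte default charset such as ISO-8859-1, the same five characters would come out as five bytes, and a consumer expecting UTF-8 would show the umlauts incorrectly.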