Closed by devipramita 1 year ago
It looks like your Apache Tika download is wrong: instead of the jar, you only have an HTML page. According to the current download CDN on the Apache Tika website, only two versions are available: 1.28.4 and 2.4.1. You can use the commands below to download Apache Tika on Linux:
# For 1.28.4
wget https://dlcdn.apache.org/tika/1.28.4/tika-server-1.28.4.jar
# For 2.4.1
wget https://dlcdn.apache.org/tika/2.4.1/tika-server-standard-2.4.1.jar
You can verify that you downloaded the correct file by running the following command and comparing your output with the output below:
$ file tika-server-1.28.4.jar
tika-server-1.28.4.jar: Zip archive data, at least v2.0 to extract
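The same check can be scripted before handing the file to java -jar. This is only a minimal sketch, assuming a shell where `head -c` is available (GNU and BSD both provide it). `is_jar` is a hypothetical helper name, not part of Tika. It relies on the fact that a jar is a zip archive and zip archives start with the two bytes `PK`, which an HTML page saved by mistake does not.

```shell
#!/bin/sh
# is_jar is a hypothetical helper (not part of Tika): it succeeds when the
# file starts with "PK", the signature at the start of a zip/jar archive.
is_jar() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Example usage: pass the file name as the first argument,
# or fall back to the 1.28.4 jar name used above.
jar="${1:-tika-server-1.28.4.jar}"
if is_jar "$jar"; then
    echo "$jar looks like a zip/jar archive"
else
    echo "$jar is not a jar (possibly a saved HTML page); download it again"
fi
```

This only inspects the first two bytes, so it catches the "HTML page instead of jar" case but does not prove the archive is complete or uncorrupted.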
Correct, @divyaksh-shukla. Closing this one out.
Hi, I get this error when parsing a PDF using Tika
To overcome this issue, I've tried:
Following the solution in https://github.com/chrismattmann/tika-python/issues/238#issuecomment-527315954, I tried to run java -jar <>, but it gives me another error: "Invalid or corrupt jarfile tika-server.jar"
Meanwhile, the downloaded tika-server.jar is identified as "tika-server.jar: HTML document, ASCII text, with CRLF line terminators"
Does anyone have any ideas for a solution?
Thank you