LogicalSpark / docker-tikaserver

Apache Tika Server as a Docker Image
http://logicalspark.github.io/docker-tikaserver
Apache License 2.0

The latest tag currently pushed to Docker Hub does not appear to work #4

Closed moxious closed 7 years ago

moxious commented 7 years ago
$ docker run -p 9998:9998 logicalspark/docker-tikaserver
Unable to find image 'logicalspark/docker-tikaserver:latest' locally
latest: Pulling from logicalspark/docker-tikaserver
d54efb8db41d: Already exists 
f8b845f45a87: Already exists 
e8db7bf7c39f: Already exists 
9654c40e9079: Already exists 
6d9ef359eaaa: Already exists 
82ce92ae72f5: Pull complete 
Digest: sha256:7fce25ea66c8c73c1867b19e386f4a3955ef8340d7b5f440ea14e06219cde910
Status: Downloaded newer image for logicalspark/docker-tikaserver:latest
Error: Invalid or corrupt jarfile /tika-server-1.14.jar

The image currently pushed to Docker Hub is not runnable.

Building the image locally, however, produces a working container that doesn't have this problem.

dameikle commented 7 years ago

I'm not sure what went wrong, but I have just triggered a rebuild on Docker Hub and can now pull and run the image. Are you able to confirm?

moxious commented 7 years ago

Just tried again, and now it works. I also have no explanation for what happened, but it looks like you fixed it. Note that the sha256 digest differs from the one in my first pull. Thanks!

$ docker run -p 9998:9998 logicalspark/docker-tikaserver:latest
Unable to find image 'logicalspark/docker-tikaserver:latest' locally
latest: Pulling from logicalspark/docker-tikaserver
d54efb8db41d: Already exists 
f8b845f45a87: Already exists 
e8db7bf7c39f: Already exists 
9654c40e9079: Already exists 
6d9ef359eaaa: Already exists 
9dbc6e8d830d: Pull complete 
Digest: sha256:65e5a4f08d73d45ee17627c613d7ed1d2f31baa1675775e32d8d0408e0c507f5
Status: Downloaded newer image for logicalspark/docker-tikaserver:latest
Mar 11, 2017 10:20:14 PM org.apache.tika.server.TikaServerCli main
INFO: Starting Apache Tika 1.14 server
Mar 11, 2017 10:20:14 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://0.0.0.0:9998/
Mar 11, 2017 10:20:14 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: jetty-8.y.z-SNAPSHOT
Mar 11, 2017 10:20:14 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Started SelectChannelConnector@0.0.0.0:9998
Mar 11, 2017 10:20:14 PM org.apache.tika.server.TikaServerCli main
INFO: Started

A curl request to localhost on port 9998 works as expected.
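
Since the two pulls above report different digests for the same latest tag, one way to keep referring to the known-good build is to pin the image by digest rather than by tag. A minimal sketch: the helper name pinned_ref is hypothetical, and it assumes the registry still serves the digest printed by the working pull.

```shell
#!/bin/sh
# Build an image reference pinned to a digest instead of a mutable tag.
# pinned_ref is a hypothetical helper name, not part of Docker.
pinned_ref() {
    repo="$1"     # e.g. logicalspark/docker-tikaserver
    digest="$2"   # the "Digest:" value printed by docker pull
    case "$digest" in
        sha256:*) printf '%s@%s\n' "$repo" "$digest" ;;
        *) echo "digest must start with sha256:" >&2; return 1 ;;
    esac
}

# Usage (requires Docker; digest copied from the working pull above):
#   docker run -p 9998:9998 "$(pinned_ref logicalspark/docker-tikaserver sha256:65e5a4f08d73d45ee17627c613d7ed1d2f31baa1675775e32d8d0408e0c507f5)"
```

A tag such as latest can be moved to a different image by any rebuild, whereas a digest reference always names exactly one image, which makes a regression like this one easier to pin down.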

dameikle commented 7 years ago

Thanks for checking. Still really puzzled. Will keep an eye on future builds.

moxious commented 7 years ago

Well, it's going to be hard to tell. Your Dockerfile downloads the JAR from whichever mirror happens to be nearest at build time, so you probably can't be sure where the corrupted JAR even came from. https://github.com/LogicalSpark/docker-tikaserver/blob/master/Dockerfile#L16

I also noticed that your Dockerfile downloads a key, imports it with gpg, and then downloads the signature for the Tika JAR file: https://github.com/LogicalSpark/docker-tikaserver/blob/master/Dockerfile#L12 However, it never actually verifies that the JAR you're getting matches the signature. That seems like a useful thing to do, since otherwise I can't tell what the purpose of the gpg steps is.

Since the file was apparently corrupt, a gpg signature verification step would probably have caught it. I'm not sure whether curl did something wrong, the server was serving a bad file, or (every crypto-paranoid's nightmare) someone had posted an illegitimate file, but in any case signature verification seems like the right way to catch this in the future.
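
A minimal sketch of what such a verification step might look like as a shell helper. It assumes the signing key has already been imported with gpg and that the detached .asc signature was downloaded alongside the JAR. The function name verify_jar and the signature file name in the usage line are illustrative and not taken from the Dockerfile.

```shell
#!/bin/sh
# Sketch: succeed only if the JAR matches its detached gpg signature.
# Assumes the signing key was imported earlier with `gpg --import`.
verify_jar() {
    jar="$1"   # path to the downloaded JAR
    sig="$2"   # path to the detached signature (.asc), an assumed layout

    # Refuse early if either file is missing or empty.
    [ -s "$jar" ] || { echo "missing or empty JAR: $jar" >&2; return 1; }
    [ -s "$sig" ] || { echo "missing or empty signature: $sig" >&2; return 1; }

    # gpg --verify exits non-zero when the signature does not match
    # the file or the public key is not available.
    gpg --batch --verify "$sig" "$jar"
}

# Usage inside a build step (file names are assumptions):
#   verify_jar /tika-server-1.14.jar /tika-server-1.14.jar.asc || exit 1
```

Called this way, a corrupt or tampered download would fail the build instead of producing an image that only breaks at run time.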