apache / accumulo-docker

Apache Accumulo Docker
https://accumulo.apache.org
Apache License 2.0
18 stars 25 forks source link

Update Accumulo version in Dockerfile & README.md #12

Closed karthick-rn closed 4 years ago

karthick-rn commented 4 years ago

In this PR, I have updated the Accumulo version along with Hadoop & Zookeeper to match with the latest releases. With these changes, I'm able to successfully build the image. Also, updated the README.md. Once you're happy with these changes, may be I can push the image to apache/accumulo at DockerHub. Thanks

milleruntime commented 4 years ago

@karthick-rn thanks for the updates! I want to give the updates a try on my local dev environment but I recently changed from fedora to ubuntu so I need to get docker setup first.

milleruntime commented 4 years ago

I am getting a NoClassDefFoundError related to zookeeper when trying to run the command.

java.util.ServiceConfigurationError: org.apache.accumulo.start.spi.KeywordExecutable: Provider org.apache.accumulo.server.init.Initialize could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at org.apache.accumulo.start.Main.checkDuplicates(Main.java:243)
    at org.apache.accumulo.start.Main.getExecutables(Main.java:234)
    at org.apache.accumulo.start.Main.main(Main.java:88)
Caused by: java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
    at java.lang.Class.getConstructor0(Class.java:3075)
    at java.lang.Class.newInstance(Class.java:412)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
    ... 5 more
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    ... 10 more
karthick-rn commented 4 years ago

@milleruntime Regarding the exception, can you add the /lib directory as shown below to the classpath & try running the same command.

CLASSPATH="${CLASSPATH}:${lib}/*:${HADOOP_CONF_DIR}:${ZOOKEEPER_HOME}/*:${HADOOP_HOME}/share/hadoop/client/*:${ZOOKEEPER_HOME}/lib/*"
karthick-rn commented 4 years ago

While running the image I noticed the difference in the ZK tarball filename and corrected it & updated the PR. After this change, I tested the image and able to run ZK, Hadoop & Accumulo successfully.

milleruntime commented 4 years ago

@karthick-rn I am not sure how you are setting the classpath but here is what I am seeing:

~/workspace/accumulo-docker$ docker run accumulo classpath
/opt/accumulo/conf:/opt/accumulo/lib/*:/opt/hadoop/etc/hadoop:/opt/zookeeper/*:/opt/hadoop/share/hadoop/client/*

Then any other command throws the error:

~/workspace/accumulo-docker$ docker run accumulo --help
2020-04-08 15:53:55,004 [start.Main] ERROR: Uncaught exception
...
ctubbsii commented 4 years ago

OK I forgot the fixes for the newer version of ZK haven't been released yet. I think the dockerfile should work out of the box with the last release of Accumulo, version 2.0.0. So the ZK version should be 3.4.14 or if it works with 3.5.7 then that is fine.

It would have the same problems with 3.5.7. So, it should probably stay 3.4.14, or have some post-script to modify the class path in the Accumulo env script.

karthick-rn commented 4 years ago

It would have the same problems with 3.5.7. So, it should probably stay 3.4.14, or have some post-script to modify the class path in the Accumulo env script.

I discussed this with @milleruntime on Slack. I'm keeping the ZK version as 3.6 & adding the accumulo-env.sh as a post-script like you mentioned. I have updated the PR with these changes, please test and let me know. Thanks

milleruntime commented 4 years ago

It would have the same problems with 3.5.7. So, it should probably stay 3.4.14, or have some post-script to modify the class path in the Accumulo env script.

I discussed this with @milleruntime on Slack. I'm keeping the ZK version as 3.6 & adding the accumulo-env.sh as a post-script like you mentioned. I have updated the PR with these changes, please test and let me know. Thanks

I like having the accumulo-env.sh as a post-script but I still think we should make the dockerfile work with 2.0.0 out of the box. So this means the ZK veresion should be 3.4.14. Sorry for misleading you.

milleruntime commented 4 years ago

@karthick-rn Since this change is really to make docker work for version 2.1.0, if you want to keep this PR open for once we release 2.1.0 that is fine. We can make a separate change for now to simply update the version from alpha to 2.0.0.

karthick-rn commented 4 years ago

@karthick-rn Since this change is really to make docker work for version 2.1.0, if you want to keep this PR open for once we release 2.1.0 that is fine. We can make a separate change for now to simply update the version from alpha to 2.0.0.

@milleruntime Sorry if I wasn't clear. This change is to make docker to work with Accumulo 2.0 & not 2.1. The reason for having the accumulo-env.sh as a post-script is to address the classpath issue in Accumulo 2.0 to work with ZK versions 3.5.x & above. This classpath issue has been fixed in Accumulo 2.1 & we no longer need the post-script when we are running docker on Accumulo 2.1. Let me know if you are seeing any issues in Dockerfile running Accumulo 2.0? Also, if you still think you might need ZK 3.4.14 I'm happy to make the changes. Thanks

keith-turner commented 4 years ago

@karthick-rn I wonder if adding ENV CLASSPATH /opt/zookeeper/lib/* towards the end of the dockerfile instead of adding the accumulo-env.sh file would work. This should be picked up by Accumulo env, the only drawback is that we dont get control over the order of where /opt/zookeeper/lib/* is in the classpath.

ctubbsii commented 4 years ago

@karthick-rn I wonder if adding ENV CLASSPATH /opt/zookeeper/lib/* towards the end of the dockerfile instead of adding the accumulo-env.sh file would work. This should be picked up by Accumulo env, the only drawback is that we dont get control over the order of where /opt/zookeeper/lib/* is in the classpath.

I like @keith-turner 's suggestion here. It's better than copying the env script in.

@milleruntime I'm not following your use of "out-of-the-box" here. The way I see it, the Docker packaging is its own "box" (packaging). Accumulo 2.0.0 works fine, when properly configured, with ZK 3.5 or 3.6. So, it should be fine if the Docker packaging configures it that way. The only real difference is that you'll need to have the newer ZK running to use the Docker packaging of Accumulo, whereas if you were to package or deploy Accumulo manually, you could do so using a version as low as 3.4. But, I think it's fine for Docker to require newer.

karthick-rn commented 4 years ago

@karthick-rn I wonder if adding ENV CLASSPATH /opt/zookeeper/lib/* towards the end of the dockerfile instead of adding the accumulo-env.sh file would work. This should be picked up by Accumulo env, the only drawback is that we dont get control over the order of where /opt/zookeeper/lib/* is in the classpath.

Thanks for the suggestion @keith-turner. I'm trying this now, will update soon

milleruntime commented 4 years ago

@milleruntime I'm not following your use of "out-of-the-box" here.

A developer should be able to check out accumulo-docker, follow the README and build and run a 2.0.0 container. If they require additional steps like setting the classpath to work then we shouldn't bake the version in the dockerfile. The versions in the dockerfile should work without extra steps.

ctubbsii commented 4 years ago

@milleruntime I'm not following your use of "out-of-the-box" here.

A developer should be able to check out accumulo-docker, follow the README and build and run a 2.0.0 container. If they require additional steps like setting the classpath to work then we shouldn't bake the version in the dockerfile. The versions in the dockerfile should work without extra steps.

@milleruntime Oh, okay. So, I think you mean accumulo-docker, "out-of-the-box", should work. I previously thought you meant something like accumulo-docker should use "out-of-the-box" accumulo. If you mean the former, then I agree. But, all of the proposed changes in this PR satisfy that, so I don't see the problem.

ctubbsii commented 4 years ago

Since @milleruntime mentioned the README, it's worth noting that this change would make some versions of ZooKeeper not work at all (3.4 and earlier; mainly because ZooKeeper renamed their tarballs). However, without this change, the newer versions won't work. I'm inclined to prioritize the newer versions.

In either case, the examples in the README should avoid using versions that won't work, and should mention that some combinations of ZooKeeper and Hadoop versions may not work.

milleruntime commented 4 years ago

My only concern with this change is that we are creating more work for when we release 2.1. But if I understand the updates correctly, that won't be a problem since we would have to update the path to the new ZK binaries anyway.

ctubbsii commented 4 years ago

My only concern with this change is that we are creating more work for when we release 2.1. But if I understand the updates correctly, that won't be a problem since we would have to update the path to the new ZK binaries anyway.

Yeah, this might actually be saving us work (because of the tarball renames). If @keith-turner 's suggestion to use the ENV directive works, then the only change we'd need to do for 2.1 is to drop that superfluous ENV line.

karthick-rn commented 4 years ago

Looks like the order of the classpath is important here. When adding ENV CLASSPATH /opt/zookeeper/lib/* to Dockerfile, Accumulo positions the ENV directive first in the classpath like below.

/opt/zookeeper/lib/*:/opt/accumulo/conf:/opt/accumulo/lib/*:/opt/hadoop/etc/hadoop:/opt/zookeeper/*:/opt/hadoop/share/hadoop/client/* 

This has resulted in the below exception when trying to run queries on ashell. Exception in thread "shell" java.lang.NoSuchMethodError: org.apache.commons.cli.Options.hasShortOption(Ljava/lang/String;)Z

To maintain the order of the classpath, we tried several options & the one with sed works and doesn't require the post-script. I have updated the PR with the new changes, please review and let me know your thoughts/suggestions. Thanks

ctubbsii commented 4 years ago

We should vote on a release for this, and then publish it to Dockerhub as described in the README (I think INFRA can help with that). If anyone is interested in pursuing this, please follow up on the dev mailing list.

karthick-rn commented 4 years ago

We should vote on a release for this, and then publish it to Dockerhub as described in the README (I think INFRA can help with that). If anyone is interested in pursuing this, please follow up on the dev mailing list.

@ctubbsii I'm interested, will follow-up on the dev mailing list. Thanks