--- COMMIT LOG ---
Initial commit of the work to stand up an accumulo-proxy inside a Docker container.
I have only implemented support for Accumulo 2.x and by default this first commit contains:
Accumulo 2.0.0
Hadoop 3.2.1
Zookeeper 3.5.7
A new document (DOCKER.md) has been created to start documenting the implementation and usage, which should allow others to test this if they wish.
A number of outstanding questions will be posted on the issue. There are also a number of TODOs still to be implemented that I'm tracking.
Overview
Note this is a work in progress. There is still quite a bit of work to go, but I thought I'd share some of it early.
I've tried to take the best of a few worlds and tried to keep consistency with both the accumulo-docker repo (https://github.com/apache/accumulo-docker) and the Apache guides here: (https://github.com/docker-library/official-images)
This also assumes that your instance is called "myinstance" and your zookeeper is available at "localhost:2181" from the container. This is NOT the case on some configurations such as Docker for Mac, but I've left these as defaults for now until we figure out "(2) Configuration" below.
Note, I have left some commands commented out in the Dockerfile that make debugging easier. These will be stripped before the final merge.
Currently I have a large number of TODOs and questions for people's input. Each is numbered, so feel free to answer some or all of them with your thoughts :)
If no one has any strong feelings about some of the questions below then I'll just press on.
(1) Repository
Given that accumulo has its docker code in accumulo-docker, do we need to do the same with this repo (e.g. accumulo-proxy-docker), or are we happy to merge it into the main code base?
(2) Configuration
I've been debating at length how to allow the default configuration to be overridden.
My current thought is that we should allow users to volume mount in a new proxy.properties (or rebuild the container with one), and we could document both approaches. This would allow full configuration to be done and versioned outside of our accumulo-proxy project.
However, we probably ought to make it easy to 'get started', and therefore we could also provide a further approach to overriding:
A) Allow people to provide an environment variable to override properties, e.g. PROXY_CONFIG="key:pair", and have the Java code parse this after the default proxy.properties and update the properties. This would allow any property to be updated.
B) Allow people to override only 2 properties via environment variables (instance.name, instance.zookeepers), since these are more than likely to meet 99% of use cases. We can always add additional ones in the future.
My gut feeling is that a mixture of the volume mounting/rebuild guide plus option (B) is probably the right balance for now, and we can always change this later on.
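To make option (B) a bit more concrete, here is a rough sketch of how the override could work on the Java side. This is only an illustration: the class and method names, and the two environment variable names PROXY_INSTANCE_NAME and PROXY_INSTANCE_ZOOKEEPERS, are placeholders made up for this example and do not exist in accumulo-proxy. The defaults are loaded from an in-memory string standing in for the bundled proxy.properties.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

class ProxyEnvOverrides {

  // Hypothetical environment variable names, mapped to the two properties named in option (B).
  static final Map<String, String> ENV_TO_PROPERTY = new LinkedHashMap<>();

  static {
    ENV_TO_PROPERTY.put("PROXY_INSTANCE_NAME", "instance.name");
    ENV_TO_PROPERTY.put("PROXY_INSTANCE_ZOOKEEPERS", "instance.zookeepers");
  }

  // Returns a copy of the defaults with any non-blank environment override applied.
  // The defaults object itself is left unmodified.
  public static Properties applyOverrides(Properties defaults, Map<String, String> env) {
    Properties merged = new Properties();
    merged.putAll(defaults);
    for (Map.Entry<String, String> entry : ENV_TO_PROPERTY.entrySet()) {
      String value = env.get(entry.getKey());
      if (value != null && !value.trim().isEmpty()) {
        merged.setProperty(entry.getValue(), value.trim());
      }
    }
    return merged;
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for the default proxy.properties shipped in the container.
    String bundled = "instance.name=myinstance\ninstance.zookeepers=localhost:2181\n";
    Properties defaults = new Properties();
    try (Reader in = new StringReader(bundled)) {
      defaults.load(in);
    }

    Properties effective = applyOverrides(defaults, System.getenv());
    System.out.println("instance.name=" + effective.getProperty("instance.name"));
    System.out.println("instance.zookeepers=" + effective.getProperty("instance.zookeepers"));
  }
}
```

With something along these lines, passing -e PROXY_INSTANCE_ZOOKEEPERS=... to docker run would change only the zookeeper address and leave everything else at the defaults, while a volume-mounted proxy.properties would still be the route for full configuration.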
(3) Software versions
Currently this change doesn't support selecting zookeeper/accumulo/hadoop versions; they're hard coded.
Do we want to allow this support?
Does this need doing on the initial commit, or would it be acceptable to follow it up?
(4) Automated Build
We could use one of the maven plugins to automatically create the docker container with a new profile, e.g. instead of mvn package -Ptarball we could provide a mvn package -Pdocker or similar.
This would be different to the standard accumulo-docker repo which details how to build using the docker build command.
I'm happy either way.
(5) Documentation in container
Looking at the accumulo-docker repo it was decided to put documentation into the container (README.md on this line)
I'm not sure it provides value but I've added both the README.md and DOCKER.md into this container currently.
Should we strip this out?
(6) CLASSPATH setting
When debugging this application I found that for some reason accumulo classpath provides a classpath which includes zookeeper, however it only includes (in my case) /opt/apache-zookeeper/ and we actually need /opt/apache-zookeeper/lib.
Look at my diff Dockerfile and look for the line ENV CLASSPATH=/opt/apache-zookeeper/lib/*
Anyone know if I'm missing something in my configuration?
(7) Install path shortcut
If you look at Dockerfile and look for the part where I add the dependencies to /opt/ (roughly line 61 starting with "# Install the dependencies into /opt/"), I take the versions of software and strip their first folder so that they are in folders similar to /opt/hadoop/ in one step.
This may make the installed versions a bit less transparent if you were to jump into the container (e.g. it's not /opt/hadoop-3.2.1/)
Does anyone have any strong feelings about this?