The docker images in this repository are expected to be given names of the form teradatalabs/cdh5-hive. The Dockerfile and other files needed to build the teradatalabs/cdh5-hive image are located in the directory teradatalabs/cdh5-hive.
Generally speaking, the images should not be built manually with docker build.
The docker images should be built using make. To build the docker image named teradatalabs/cdh5-hive, run make teradatalabs/cdh5-hive. Make will build the image and its dependencies in the correct order.
If you are going to release an image, you should release it and all of its
dependencies. Master and slave images should be built from the same chain of
parent images. You can ensure that both are built from the same set of parent
images by running e.g. make teradatalabs/cdh5-hive-master teradatalabs/cdh5-hive-slave
If you want to build a base image and all the images depending on it, you can use the *.dependants targets. E.g. make teradatalabs/cdh5-base.dependants will build the cdh5-base image and all the images depending on it (transitively).
All of the docker images in the repository share the same version number. This is because most of the images depend on a parent image that is also in the repository (e.g. teradatalabs/hdp2.5-master is FROM teradatalabs/hdp2.5-base), or are meant to be used together in testing (teradatalabs/cdh5-hive-master and teradatalabs/cdh5-hive-slave).
Having all of the images on the same version number makes troubleshooting easy: iff all of the docker images you are using have the same version number, then they are in a consistent state.
This means that we treat the repository as a single codebase that creates multiple artifacts (Docker images) that all need to be released together. The Makefile uses docker-release to automate this process and ensure that the images on dockerhub are in a consistent state provided all of the push operations run to completion.
docker-release also handles tagging the images and repository appropriately so that you can easily find the Dockerfile used to create an image starting from just the tags on a Docker image.
Best practice for publishing a snapshot or release version is to use the Jenkins job. Log in to Jenkins and search for docker-images. If you must publish a new version manually, follow these steps:
To release a snapshot version of the repository, do the following:
1. docker login
2. Ensure VERSION is set to something ending in -SNAPSHOT.
3. make snapshot
To release a release (final) version of the repository, do the following:
1. docker login
2. Ensure VERSION is set to something not ending in -SNAPSHOT.
3. make release
To release a snapshot or final version, you must log in to docker using the docker login command.
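The suffix rule above decides which make target applies to a given version. A minimal sketch of that rule in shell follows; release_target is a hypothetical helper written for illustration and is not part of the repository or of docker-release.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the make target
# implied by a version string, following the -SNAPSHOT suffix rule.
release_target() {
    case $1 in
        *-SNAPSHOT) echo snapshot ;;  # e.g. 35-SNAPSHOT
        *)          echo release ;;   # e.g. 35
    esac
}
```

For example, release_target 35-SNAPSHOT names the snapshot target, while release_target 35 names the release target.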
Normally developers are working on a snapshot version of the next release, and the VERSION macro in the Makefile should be set to a snapshot version such as 35-SNAPSHOT. A typical workflow is as follows:
- Run make snapshot to push snapshot releases to dockerhub as needed
Eventually, version 35-SNAPSHOT is ready for release. To release version 35, do the following:
1. Set VERSION to the release version: 35-SNAPSHOT -> 35
2. Run make release to push the images to dockerhub and tag the repository
3. Set VERSION to the next snapshot version: 35 -> 36-SNAPSHOT
make snapshot does the following:
make release does the following:
docker-release enforces several rules about the state of the repository when pushing to dockerhub:
For a project that uses Travis for continuous integration, you can upgrade the docker images used by the project using the following process.
- make snapshot to release a snapshot build to dockerhub.
- make release
Docker build arguments are documented in the Dockerfile reference.
Args are used by specifying the ARG instruction in a Dockerfile:
ARG FOO
RUN echo $FOO >/etc/foo
The value of FOO then needs to be set in the Makefile:
FOO := Docker images built on $(shell uname -s) are superior to all others.
Note that docker build does not allow the variable reference $FOO to be written ${FOO} or $(FOO). Further note that it won't warn you about this; instead, you'll likely end up with an error later in the build or a broken image.
docker build won't let you pass --build-args that don't have a corresponding key in the Dockerfile. This means that the build system can't just pass the union of all of the --build-args needed by every Dockerfile in the repository. The build system handles this largely the same way it handles figuring out what the correct dependency order is for building the images, described below.
Build args with a default value are not handled at present. Feel free to add that functionality in flag.sh if needed.
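To illustrate the idea of passing only the --build-args that a given Dockerfile declares, here is a minimal sketch in shell. It is not the repository's flag.sh: the function name build_arg_flags and the choice to read each value from a same-named shell variable are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical sketch, not the repository's flag.sh.
# Print one --build-arg flag per line for each ARG that the Dockerfile
# declares without a default, reading the value from a shell variable of
# the same name.
build_arg_flags() {
    dockerfile=$1
    # Match lines such as "ARG JDK_URL". Lines such as "ARG FOO=bar" carry a
    # default value and are skipped, mirroring the limitation noted above.
    sed -n 's/^[[:space:]]*ARG[[:space:]]\{1,\}\([A-Za-z_][A-Za-z0-9_]*\)[[:space:]]*$/\1/p' "$dockerfile" |
    while read -r name; do
        # $name is restricted to identifier characters by the sed pattern,
        # so this indirect lookup is safe to eval.
        eval "value=\${$name-}"
        printf -- '--build-arg %s=%s\n' "$name" "$value"
    done
}
```

The sketch prints flags rather than invoking docker build, so values containing spaces would still need careful quoting before being placed on a command line.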
Individual Dockerfiles shouldn't contain the URL for downloading Java, the name of the RPM, or the path that Java gets installed in. Hard-coding these makes upgrading Java across the repo a pain, with a bunch of touch points.
Instead, the build system exposes the Docker build arguments JDK_URL, JDK_RPM, and JDK_PATH. These can be used in your Dockerfile as follows:
ARG JDK_URL
RUN wget $JDK_URL
At a high level, a docker image depends on two things: its Dockerfile and its parent image.
Using the relative directory from the root of the repo as the image name, we could, in principle, write a rule of the form
teradatalabs/foo: teradatalabs/foo/Dockerfile $(extract_parent teradatalabs/foo/Dockerfile)
	cd teradatalabs/foo && docker build -t teradatalabs/foo .
Using automatic variables we could shorten that to the following:
teradatalabs/foo: $@/Dockerfile $(extract_parent $@/Dockerfile)
	cd $@ && docker build -t $@ .
This is conceptually valid, but it doesn't work: automatic variables aren't available in the prerequisites. The solution is to use a static pattern rule:
$(images): %: %/Dockerfile $(extract_parent %/Dockerfile)
...
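That almost works, but not quite: you can't use the stem (%) in a function call. Make expands $(extract_parent %/Dockerfile) when it reads the rule, before any stem has been substituted, so the function only ever sees the literal text %/Dockerfile.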
Instead, we can use three features of make together to accomplish the same thing.
teradatalabs/foo: teradatalabs/foo_parent
teradatalabs/foo: teradatalabs/foo/Dockerfile
...
The strategy is to include a separate file that specifies the dependency on the parent image. This file isn't in the repo, so the Makefile has a rule to make it from the image's Dockerfile. The second rule specifies the dependency on the Dockerfile and builds the image using docker build.
Recursive Make Considered Harmful explains this technique in section 5.4 and applies it to C source files and the .h files they include. I've adapted it here.
The depend.sh script generates a .d file in $(DEPDIR) from the Dockerfile for the image:
$(DEPDIR)/teradatalabs/foo.d: teradatalabs/foo/Dockerfile
...
The corresponding .d file will take one of two forms.
If foo's parent is built from this repository:
teradatalabs/foo: teradatalabs/foo_parent
If foo's parent should be pulled from dockerhub:
teradatalabs/foo:
In the first case, make now knows that foo_parent is a dependency of foo, and builds it first.
In the second case, we don't add a dependency for make, and docker itself is responsible for pulling foo's parent from dockerhub as part of the docker build process.
A major difference from the approach explained in Recursive Make Considered Harmful is that depend.sh needs to know what images the repo knows how to build, so it can output the second form for parent images we don't know how to build. We do this by passing in the names of all of the images we know how to build.
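The two output forms can be sketched in shell as follows. This is an illustration of the idea only, not the repository's depend.sh: the function name emit_dependency, its argument order, and the handling of the FROM line (first FROM only, any :tag suffix dropped, no registry host with a port) are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch, not the repository's depend.sh.
# Usage: emit_dependency IMAGE DOCKERFILE KNOWN_IMAGE...
# Prints a make rule that names the parent image as a prerequisite only when
# the parent is one of the images this repository knows how to build.
emit_dependency() {
    image=$1
    dockerfile=$2
    shift 2
    # Take the image named on the first FROM line and drop any :tag suffix.
    parent=$(awk 'toupper($1) == "FROM" { print $2; exit }' "$dockerfile")
    parent=${parent%%:*}
    for known in "$@"; do
        if [ "$known" = "$parent" ]; then
            # First form: make builds the parent before the image.
            printf '%s: %s\n' "$image" "$parent"
            return 0
        fi
    done
    # Second form: no prerequisite, so docker pulls the parent itself.
    printf '%s:\n' "$image"
}
```

Passing the list of known images as trailing arguments mirrors the description above, where depend.sh is told the names of all of the images the repo can build.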