GELOG / adamcloud

Portable cloud infrastructure for a genomic transformation pipeline using Adam

Persist data to a docker volume #14

Closed davidonlaptop closed 9 years ago

davidonlaptop commented 9 years ago

As described in issue #13, we need to persist our container data using docker volumes.

Docker Volume Documentation

It should only affect the HDFS service of our Hadoop image. The 3 HDFS daemons that store data requiring persistence, and their associated properties in hdfs-site.xml, are:

  1. NameNode: dfs.namenode.name.dir
  2. DataNode: dfs.datanode.data.dir
  3. Secondary NameNode: dfs.namenode.checkpoint.dir

There are also other properties that use the hadoop.tmp.dir property as a variable. Most of the directories they point to store temporary intermediate data, or are related to the MapReduce framework. Since we won't support MapReduce in phase one, it is probably OK to keep the default values for these properties.
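
To double-check the value a given build actually resolves for any of these properties, something like hdfs getconf can be used (a quick sketch, assuming the hadoop image from the proposal below):

docker run --rm hadoop hdfs getconf -confKey hadoop.tmp.dir
docker run --rm hadoop hdfs getconf -confKey dfs.datanode.data.dir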

Proposal

# in hadoop/Dockerfile
VOLUME /data

# in hadoop/hdfs-site.xml
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/dfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/dfs/name</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///data/dfs/namesecondary</value>
  </property>

To format the namenode (run this only once):

docker run --rm -v /hadoop-data:/data hadoop hdfs namenode -format
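
Since re-running the format would wipe existing HDFS metadata, a guard along these lines could make the step safe to repeat (a sketch; /data/dfs/name/current is where a formatted namenode keeps its metadata, per dfs.namenode.name.dir above):

docker run --rm -v /hadoop-data:/data hadoop \
  sh -c '[ -d /data/dfs/name/current ] || hdfs namenode -format'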

To run the namenode:

docker run --rm -v /hadoop-data:/data --name hdfs-namenode hadoop /usr/local/hadoop/sbin/hadoop-daemon.sh start namenode
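
Presumably the datanode would be started the same way against the same host directory (a sketch mirroring the namenode command; it writes under /data/dfs/data, so it won't clash with the namenode's files):

docker run --rm -v /hadoop-data:/data --name hdfs-datanode hadoop /usr/local/hadoop/sbin/hadoop-daemon.sh start datanode
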
davidonlaptop commented 9 years ago

First approach: VOLUME instruction

I've experimented a bit with the VOLUME instruction (in the Dockerfile), and here's how it works:

  1. When the container is created, two read-write layers are created: one for the container data (as usual), and one for the volume.
  2. Another container on the same host can share that container's volume by using docker run --volumes-from (see the sketch after this list).
  3. The original container can be safely destroyed, and the volume won't be deleted.
  4. When all containers using the volume are deleted, the volume becomes unavailable.
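
A minimal sketch of points 2-4, assuming the hadoop image from the proposal above (busybox stands in for any second container):

# creating the container also creates its volume layer (VOLUME /data)
docker run -d --name hdfs-namenode hadoop /usr/local/hadoop/sbin/hadoop-daemon.sh start namenode

# a second container can mount the same volume
docker run --rm --volumes-from hdfs-namenode busybox ls /data

# removing the container without -v leaves the volume dangling on disk
docker rm -f hdfs-namenode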

As stated in the docs, if one deletes the container and forgets to use docker rm -v, the volume data is NOT deleted:

  If you remove containers without using the -v option, you may end up with "dangling" volumes; volumes that are no longer referenced by a container. Dangling volumes are difficult to get rid of and can take up a large amount of disk space. We're working on improving volume management and you can check progress on this in pull request https://github.com/docker/docker/pull/8484

To get around this limitation, Docker recommends using the Data Volume Container pattern, which consists of creating a container for the sole purpose of keeping a reference to the volume layer.
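
For example (a sketch; the hdfs-data name is made up):

# create, but never start, a container whose only job is to own the volume
docker create -v /data --name hdfs-data hadoop /bin/true

# daemons then mount the volume from the data container
docker run --rm --volumes-from hdfs-data --name hdfs-namenode hadoop /usr/local/hadoop/sbin/hadoop-daemon.sh start namenode

# removing hdfs-data with -v is then the single place where the volume is released
docker rm -v hdfs-data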

Approach 2: Mounting a volume from the host

An alternative is to skip the VOLUME instruction altogether and instead mount a volume from the host when creating the container, using docker run -v /path-on-the-host:/path-in-the-container.

When using this approach, Docker does not create an additional layer, and the data can still be shared among containers on the same host. It is probably also more efficient, since writes go directly to the host filesystem.
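
For instance, two containers pointed at the same host directory see each other's writes, with no --volumes-from linkage needed (a quick sketch):

docker run --rm -v /hadoop-data:/data hadoop sh -c 'echo hello > /data/probe'
docker run --rm -v /hadoop-data:/data hadoop cat /data/probe    # prints "hello"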

The only benefit we lose from the first approach is explicitness: the VOLUME instruction helps inform users which directories need persistence.

Approach 3: best of both worlds

It turns out that we can use the VOLUME instruction in the Dockerfile AND override it at container creation time using the '-v' parameter of docker run. When doing so, Docker will NOT create a layer for the volume, since it can use the mount point from the host.

And a user wishing to apply the Data Volume Container pattern from the first approach can still do so.

I've validated my findings with docker inspect.
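
For reference, the mount can be checked with something like the following (Docker 1.8+ reports volumes under .Mounts; older releases expose a .Volumes field instead):

docker inspect -f '{{ json .Mounts }}' hdfs-namenode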

flangelier commented 9 years ago

That's implemented, using /hdfs-data.