big-data-europe / docker-hadoop-spark-workbench

[EXPERIMENTAL] This repo includes deployment instructions for running HDFS/Spark inside docker containers. Also includes spark-notebook and HDFS FileBrowser.

incompatible clusterID Hadoop #55

Open Atahualkpa opened 6 years ago

Atahualkpa commented 6 years ago

incompatible clusterID Hadoop

Hi, any time I reboot the swarm I get this problem:

java.io.IOException: Incompatible clusterIDs in /hadoop/dfs/data: namenode clusterID = CID-b25a0845-5c64-4603-a2cb-d7878c265f44; datanode clusterID = CID-f90183ca-4d87-4b49-8fb2-ca642d46016c at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:777)

FATAL datanode.DataNode: Initialization failed for Block pool (Datanode Uuid unassigned) service to namenode/10.0.0.7:8020. Exiting. java.io.IOException: All specified directories are failed to load. at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:574)

I solved this problem by deleting this Docker volume:

sudo docker volume inspect hadoop_datanode

[ { "CreatedAt": "2018-05-10T19:35:31Z", "Driver": "local", "Labels": { "com.docker.stack.namespace": "hadoop" }, "Mountpoint": "/data0/docker_var/volumes/hadoop_datanode/_data", "Name": "hadoop_datanode", "Options": {}, "Scope": "local" } ] but in this volume are present the files which I put in hdfs, so in this way I have to to put again the files into hdfs when I deploy the swarm. I'm not sure this is the right way to solve this problem. Googling I found one solution but I dont know how to applicate it before the swarm reboot, this is the solution: The problem is with the property name dfs.datanode.data.dir, it is misspelt as dfs.dataode.data.dir. This invalidates the property from being recognised and as a result, the default location of ${hadoop.tmp.dir}/hadoop-${USER}/dfs/data is used as data directory. hadoop.tmp.dir is /tmp by default, on every reboot the contents of this directory will be deleted and forces datanode to recreate the folder on startup. And thus Incompatible clusterIDs. Edit this property name in hdfs-site.xml before formatting the namenode and starting the services.

thanks.

earthquakesan commented 6 years ago

@Atahualkpa Hi!

Which docker-compose are you using? Or what is your setup? Do you persist the data to the local drive from your docker containers? E.g. by having a volumes key:

services:
  namenode:
    volumes:
      - /path/to/the/folder:/hadoop/dfs/name
  datanode:
    volumes:
      - /path/to/the/folder:/hadoop/dfs/data
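
As a quick check on top of the example above (a sketch using the same placeholder path): if the namenode metadata is really persisted, its clusterID should survive a redeploy of the stack unchanged.

# on the host running the namenode, before and after redeploying the stack
sudo grep clusterID /path/to/the/folder/current/VERSION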

Atahualkpa commented 6 years ago

Hi @earthquakesan, thanks for your answer. I have this setup for docker-compose:

version: '3'
services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8
    networks:
      - workbench
    volumes:
      - namenode:/hadoop/dfs/name
      - /data0/reference/hg19-ucsc/:/reference/hg19-ucsc/
      - /data0/output/:/output/
      - /data/ngs/:/ngs/
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      labels:
        traefik.docker.network: workbench
        traefik.port: 50070
    ports:
      - 8334:50070
  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop2.7.4-java8
    networks:
      - workbench
    volumes:
      - datanode:/hadoop/dfs/data
    environment:
      SERVICE_PRECONDITION: "namenode:50070"
    env_file:
      - ./hadoop.env
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
    labels:
      traefik.docker.network: workbench
      traefik.port: 50075

volumes:
  datanode:
  namenode:

networks:
  workbench:
    external: true

but I notice I have not set a local path for HDFS. I tried to set a local path, but the problem is still present. I checked the path and found a file called VERSION inside a directory named current. This is written in the file:

storageID=DS-6e863e5f-34a1-4d09-bcf2-58f6badc7dba
clusterID=CID-4a2c4782-785b-4b8c-be8f-e0d7cef85b24
cTime=0
datanodeUuid=48dc924c-fea1-40d8-9da2-7faeb2ee28b9
storageType=DATA_NODE
layoutVersion=-56

Also, checking the directory, I found the folder BP-1651631011-10.0.0.12-1527073017748/current, and in this folder there is another file called VERSION, but it contains this:

namespaceID=1025220048
cTime=0
blockpoolID=BP-1651631011-10.0.0.12-1527073017748
layoutVersion=-56
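
A way to compare the two sides directly (a sketch; hadoop_namenode is an assumed volume name, mirroring the hadoop_datanode volume shown earlier):

# on the node hosting the namenode container
sudo grep clusterID "$(docker volume inspect -f '{{ .Mountpoint }}' hadoop_namenode)/current/VERSION"
# on each node hosting a datanode container
sudo grep clusterID "$(docker volume inspect -f '{{ .Mountpoint }}' hadoop_datanode)/current/VERSION"
# the datanode only starts if the two clusterID values match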

This is the exception generated:

namenode clusterID = CID-37f14517-46c8-430a-803d-5fe2b0d047fc; datanode clusterID = CID-4a2c4782-785b-4b8c-be8f-e0d7cef85b24
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:777)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:300)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:416)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:395)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:573)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1386)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1351)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:313)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:637)
    at java.lang.Thread.run(Thread.java:748)

Thanks for your support.

earthquakesan commented 6 years ago

@Atahualkpa How many nodes do you have in your swarm cluster? Are the containers always allocated on the same nodes?
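
One way to check the placement question directly (a sketch, assuming the stack name hadoop as before):

# shows which swarm node each task (namenode, datanode, ...) was scheduled on
docker stack ps hadoop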

Atahualkpa commented 6 years ago

Now I have three nodes in the swarm. On the leader, 6 containers are running; they are:

and on the others are running:

Are the containers always allocated on the same nodes? Yes, but I must start the leader first, because it holds the files I put into HDFS; if I join the other nodes, the Spark master and the namenode are selected at random.

Moreover, any time I deploy the swarm, this hadoop volume is present on every node of the swarm.
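
If the namenode should always end up on the leader next to its data, one option (a sketch; hadoop_namenode is an assumed service name derived from the stack name) is a placement constraint, either added at runtime as below or declared under deploy.placement.constraints in the compose file:

# pin the namenode task to a manager node so it always finds its metadata volume
docker service update --constraint-add 'node.role==manager' hadoop_namenode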

thanks.