cloudera / clusterdock


Data node has DFS Used%: 100.00% #27

Closed Crandel closed 7 years ago

Crandel commented 7 years ago

I have an Intel Core i7 and 20 GB of RAM on my laptop. When I run

clusterdock_run ./bin/start_cluster -n hadoop cdh --include-service-type=HDFS,YARN,HIVE,HUE,OOZIE,SPARK --primary-node=node-1 --secondary-nodes=node-2

I always get this error:

INFO:clusterdock.cluster:Successfully started node-1.hadoop (IP address: 192.168.124.2).
INFO:clusterdock.cluster:Successfully started node-2.hadoop (IP address: 192.168.124.3).
INFO:clusterdock.cluster:Started cluster in 19.06 seconds.
INFO:clusterdock.topologies.cdh.actions:Changing server_host to node-1.hadoop in /etc/cloudera-scm-agent/config.ini...
INFO:clusterdock.topologies.cdh.actions:Restarting CM agents...
cloudera-scm-agent is already stopped
cloudera-scm-agent is already stopped
Starting cloudera-scm-agent: [  OK  ]
Starting cloudera-scm-agent: [  OK  ]
INFO:clusterdock.topologies.cdh.actions:Waiting for Cloudera Manager server to come online...
INFO:clusterdock.topologies.cdh.actions:Detected Cloudera Manager server after 44.05 seconds.
INFO:clusterdock.topologies.cdh.actions:CM server is now accessible at http://cradlemanl.localdomain:32771
INFO:clusterdock.topologies.cdh.cm:Detected CM API v13.
INFO:clusterdock.topologies.cdh.cm_utils:Updating database configurations...
INFO:clusterdock.topologies.cdh.cm:Updating NameNode references in Hive metastore...
INFO:clusterdock.topologies.cdh.actions:Removing service ks_indexer from Cluster 1 (clusterdock)...
INFO:clusterdock.topologies.cdh.actions:Removing service impala from Cluster 1 (clusterdock)...
INFO:clusterdock.topologies.cdh.actions:Removing service hbase from Cluster 1 (clusterdock)...
INFO:clusterdock.topologies.cdh.actions:Removing service solr from Cluster 1 (clusterdock)...
INFO:clusterdock.topologies.cdh.actions:Removing service spark_on_yarn from Cluster 1 (clusterdock)...
INFO:clusterdock.topologies.cdh.actions:Removing service zookeeper from Cluster 1 (clusterdock)...
INFO:clusterdock.topologies.cdh.actions:Once its service starts, Hue server will be accessible at http://cradlemanl.localdomain:32770
INFO:clusterdock.topologies.cdh.actions:Deploying client configuration...
INFO:clusterdock.topologies.cdh.actions:Starting cluster...
INFO:clusterdock.topologies.cdh.actions:Starting Cloudera Management service...
INFO:clusterdock.topologies.cdh.cm:Beginning service health validation...
Traceback (most recent call last):
  File "./bin/start_cluster", line 70, in <module>
    main()
  File "./bin/start_cluster", line 63, in main
    actions.start(args)
  File "/root/clusterdock/clusterdock/topologies/cdh/actions.py", line 159, in start
    deployment.validate_services_started()
  File "/root/clusterdock/clusterdock/topologies/cdh/cm.py", line 91, in validate_services_started
    "(at fault: {1}).").format(timeout_min, at_fault_services))
Exception: Timed out after waiting 10 minutes for services to start (at fault: [[u'hdfs', "Failed health checks: [u'HDFS_CANARY_HEALTH', u'HDFS_DATA_NODES_HEALTHY', u'HDFS_FREE_SPACE_REMAINING', u'HDFS_HA_NAMENODE_HEALTH']"], [u'yarn', "Failed health checks: [u'YARN_JOBHISTORY_HEALTH', u'YARN_NODE_MANAGERS_HEALTHY', u'YARN_RESOURCEMANAGERS_HEALTH']"], [u'hive', "Failed health checks: [u'HIVE_HIVEMETASTORES_HEALTHY', u'HIVE_HIVESERVER2S_HEALTHY']"], [u'oozie', "Failed health checks: [u'OOZIE_OOZIE_SERVERS_HEALTHY']"], [u'hue', "Failed health checks: [u'HUE_HUE_SERVERS_HEALTHY']"], [u'mgmt', "Failed health checks: [u'MGMT_ALERT_PUBLISHER_HEALTH', u'MGMT_EVENT_SERVER_HEALTH', u'MGMT_HOST_MONITOR_HEALTH', u'MGMT_SERVICE_MONITOR_HEALTH']"]]).

I can connect to node-1 and node-2 and see all the directories inside HDFS, but when I try to copy files from local to HDFS I always get this error:

copyFromLocal: File blabla._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.

This command

sudo -u hdfs hdfs dfsadmin -report

gives me this result:


Present Capacity: 586440704 (559.27 MB)
DFS Remaining: 0 (0 B)
DFS Used: 586440704 (559.27 MB)
DFS Used%: 100.00%
Under replicated blocks: 2
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (1):

Name: 192.168.124.3:50010 (node-2.hadoop)
Hostname: node-2.hadoop
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 586440704 (559.27 MB)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 2
Last contact: Wed Apr 05 20:20:33 GMT 2017
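
To cross-check the filesystem that the DataNode actually sees, one option (assuming the clusterdock containers show up under docker ps; the container name below is a placeholder) would be:

docker ps --format '{{.Names}}'        # find the container backing node-2.hadoop
docker exec <node-2-container> df -h   # inspect the disk the DataNode writes to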
dimaspivak commented 7 years ago

Please paste the output of docker info, run from your laptop, redacting any potentially sensitive information.

Crandel commented 7 years ago

docker info

Containers: 3
 Running: 2
 Paused: 0
 Stopped: 1
Images: 7
Server Version: 17.03.1-ce
Storage Driver: devicemapper
 Pool Name: docker-8:17-20972999-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 11.16 GB
 Data Space Total: 107.4 GB
 Data Space Available: 96.22 GB
 Metadata Space Used: 8.757 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.139 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /media/data/linux/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /media/data/linux/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.138 (2017-03-28)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.10.8-1-ARCH
Operating System: Arch Linux
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 19.44 GiB
Name: cradlemanl
ID: JCHQ:CR5J:IR2P:BIIX:FTE4:J7P3:5YCK:3LXW:I7LP:EEBK:EKOL:IK7K
Docker Root Dir: /media/data/linux/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
dimaspivak commented 7 years ago

Yeah, I suspected devicemapper would be playing a role in this. I've never had luck running clusterdock with devicemapper as the storage backend driver. Try upgrading to aufs or overlayfs and you won't have any issues.
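
As a minimal sketch of what that switch might look like, assuming the daemon reads /etc/docker/daemon.json and that images are re-pulled afterwards (the options shown are illustrative, not something clusterdock itself requires):

# /etc/docker/daemon.json
{
  "storage-driver": "overlay2"
}

# then restart the daemon and re-pull the clusterdock images
sudo systemctl restart docker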

Crandel commented 7 years ago

So I changed the storage driver to overlay2 (which the Arch wiki recommends), deleted all the images, and pulled them again. Now I get this error:

!!! Parallel execution exception under host u'192.168.123.4':
Process 192.168.123.4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 242, in inner
    submit(task.run(*args, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/decorators.py", line 181, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 171, in __call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/root/clusterdock/clusterdock/ssh.py", line 38, in _quiet_task
    return run(command)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 677, in host_prompting_wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 1088, in run
    shell_escape=shell_escape, capture_buffer_size=capture_buffer_size,
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 928, in _run_command
    channel=default_channel(), command=wrapped_command, pty=pty,
  File "/usr/local/lib/python2.7/dist-packages/fabric/state.py", line 418, in default_channel
    chan = _open_session()
  File "/usr/local/lib/python2.7/dist-packages/fabric/state.py", line 410, in _open_session
    return connections[env.host_string].get_transport().open_session()
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 159, in __getitem__
    self.connect(key)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 151, in connect
    user, host, port, cache=self, seek_gateway=seek_gateway)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 603, in connect
    raise NetworkError(msg, e)
NetworkError: Timed out trying to connect to 192.168.123.4 (tried 60 times)

Fatal error: One or more hosts failed while executing task '_quiet_task'

Underlying exception:
    Timed out trying to connect to 192.168.123.4 (tried 60 times)

Aborting.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/root/clusterdock/clusterdock/cluster.py", line 265, in start
    raise Exception("Timed out waiting for {0} to become reachable.".format(self.hostname))
Exception: Timed out waiting for node-1 to become reachable.

!!! Parallel execution exception under host u'192.168.123.5':
Process 192.168.123.5:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 242, in inner
    submit(task.run(*args, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/decorators.py", line 181, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 171, in __call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/root/clusterdock/clusterdock/ssh.py", line 38, in _quiet_task
    return run(command)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 677, in host_prompting_wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 1088, in run
    shell_escape=shell_escape, capture_buffer_size=capture_buffer_size,
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 928, in _run_command
    channel=default_channel(), command=wrapped_command, pty=pty,
  File "/usr/local/lib/python2.7/dist-packages/fabric/state.py", line 418, in default_channel
    chan = _open_session()
  File "/usr/local/lib/python2.7/dist-packages/fabric/state.py", line 410, in _open_session
    return connections[env.host_string].get_transport().open_session()
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 159, in __getitem__
    self.connect(key)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 151, in connect
    user, host, port, cache=self, seek_gateway=seek_gateway)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 603, in connect
    raise NetworkError(msg, e)
NetworkError: Timed out trying to connect to 192.168.123.5 (tried 60 times)

Fatal error: One or more hosts failed while executing task '_quiet_task'

Underlying exception:
    Timed out trying to connect to 192.168.123.5 (tried 60 times)

Aborting.
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/root/clusterdock/clusterdock/cluster.py", line 265, in start
    raise Exception("Timed out waiting for {0} to become reachable.".format(self.hostname))
Exception: Timed out waiting for node-2 to become reachable.

INFO:clusterdock.cluster:Started cluster in 62.61 seconds.

I have these new lines in /etc/hosts:

192.168.123.2   node-1.hadoop # Added by clusterdock
192.168.123.3   node-2.hadoop # Added by clusterdock
192.168.123.4   node-1.hadoop # Added by clusterdock
192.168.123.5   node-2.hadoop # Added by clusterdock
dimaspivak commented 7 years ago

Run clusterdock_run ./bin/housekeeping nuke to clean up the existing container clusters on your machine (as well as the /etc/hosts file) and then try again.
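
Once the nuke finishes, it may be worth confirming that nothing is left behind before retrying, for example:

grep clusterdock /etc/hosts   # should print nothing after a successful cleanup
docker ps -a                  # should show no leftover cluster containers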

Crandel commented 7 years ago

I ran this:

clusterdock_run ./bin/housekeeping nuke
INFO:housekeeping:Removing all containers on this host...
INFO:housekeeping:Successfully removed all containers on this host.
INFO:housekeeping:Removing all user-defined networks on this host...
INFO:housekeeping:Successfully removed all user-defined networks on this host.
INFO:housekeeping:Clearing container entries from /etc/hosts...
INFO:housekeeping:Successfully cleared container entries from /etc/hosts.
INFO:housekeeping:Restarting Docker daemon...
INFO:housekeeping:Successfully nuked this host.

and got the same error:

clusterdock_run ./bin/start_cluster -n hadoop cdh --include-service-type=HDFS,YARN,HIVE,HUE,OOZIE,SPARK --primary-node=node-1 --secondary-nodes=node-2
INFO:clusterdock.cluster:Network (hadoop) not present, creating it...
INFO:clusterdock.cluster:Successfully setup network (name: hadoop).
!!! Parallel execution exception under host u'192.168.123.2':
Process 192.168.123.2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 242, in inner
    submit(task.run(*args, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/decorators.py", line 181, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 171, in __call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/root/clusterdock/clusterdock/ssh.py", line 38, in _quiet_task
    return run(command)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 677, in host_prompting_wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 1088, in run
    shell_escape=shell_escape, capture_buffer_size=capture_buffer_size,
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 928, in _run_command
    channel=default_channel(), command=wrapped_command, pty=pty,
  File "/usr/local/lib/python2.7/dist-packages/fabric/state.py", line 418, in default_channel
    chan = _open_session()
  File "/usr/local/lib/python2.7/dist-packages/fabric/state.py", line 410, in _open_session
    return connections[env.host_string].get_transport().open_session()
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 159, in __getitem__
    self.connect(key)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 151, in connect
    user, host, port, cache=self, seek_gateway=seek_gateway)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 603, in connect
    raise NetworkError(msg, e)
NetworkError: Timed out trying to connect to 192.168.123.2 (tried 60 times)

Fatal error: One or more hosts failed while executing task '_quiet_task'

Underlying exception:
    Timed out trying to connect to 192.168.123.2 (tried 60 times)

Aborting.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/root/clusterdock/clusterdock/cluster.py", line 265, in start
    raise Exception("Timed out waiting for {0} to become reachable.".format(self.hostname))
Exception: Timed out waiting for node-1 to become reachable.

!!! Parallel execution exception under host u'192.168.123.3':
Process 192.168.123.3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 242, in inner
    submit(task.run(*args, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/decorators.py", line 181, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 171, in __call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/root/clusterdock/clusterdock/ssh.py", line 38, in _quiet_task
    return run(command)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 677, in host_prompting_wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 1088, in run
    shell_escape=shell_escape, capture_buffer_size=capture_buffer_size,
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 928, in _run_command
    channel=default_channel(), command=wrapped_command, pty=pty,
  File "/usr/local/lib/python2.7/dist-packages/fabric/state.py", line 418, in default_channel
    chan = _open_session()
  File "/usr/local/lib/python2.7/dist-packages/fabric/state.py", line 410, in _open_session
    return connections[env.host_string].get_transport().open_session()
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 159, in __getitem__
    self.connect(key)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 151, in connect
    user, host, port, cache=self, seek_gateway=seek_gateway)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 603, in connect
    raise NetworkError(msg, e)
NetworkError: Timed out trying to connect to 192.168.123.3 (tried 60 times)

Fatal error: One or more hosts failed while executing task '_quiet_task'

Underlying exception:
    Timed out trying to connect to 192.168.123.3 (tried 60 times)

Aborting.
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/root/clusterdock/clusterdock/cluster.py", line 265, in start
    raise Exception("Timed out waiting for {0} to become reachable.".format(self.hostname))
Exception: Timed out waiting for node-2 to become reachable.

INFO:clusterdock.cluster:Started cluster in 77.61 seconds.
!!! Parallel execution exception under host u'192.168.123.2':
Process 192.168.123.2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 242, in inner
    submit(task.run(*args, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/decorators.py", line 181, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 171, in __call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/root/clusterdock/clusterdock/ssh.py", line 45, in _task
    return run(command)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 677, in host_prompting_wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 1088, in run
    shell_escape=shell_escape, capture_buffer_size=capture_buffer_size,
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 928, in _run_command
    channel=default_channel(), command=wrapped_command, pty=pty,
  File "/usr/local/lib/python2.7/dist-packages/fabric/state.py", line 418, in default_channel
    chan = _open_session()
  File "/usr/local/lib/python2.7/dist-packages/fabric/state.py", line 410, in _open_session
    return connections[env.host_string].get_transport().open_session()
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 159, in __getitem__
    self.connect(key)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 151, in connect
    user, host, port, cache=self, seek_gateway=seek_gateway)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 603, in connect
    raise NetworkError(msg, e)
NetworkError: Timed out trying to connect to 192.168.123.2 (tried 60 times)
!!! Parallel execution exception under host u'192.168.123.3':
Process 192.168.123.3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 242, in inner
    submit(task.run(*args, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/decorators.py", line 181, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 171, in __call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/tasks.py", line 174, in run
    return self.wrapped(*args, **kwargs)
  File "/root/clusterdock/clusterdock/ssh.py", line 45, in _task
    return run(command)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 677, in host_prompting_wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 1088, in run
    shell_escape=shell_escape, capture_buffer_size=capture_buffer_size,
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 928, in _run_command
    channel=default_channel(), command=wrapped_command, pty=pty,
  File "/usr/local/lib/python2.7/dist-packages/fabric/state.py", line 418, in default_channel
    chan = _open_session()
  File "/usr/local/lib/python2.7/dist-packages/fabric/state.py", line 410, in _open_session
    return connections[env.host_string].get_transport().open_session()
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 159, in __getitem__
    self.connect(key)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 151, in connect
    user, host, port, cache=self, seek_gateway=seek_gateway)
  File "/usr/local/lib/python2.7/dist-packages/fabric/network.py", line 603, in connect
    raise NetworkError(msg, e)
NetworkError: Timed out trying to connect to 192.168.123.3 (tried 60 times)

Fatal error: One or more hosts failed while executing task '_task'

Underlying exception:
    Timed out trying to connect to 192.168.123.2 (tried 60 times)

Aborting.

/etc/hosts now contains:

192.168.123.2   node-1.hadoop # Added by clusterdock
192.168.123.3   node-2.hadoop # Added by clusterdock
dimaspivak commented 7 years ago

Have you tried restarting your machine? Docker networking might be misbehaving if you have trouble resolving nodes like that.
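
If a full reboot is inconvenient, a lighter-weight thing to try first (just a suggestion, not something clusterdock itself requires) is to drop the stale user-defined network and restart the Docker daemon:

docker network rm hadoop        # remove the network left over from the failed run
sudo systemctl restart docker   # rebuild Docker's networking state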

Crandel commented 7 years ago

Thank you very much! After a reboot, everything works fine!