lc2a closed this issue 7 years ago.
How many CPUs/how much RAM does the machine you're using have? What OS is it running?
[root@vm-02 ~]# free -g
              total        used        free      shared  buff/cache   available
Mem:             47           1          43           0           1          45
Swap:             7           0           7

[root@vm-02 ~]# rpm --query centos-release
centos-release-7-3.1611.el7.centos.x86_64
[root@vm-02 ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD Opteron(tm) Processor 6386 SE
stepping        : 0
microcode       : 0x6000832
cpu MHz         : 2792.034
cache size      : 2048 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm constant_tsc art rep_good nopl tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx hypervisor lahf_lm extapic abm sse4a misalignsse 3dnowprefetch osvw xop fma4 arat
bogomips        : 5586.00
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

[entries for processors 1 through 7 omitted; they are identical except that the processor, core id, apicid, and initial apicid fields each run from 1 to 7]
[root@vm-02 ~]# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-11.el7.centos.x86_64
 Go version:      go1.7.4
 Git commit:      96d83a5/1.12.6
 Built:           Tue Mar 7 09:23:34 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-11.el7.centos.x86_64
 Go version:      go1.7.4
 Git commit:      96d83a5/1.12.6
 Built:           Tue Mar 7 09:23:34 2017
 OS/Arch:         linux/amd64
[root@hdp-test-02 ~]# docker info
Containers: 15
Running: 0
Paused: 0
Stopped: 15
Images: 4
Server Version: 1.12.6
Storage Driver: devicemapper
Pool Name: docker-253:0-137522201-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 10.26 GB
Data Space Total: 107.4 GB
Data Space Available: 14.95 GB
Metadata Space Used: 9.638 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.138 GB
Thin Pool Minimum Free Space: 10.74 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Use --storage-opt dm.thinpooldev to specify a custom block storage device.
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host null overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Security Options: seccomp
Kernel Version: 3.10.0-514.6.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 8
Total Memory: 47.01 GiB
Name: hdp-test-02.hpls.local
ID: RJLD:4XQQ:JUMN:KPWO:5D4X:MM3K:CDVF:CN7G:EDHD:UWMO:RUYB:PFFV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8
Registries: docker.io (secure)
Thanks :)
I'm pretty confident this is caused by running devicemapper as the storage backend driver. Switch over to aufs or overlayfs and you should never see this issue.
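For anyone making the same switch on CentOS 7, a minimal sketch (assumptions: docker-ce managed by systemd, and the overlay driver rather than overlay2, since the 3.10 kernel in this thread predates overlay2 support; verify against your own distro docs):

# stop the daemon and wipe the old devicemapper state (deletes all images and containers)
systemctl stop docker
rm -rf /var/lib/docker

# select the overlay storage driver
mkdir -p /etc/docker
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "overlay"
}
EOF

systemctl start docker
docker info | grep 'Storage Driver'    # expect: Storage Driver: overlay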
Reinstalled Docker, switching to docker-ce:
[root@vm-02 ~]# docker info
Containers: 7
 Running: 4
 Paused: 0
 Stopped: 3
Images: 4
Server Version: 17.03.1-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options: seccomp
 Profile: default
Kernel Version: 3.10.0-514.6.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 47.01 GiB
Name: hdp-test-02.hpls.local
ID: RJLD:4XQQ:JUMN:KPWO:5D4X:MM3K:CDVF:CN7G:EDHD:UWMO:RUYB:PFFV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
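Side note for anyone comparing outputs: this docker info reports Supports d_type: false. The overlay driver needs d_type support from the backing filesystem, and XFS provides it only when formatted with ftype=1; without it, deletes inside containers can fail in exactly the rm -rf fashion seen below. A quick check, assuming /var/lib/docker sits on the root filesystem:

xfs_info / | grep ftype    # ftype=1 is required; ftype=0 means the filesystem must be re-created with mkfs.xfs -n ftype=1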
[root@vm-02 ~]# clusterdock_run ./bin/start_cluster -n testing cdh --primary-node=node-1 --secondary-nodes='node-{2..4}' --include-service-types=HDFS,HIVE,HUE,ZOOKEEPER,HBASE,YARN,SPARK_ON_YARN,SQOOP2
-bash: B: command not found
INFO:clusterdock.cluster:Network (testing) not present, creating it...
INFO:clusterdock.cluster:Successfully setup network (name: testing).
INFO:clusterdock.cluster:Successfully started node-2.testing (IP address: 192.168.123.3).
INFO:clusterdock.cluster:Successfully started node-3.testing (IP address: 192.168.123.4).
INFO:clusterdock.cluster:Successfully started node-4.testing (IP address: 192.168.123.5).
INFO:clusterdock.cluster:Successfully started node-1.testing (IP address: 192.168.123.2).
INFO:clusterdock.cluster:Started cluster in 27.74 seconds.
INFO:clusterdock.topologies.cdh.actions:Changing server_host to node-1.testing in /etc/cloudera-scm-agent/config.ini...
INFO:clusterdock.topologies.cdh.actions:Removing files (/var/lib/cloudera-scm-agent/uuid, /dfs/dn/current/) from hosts (node-3.testing, node-4.testing)...
rm: cannot remove `/dfs/dn/current/BP-637181590-192.168.124.2-1469835153284/current/finalized/subdir0/subdir0': Directory not empty
rm: cannot remove `/dfs/dn/current/BP-637181590-192.168.124.2-1469835153284/current/finalized/subdir0/subdir0': Directory not empty
rm: cannot remove `/dfs/dn/current/BP-637181590-192.168.124.2-1469835153284/current/finalized/subdir0/subdir1': Directory not empty
rm: cannot remove `/dfs/dn/current/BP-637181590-192.168.124.2-1469835153284/current/finalized/subdir0/subdir1': Directory not empty
rm: cannot remove `/dfs/dn/current/BP-637181590-192.168.124.2-1469835153284/current/finalized/subdir0/subdir2': Directory not empty
rm: cannot remove `/dfs/dn/current/BP-637181590-192.168.124.2-1469835153284/current/finalized/subdir0/subdir2': Directory not empty
Fatal error: run() received nonzero return code 1 while executing!

Requested: rm -rf /var/lib/cloudera-scm-agent/uuid /dfs/dn/current/
Executed: /bin/bash -l -c "rm -rf /var/lib/cloudera-scm-agent/uuid /dfs/dn/current/"

Aborting.

Fatal error: run() received nonzero return code 1 while executing!

Requested: rm -rf /var/lib/cloudera-scm-agent/uuid /dfs/dn/current/
Executed: /bin/bash -l -c "rm -rf /var/lib/cloudera-scm-agent/uuid /dfs/dn/current/"

Aborting.
Fatal error: One or more hosts failed while executing task '_task'
Aborting.
Did you completely remove the /var/lib/docker folder after reinstalling? Still looks like something caused by holdover from devicemapper.
-- -Dima
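A quick way to check for that holdover (a sketch; assumes Docker's default root directory):

# leftover devicemapper state survives a plain package reinstall
ls -d /var/lib/docker/devicemapper 2>/dev/null \
  && echo "holdover present: stop docker and rm -rf /var/lib/docker"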
Retried.
1) It stops as below:

[root@vm-02 ~]# clusterdock_run ./bin/start_cluster -n cluster cdh --primary-node=node-1 --secondary-nodes=node-{2..3} --include-service-types=HDFS,YARN --dont-start-cluster
-bash: B: command not found
INFO:clusterdock.topologies.cdh.actions:Pulling image docker.io/cloudera/clusterdock:cdh580_cm581_secondary-node. This might take a little while...
cdh580_cm581_secondary-node: Pulling from cloudera/clusterdock
3eaa9b70c44a: Already exists
99ba8e23f310: Already exists
c9c08e9a0d03: Already exists
7434a9a99daa: Already exists
d52d9baa0ee6: Already exists
f70deff0592f: Pull complete
Digest: sha256:251778378b362adff4e93b99d423848216e4823965dabd1bd4c41dbb4c79afcf
Status: Image is up to date for cloudera/clusterdock:cdh580_cm581_secondary-node
INFO:clusterdock.cluster:Network (cluster) not present, creating it...
INFO:clusterdock.cluster:Successfully setup network (name: cluster).
INFO:clusterdock.cluster:Successfully started node-3.cluster (IP address: 192.168.124.2).
INFO:clusterdock.cluster:Successfully started node-1.cluster (IP address: 192.168.123.2).
INFO:clusterdock.cluster:Started cluster in 26.85 seconds.
INFO:clusterdock.topologies.cdh.actions:Changing server_host to node-1.cluster in /etc/cloudera-scm-agent/config.ini...
INFO:clusterdock.topologies.cdh.actions:Restarting CM agents...
cloudera-scm-agent is already stopped
Starting cloudera-scm-agent: [ OK ]
Stopping cloudera-scm-agent: [ OK ]
Starting cloudera-scm-agent: [ OK ]
INFO:clusterdock.topologies.cdh.actions:Waiting for Cloudera Manager server to come online...
INFO:clusterdock.topologies.cdh.actions:Detected Cloudera Manager server after 108.39 seconds.
INFO:clusterdock.topologies.cdh.actions:CM server is now accessible at http://test.local:32769
INFO:clusterdock.topologies.cdh.cm:Detected CM API v13.
INFO:clusterdock.topologies.cdh.cm_utils:Updating database configurations...
INFO:clusterdock.topologies.cdh.cm:Updating NameNode references in Hive metastore...
[root@vm-02 ~]# docker ps -a
CONTAINER ID  IMAGE                                                        COMMAND                  CREATED         STATUS                     PORTS                                             NAMES
026bab52e24e  docker.io/cloudera/clusterdock:cdh580_cm581_secondary-node  "/sbin/init"             3 minutes ago   Up 3 minutes                                                                 cocky_clarke
46bf9a62c5b3  docker.io/cloudera/clusterdock:cdh580_cm581_primary-node    "/sbin/init"             3 minutes ago   Up 3 minutes               0.0.0.0:32775->7180/tcp, 0.0.0.0:32774->8888/tcp  romantic_hamilton
25367e5e6e93  docker.io/cloudera/clusterdock:latest                        "python ./bin/star..."   3 minutes ago   Up 3 minutes                                                                 vigilant_minsky
81b596f7fe27  docker.io/cloudera/clusterdock:latest                        "python ./bin/hous..."   10 minutes ago  Exited (0) 10 minutes ago                                                    gracious_hamilton
Logged in to CM.

2) Why were ALL services installed? I had specified --include-service-types=HDFS,YARN,ZOOKEEPER.

3) On the Hosts page, why are there only 2 nodes? I had specified --primary-node=node-1 --secondary-nodes=node-{2..4}.

node-1.cluster  192.168.124.2  22 Role(s):
HBase Master, HDFS Balancer, HDFS NameNode, HDFS SecondaryNameNode, Hive Gateway, Hive Metastore Server, HiveServer2, Hue Server, Impala Catalog Server, Impala StateStore, Key-Value Store Indexer Lily HBase Indexer, Cloudera Management Service Alert Publisher, Cloudera Management Service Event Server, Cloudera Management Service Host Monitor, Cloudera Management Service Service Monitor, Oozie Server, Solr Server, Spark Gateway, Spark History Server, YARN (MR2 Included) JobHistory Server, YARN (MR2 Included) ResourceManager, ZooKeeper Server

node-2.cluster  192.168.124.3  6 Role(s):
HBase RegionServer, HDFS DataNode, Hive Gateway, Impala Daemon, Spark Gateway, YARN (MR2 Included) NodeManager
If it didn't time out, it sounds like it's still running. Removing services is only done once CM setup is complete, which is why you still saw them there.
Ah, part of your problem is that you need to quote --secondary-nodes arguments that use Bash expansion. That is, --secondary-nodes='node-{2..4}', not --secondary-nodes=node-{2..4}. That would explain why you're only seeing two nodes, though you're specifying three (not four).
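To see the expansion concretely, echo works as a stand-in for the real script:

$ echo --secondary-nodes=node-{2..4}
--secondary-nodes=node-2 --secondary-nodes=node-3 --secondary-nodes=node-4
$ echo --secondary-nodes='node-{2..4}'
--secondary-nodes=node-{2..4}

Unquoted, Bash replicates the whole prefix across the expansion, so the script receives three separate --secondary-nodes flags and typically only one survives argument parsing; quoted, the literal node-{2..4} pattern reaches the script intact for it to expand itself.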
Oops, corrected to --secondary-nodes='node-{2..4}', but still got the same errors:
rm: cannot remove `/dfs/dn/current/BP-637181590-192.168.124.2-1469835153284/current/finalized/subdir0/subdir0': Directory not empty
rm: cannot remove `/dfs/dn/current/BP-637181590-192.168.124.2-1469835153284/current/finalized/subdir0/subdir0': Directory not empty
rm: cannot remove `/dfs/dn/current/BP-637181590-192.168.124.2-1469835153284/current/finalized/subdir0/subdir1': Directory not empty
rm: cannot remove `/dfs/dn/current/BP-637181590-192.168.124.2-1469835153284/current/finalized/subdir0/subdir1': Directory not empty
rm: cannot remove `/dfs/dn/current/BP-637181590-192.168.124.2-1469835153284/current/finalized/subdir0/subdir2': Directory not empty
Fatal error: run() received nonzero return code 1 while executing!

Requested: rm -rf /var/lib/cloudera-scm-agent/uuid /dfs/dn/current/
Executed: /bin/bash -l -c "rm -rf /var/lib/cloudera-scm-agent/uuid /dfs/dn/current/"

Aborting.
rm: cannot remove `/dfs/dn/current/BP-637181590-192.168.124.2-1469835153284/current/finalized/subdir0/subdir2': Directory not empty

Fatal error: run() received nonzero return code 1 while executing!

Requested: rm -rf /var/lib/cloudera-scm-agent/uuid /dfs/dn/current/
Executed: /bin/bash -l -c "rm -rf /var/lib/cloudera-scm-agent/uuid /dfs/dn/current/"

Aborting.
Fatal error: One or more hosts failed while executing task '_task'
Aborting.
Retried with 2 nodes, but it gets stuck at the Hive log line:
[root@vm-02 ~]# clusterdock_run ./bin/housekeeping nuke
-bash: B: command not found
INFO:housekeeping:Removing all containers on this host...
INFO:housekeeping:Successfully removed all containers on this host.
INFO:housekeeping:Removing all user-defined networks on this host...
INFO:housekeeping:Successfully removed all user-defined networks on this host.
INFO:housekeeping:Clearing container entries from /etc/hosts...
INFO:housekeeping:Successfully cleared container entries from /etc/hosts.
INFO:housekeeping:Restarting Docker daemon...
INFO:housekeeping:Successfully nuked this host.

[root@vm-02 ~]# docker ps -a
CONTAINER ID  IMAGE                                  COMMAND                  CREATED         STATUS                     PORTS  NAMES
0d4936fe85f4  docker.io/cloudera/clusterdock:latest  "python ./bin/hous..."   14 seconds ago  Exited (0) 10 seconds ago         festive_edison

[root@vm-02 ~]# clusterdock_run ./bin/start_cluster -n testing cdh --include-service-types=HDFS,YARN,ZOOKEEPER --dont-start-cluster
-bash: B: command not found
INFO:clusterdock.cluster:Network (testing) not present, creating it...
INFO:clusterdock.cluster:Successfully setup network (name: testing).
INFO:clusterdock.cluster:Successfully started node-2.testing (IP address: 192.168.123.3).
INFO:clusterdock.cluster:Successfully started node-1.testing (IP address: 192.168.123.2).
INFO:clusterdock.cluster:Started cluster in 26.68 seconds.
INFO:clusterdock.topologies.cdh.actions:Changing server_host to node-1.testing in /etc/cloudera-scm-agent/config.ini...
INFO:clusterdock.topologies.cdh.actions:Restarting CM agents...
cloudera-scm-agent is already stopped
Starting cloudera-scm-agent: [ OK ]
Stopping cloudera-scm-agent: [ OK ]
Starting cloudera-scm-agent: [ OK ]
INFO:clusterdock.topologies.cdh.actions:Waiting for Cloudera Manager server to come online...
INFO:clusterdock.topologies.cdh.actions:Detected Cloudera Manager server after 88.23 seconds.
INFO:clusterdock.topologies.cdh.actions:CM server is now accessible at http://test.local:32779
INFO:clusterdock.topologies.cdh.cm:Detected CM API v13.
INFO:clusterdock.topologies.cdh.cm_utils:Updating database configurations...
INFO:clusterdock.topologies.cdh.cm:Updating NameNode references in Hive metastore...
Unless it returns an error saying it's timed out, it looks like it's still running... How long do you let it stay at that NameNode references step before giving up on it? Also, wanna look in Cloudera Manager to see if the command is running?
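If digging through the CM UI is awkward, the API can answer the same question. A sketch, assuming clusterdock's default admin/admin credentials and the mapped port from the log above:

# active CM-level commands (an empty "items" list means nothing is running at that scope)
curl -s -u admin:admin 'http://test.local:32779/api/v13/cm/commands'
# commands scoped to the cluster; <cluster-name> is a placeholder for whatever name CM shows
curl -s -u admin:admin 'http://test.local:32779/api/v13/clusters/<cluster-name>/commands'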
http://10.120.1.14:32781/cmf/process/67/logs?filename=stdout.log
HTTP ERROR 502 Problem accessing /cmf/process/67/logs. Reason: BAD_GATEWAY
Logged in to CM and aborted the Hive commands... then:
INFO:clusterdock.topologies.cdh.cm:Updating NameNode references in Hive metastore...
WARNING:clusterdock.topologies.cdh.cm:Failed to update NameNode references in Hive metastore (command returned
But when I start HDFS, I also get the same errors:

http://10.120.1.14:32781/cmf/process/68/logs?filename=stdout.log
HTTP ERROR 502
Problem accessing /cmf/process/68/logs. Reason: BAD_GATEWAY
Powered by Jetty://
Reimaged to Ubuntu 14. No problems at all.
Thanks
[root@vm02 ~]# clusterdock_run ./bin/start_cluster -n testing cdh --primary-node=node-1 --secondary-nodes='node-{2..4}' --include-service-types=HDFS,HIVE,HUE,ZOOKEEPER,HBASE,YARN,SPARK_ON_YARN,SQOOP2
++ clusterdock_run ./bin/start_cluster -n testing cdh --primary-node=node-1 '--secondary-nodes=node-{2..4}' --include-service-types=HDFS,HIVE,HUE,ZOOKEEPER,HBASE,YARN,SPARK_ON_YARN,SQOOP2
++ '[' -z docker.io/cloudera/clusterdock:latest ']'
++ '[' '' '!=' false ']'
++ sudo docker pull docker.io/cloudera/clusterdock:latest
++ '[' -n '' ']'
++ '[' -n '' ']'
++ '[' -n '' ']'
++ '[' -n '' ']'
++ '[' -n '' ']'
++ B
-bash: B: command not found
++ sudo docker run --net=host -t --privileged -v /tmp/clusterdock -v /etc/hosts:/etc/hosts -v /etc/localtime:/etc/localtime -v /var/run/docker.sock:/var/run/docker.sock docker.io/cloudera/clusterdock:latest ./bin/start_cluster -n testing cdh --primary-node=node-1 '--secondary-nodes=node-{2..4}' --include-service-types=HDFS,HIVE,HUE,ZOOKEEPER,HBASE,YARN,SPARK_ON_YARN,SQOOP2
INFO:clusterdock.topologies.cdh.actions:Pulling image docker.io/cloudera/clusterdock:cdh580_cm581_primary-node. This might take a little while...
Trying to pull repository docker.io/cloudera/clusterdock ...
cdh580_cm581_primary-node: Pulling from docker.io/cloudera/clusterdock
Digest: sha256:9feffbfc5573262a6efbbb0a969efde890e63ced8a4ab3c9982f4f0dc607e429
INFO:clusterdock.topologies.cdh.actions:Pulling image docker.io/cloudera/clusterdock:cdh580_cm581_secondary-node. This might take a little while...
Trying to pull repository docker.io/cloudera/clusterdock ...
cdh580_cm581_secondary-node: Pulling from docker.io/cloudera/clusterdock
Digest: sha256:251778378b362adff4e93b99d423848216e4823965dabd1bd4c41dbb4c79afcf
INFO:clusterdock.cluster:Successfully started node-2.testing (IP address: 192.168.124.7).
INFO:clusterdock.cluster:Successfully started node-3.testing (IP address: 192.168.124.8).
INFO:clusterdock.cluster:Successfully started node-4.testing (IP address: 192.168.124.9).
INFO:clusterdock.cluster:Successfully started node-1.testing (IP address: 192.168.124.6).
INFO:clusterdock.cluster:Started cluster in 12.68 seconds.
INFO:clusterdock.topologies.cdh.actions:Changing server_host to node-1.testing in /etc/cloudera-scm-agent/config.ini...
INFO:clusterdock.topologies.cdh.actions:Removing files (/var/lib/cloudera-scm-agent/uuid, /dfs/dn/current/) from hosts (node-3.testing, node-4.testing)...
INFO:clusterdock.topologies.cdh.actions:Restarting CM agents...
cloudera-scm-agent is already stopped
cloudera-scm-agent is already stopped
Starting cloudera-scm-agent: bash: /var/log/cloudera-scm-agent/cloudera-scm-agent.out: No such file or directory
[FAILED]
Starting cloudera-scm-agent: bash: /var/log/cloudera-scm-agent/cloudera-scm-agent.out: No such file or directory
[FAILED]
Fatal error: run() received nonzero return code 1 while executing!

Requested: service cloudera-scm-agent restart
Executed: /bin/bash -l -c "service cloudera-scm-agent restart"

Aborting.

Fatal error: run() received nonzero return code 1 while executing!

Requested: service cloudera-scm-agent restart
Executed: /bin/bash -l -c "service cloudera-scm-agent restart"

Aborting.
cloudera-scm-agent is already stopped
Starting cloudera-scm-agent: bash: /var/log/cloudera-scm-agent/cloudera-scm-agent.out: No such file or directory
[FAILED]

Fatal error: run() received nonzero return code 1 while executing!

Requested: service cloudera-scm-agent restart
Executed: /bin/bash -l -c "service cloudera-scm-agent restart"

Aborting.
cloudera-scm-agent is already stopped
Starting cloudera-scm-agent: bash: /var/log/cloudera-scm-agent/cloudera-scm-agent.out: No such file or directory
[FAILED]

Fatal error: run() received nonzero return code 1 while executing!

Requested: service cloudera-scm-agent restart
Executed: /bin/bash -l -c "service cloudera-scm-agent restart"

Aborting.

Fatal error: One or more hosts failed while executing task '_task'

Aborting.
INFO:clusterdock.topologies.cdh.actions:Waiting for Cloudera Manager server to come online...
Traceback (most recent call last):
  File "./bin/start_cluster", line 70, in <module>
    main()
  File "./bin/start_cluster", line 63, in main
    actions.start(args)
  File "/root/clusterdock/clusterdock/topologies/cdh/actions.py", line 108, in start
    CM_SERVER_PORT, timeout_sec=180)
  File "/root/clusterdock/clusterdock/utils.py", line 52, in wait_for_port_open
    timeout_sec, address, port
Exception: Timed out after 180 seconds waiting for 192.168.124.6:7180 to be open.
++ '[' -n '' ']'
++ printf '\033]0;%s@%s:%s\007' root vm02 '~'
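For what it's worth, when the script dies on that 180-second port wait, you can reproduce the check by hand from the host with Bash's /dev/tcp pseudo-device (a sketch; the IP is node-1's address from the trace above, and 7180 is the CM server port):

for i in $(seq 1 180); do
  # attempt a TCP connect; succeeds once something is listening on 7180
  if timeout 1 bash -c 'exec 3<>/dev/tcp/192.168.124.6/7180' 2>/dev/null; then
    echo "CM port open after ${i} seconds"
    break
  fi
  sleep 1
done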