Another Time-out after 180 Seconds

rocknjoekudo commented 8 years ago

Hello, great contributors to this amazing project,

I am a student learning CDH from scratch, and I am new to this Docker project. I followed the instructions in this tutorial https://hub.docker.com/r/cloudera/clusterdock/, but I encountered the following timeout error. It is quite similar to the closed issue https://github.com/cloudera/clusterdock/issues/2, but in a different environment.

rj@rj-ubuntu:~$ clusterdock_run ./bin/start_cluster cdh
INFO:clusterdock.cluster:Successfully started node-2.cluster (IP address: 192.168.123.3).
INFO:clusterdock.cluster:Successfully started node-1.cluster (IP address: 192.168.123.2).
INFO:clusterdock.cluster:Started cluster in 6.61 seconds.
INFO:clusterdock.topologies.cdh.actions:Changing server_host to node-1.cluster in /etc/cloudera-scm-agent/config.ini...
INFO:clusterdock.topologies.cdh.actions:Restarting CM agents...
cloudera-scm-agent is already stopped
Starting cloudera-scm-agent: [  OK  ]
Stopping cloudera-scm-agent: [  OK  ]
Starting cloudera-scm-agent: [  OK  ]
INFO:clusterdock.topologies.cdh.actions:Waiting for Cloudera Manager server to come online...
Traceback (most recent call last):
  File "./bin/start_cluster", line 70, in <module>
    main()
  File "./bin/start_cluster", line 63, in main
    actions.start(args)
  File "/root/clusterdock/clusterdock/topologies/cdh/actions.py", line 108, in start
    CM_SERVER_PORT, timeout_sec=180)
  File "/root/clusterdock/clusterdock/utils.py", line 52, in wait_for_port_open
    timeout_sec, address, port
Exception: Timed out after 180 seconds waiting for 192.168.123.2:7180 to be open.

Docker is running natively on 64-bit Ubuntu 16.04.1 LTS, with up-to-date packages including docker-engine. My CPU is an i7-4710HQ, and I have 12 GB of RAM.

rj@rj-ubuntu:~$ uname -r
4.4.0-47-generic
rj@rj-ubuntu:~$ apt-cache policy docker-engine
docker-engine:
  Installed: 1.12.3-0~xenial
  Candidate: 1.12.3-0~xenial
  Version table:
 *** 1.12.3-0~xenial 500
        500 https://apt.dockerproject.org/repo ubuntu-xenial/main amd64 Packages
        100 /var/lib/dpkg/status
     1.12.2-0~xenial 500
        500 https://apt.dockerproject.org/repo ubuntu-xenial/main amd64 Packages
     1.12.1-0~xenial 500
        500 https://apt.dockerproject.org/repo ubuntu-xenial/main amd64 Packages
     1.12.0-0~xenial 500
        500 https://apt.dockerproject.org/repo ubuntu-xenial/main amd64 Packages
     1.11.2-0~xenial 500
        500 https://apt.dockerproject.org/repo ubuntu-xenial/main amd64 Packages
     1.11.1-0~xenial 500
        500 https://apt.dockerproject.org/repo ubuntu-xenial/main amd64 Packages
     1.11.0-0~xenial 500
        500 https://apt.dockerproject.org/repo ubuntu-xenial/main amd64 Packages

I tried restarting the docker-engine service, but the result was the same. Any hints on this issue would be appreciated. Thanks in advance.

RJ, a master's student at Vrije Universiteit Brussel

dimaspivak commented 8 years ago

Hi RJ,

How much memory do you have free when you run the clusterdock_run command? 12 GB of RAM is cutting it really close to not having enough resources to support even a 2-node CDH cluster.

rocknjoekudo commented 8 years ago

Here is my free RAM before running the Docker version of CDH.

rj@rj-ubuntu:~$ free -g
              total        used        free      shared  buff/cache   available
Mem:             11           0          10           0           0          10
Swap:             7           0           7

I am not sure if this is enough to run it.

dimaspivak commented 8 years ago

Can you run that command while clusterdock is waiting for CM to come online?
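One way to capture this without re-running `free` by hand, as a minimal sketch: the script below samples `MemAvailable` from `/proc/meminfo` (Linux only) at a fixed interval, so it can be left running in a second terminal while clusterdock waits for CM. The function names, the 5-second interval, and the 36-sample count are illustrative assumptions, not part of clusterdock.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_gib(text):
    """Return MemAvailable in GiB, falling back to MemFree if it is absent."""
    info = parse_meminfo(text)
    kib = info.get("MemAvailable", info.get("MemFree", 0))
    return kib / (1024 * 1024)


def watch(samples=36, interval_sec=5, path="/proc/meminfo"):
    """Print a timestamped available-memory reading every interval_sec seconds."""
    for _ in range(samples):
        with open(path) as f:
            gib = available_gib(f.read())
        print("%s available: %.1f GiB" % (time.strftime("%H:%M:%S"), gib))
        time.sleep(interval_sec)


# Example (hypothetical usage): 36 samples x 5 s spans the 180-second wait.
# watch()
```

Calling `watch()` right after launching `clusterdock_run` would give a reading for each stage of the wait rather than a single snapshot.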

rocknjoekudo commented 8 years ago

This is the RAM usage while clusterdock is running and CPU usage is at 100% on 4 of 8 cores.

rj@rj-ubuntu:~$ free -g
              total        used        free      shared  buff/cache   available
Mem:             11           2           7           0           1           8
Swap:             7           0           7

And this is the RAM usage with clusterdock still running, after CPU usage dropped to nearly 0.

rj@rj-ubuntu:~$ free -g
              total        used        free      shared  buff/cache   available
Mem:             11           1           8           0           1           9
Swap:             7           0           7

I have noticed that there is always a period when both CPU and RAM usage drop to a low level while clusterdock is still running. I guess some processes have stopped at that point, or are simply waiting for further commands.

iasinDev commented 8 years ago

Hi, I had the same problem a few days ago and searched through several similar issues. I finally concluded that it is caused by insufficient RAM. After this error I was able to start Cloudera Manager from the command line, but it crashed after a few seconds. I then tried with at least 20 GB for the cluster, and it worked. To sum up, the startup script waits for Cloudera Manager to come up before continuing, but with insufficient resources Cloudera Manager cannot start, so the script fails.
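To illustrate the waiting behaviour described above, here is a minimal sketch of a port-wait loop. It is not clusterdock's actual `wait_for_port_open` code; the names `port_is_open` and `wait_for_port`, the polling interval, and the connect timeout are assumptions made for illustration. Only the `timeout_sec=180` default and the shape of the error message are taken from the traceback in the first post.

```python
import socket
import time


def port_is_open(address, port, connect_timeout=1.0):
    """Return True if a TCP connection to (address, port) succeeds."""
    try:
        with socket.create_connection((address, port), timeout=connect_timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and connect timeouts.
        return False


def wait_for_port(address, port, timeout_sec=180, poll_sec=1.0):
    """Poll until the port accepts connections, or raise after timeout_sec."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if port_is_open(address, port):
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "Timed out after %s seconds waiting for %s:%s to be open."
                % (timeout_sec, address, port)
            )
        time.sleep(poll_sec)
```

A loop like this only knows that nothing ever listened on the port. It cannot tell whether the server ran out of memory, crashed, or is just slow, which is why the same timeout message can have different causes.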

rocknjoekudo commented 8 years ago

@iasinDev Thanks for your reply, my friend. I understand your explanation. But then I have another question: why can these virtual machine images run clusterdock with only 4 GB of RAM allocated?

iasinDev commented 8 years ago

The problem is the services. By default, when the cluster starts, it starts Cloudera Manager and then activates all CDH services, so a lot of RAM is needed. That's probably why @dimaspivak suggests in other issues using the parameter that skips starting the services.

iasinDev commented 8 years ago

@rocknjoekudo try the options described in the blog post for starting only some services

rocknjoekudo commented 8 years ago

@iasinDev Thanks again. But I still have a question: why didn't clusterdock use much RAM on my machine while it was starting, even though CPU usage was quite high? And could you please send me the link to the blog post? I couldn't find it.

chathuriw commented 6 years ago

I'm also getting the same error even though I have 32 GB of RAM, and not much of it is in use while the start_cluster script runs. I'd appreciate any help with this.

dimaspivak commented 6 years ago

Can you share details of your OS, Docker version, host, etc.? This is a year-old thread so many things might have changed since then.

miguelraffoul commented 5 years ago

Not sure whether this is still an active repo, but has anybody found a solution to this issue? It's also happening to me. I'm using Oracle Linux Server 7.6, and the machine has 64 GB of RAM (of which 50 GB are free) and 12 available cores. I can see that there isn't much RAM or CPU usage while clusterdock waits for CM to come online. Is it possible to start the cluster with all services except CM?

cloudera / clusterdock

Another Time-out after 180 Seconds #14