amplab / training

Training materials for Strata, AMP Camp, etc

problem launching bdas cluster #158

Open · dqingram opened this issue 10 years ago

dqingram commented 10 years ago

I followed the instructions here for launching a BDAS cluster: http://ampcamp.berkeley.edu/3/exercises/launching-a-bdas-cluster-on-ec2.html

Everything seemingly goes OK, but when I try to list the files on HDFS I get the following error and cannot continue the exercises:

    [root@ip-xxx-xxx-xxx-xxx ~]# ephemeral-hdfs/bin/hadoop fs -ls /wiki/pagecounts
    14/09/01 03:17:12 INFO ipc.Client: Retrying connect to server: ec2-107-22-16-100.compute-1.amazonaws.com/10.185.78.82:9000. Already tried {#} time(s).
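From what I can tell, that retry loop means nothing is answering on the NameNode's RPC port (9000 in the log above). A quick way to check from the master, as a sketch assuming the stock AMI has jps and netstat available:

    jps                          # is a NameNode process running at all?
    netstat -tlnp | grep 9000    # is anything listening on the NameNode RPC port?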

Did I mess something up? Is there a way to resolve this?

JoshRosen commented 10 years ago

It looks like the AMP Camp 3 instructions didn't specify a branch when cloning the training scripts, so my hunch is that you're trying to use a newer version of the scripts alongside an older set of instructions.

Maybe try

    git clone git://github.com/amplab/training-scripts.git -b ampcamp3

Perhaps the HDFS cluster hasn't started. What happens if you do ~/ephemeral-hdfs/bin/start-all.sh?
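If the daemons do come up, something like this should confirm it (a sketch; paths assume the default layout on the training AMIs):

    ~/ephemeral-hdfs/bin/start-all.sh
    jps                                    # should now show a NameNode on the master
    ~/ephemeral-hdfs/bin/hadoop fs -ls /   # should answer without the retry loop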

dqingram commented 10 years ago

I cloned the ampcamp3 branch. After installation (there was an error, see below), I logged into my master instance and HDFS appeared to be up, but there was no data in it:

 # ephemeral-hdfs/bin/hadoop fs -ls /wiki/pagecounts  
    Cannot access /wiki/pagecounts: No such file or directory.
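So HDFS itself is answering this time and the problem is just missing data; listing the root (same ephemeral-hdfs paths as above) shows what, if anything, actually got loaded:

    ephemeral-hdfs/bin/hadoop fs -ls /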

I tried to relaunch the install using "--copy --resume launch amplab-training". I also tried just copying the data using "copy-data amplab-training".
Incidentally, I'm also setting the wait time with "-w 480".
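For reference, the full launch command I'm running looks roughly like this (the key pair name and identity file are placeholders, and the spark-ec2 wrapper is the one from the cloned training-scripts):

    ./spark-ec2 -i ~/mykey.pem -k mykey -w 480 --copy --resume launch amplab-training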

I'm consistently getting this error at the end of the launch:

    Shutting down GANGLIA gmond:  [ OK ]
    Starting GANGLIA gmond:  [ OK ]
    Shutting down GANGLIA gmond:  [ OK ]
    Starting GANGLIA gmond:  [ OK ]
    Connection to ec2-54-87-43-131.compute-1.amazonaws.com closed.
    Shutting down GANGLIA gmond:  [ OK ]
    Starting GANGLIA gmond:  [ OK ]
    Connection to ec2-54-81-248-43.compute-1.amazonaws.com closed.
    Shutting down GANGLIA gmond:  [ OK ]
    Starting GANGLIA gmond:  [ OK ]
    Connection to ec2-54-211-239-115.compute-1.amazonaws.com closed.
    Shutting down GANGLIA gmond:  [ OK ]
    Starting GANGLIA gmond:  [ OK ]
    Connection to ec2-54-226-182-52.compute-1.amazonaws.com closed.
    Shutting down GANGLIA gmond:  [ OK ]
    Starting GANGLIA gmond:  [ OK ]
    Connection to ec2-54-166-4-67.compute-1.amazonaws.com closed.
    ln: creating symbolic link `/var/lib/ganglia/conf/default.json': File exists
    Shutting down GANGLIA gmetad:  [ OK ]
    Starting GANGLIA gmetad:  [ OK ]
    Stopping httpd:  [ OK ]
    Starting httpd:  [ OK ]
    Waiting for cluster to start...
    Traceback (most recent call last):
      File "./spark_ec2.py", line 915, in <module>
        main()
      File "./spark_ec2.py", line 758, in main
        err = wait_for_spark_cluster(master_nodes, opts)
      File "./spark_ec2.py", line 724, in wait_for_spark_cluster
        err = check_spark_cluster(master_nodes, opts)
      File "./spark_ec2.py", line 453, in check_spark_cluster
        response = urllib2.urlopen(url)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 127, in urlopen
        return _opener.open(url, data, timeout)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open
        response = self._open(req, data)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open
        '_open', req)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
        result = func(*args)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1214, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1184, in do_open
        raise URLError(err)
    urllib2.URLError: <urlopen error [Errno 61] Connection refused>
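If I'm reading the traceback right, spark_ec2.py gave up while polling the Spark master over HTTP, which suggests the master process never came up. A manual check, as a sketch (8080 is the usual standalone-master web UI port; <master-hostname> is a placeholder for the master from the launch output):

    curl -I http://<master-hostname>:8080/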

dqingram commented 10 years ago

Should I close this initial issue and submit a new one?

I'd greatly appreciate some assistance getting BDAS up and running.