discoproject / disco

a Map/Reduce framework for distributed computing
http://discoproject.org
BSD 3-Clause "New" or "Revised" License

Each disco node sees itself as the master #631

Open elfeto opened 9 years ago

elfeto commented 9 years ago

Hi,

I have Disco running on 7 nodes, master included. With no disco process running on the nodes, I start the master, and the master starts the nodes as in the tutorial. The process on a node looks like this:

"disco 14960 0.0 0.0 290616 18636 ? Sl 10:24 0:00 /usr/lib64/erlang/erts-5.8.5/bin/beam.smp -K true -- -root /usr/lib64/erlang -progname erl -- -home /home/users/disco -- -noshell -noinput -noshell -noinput -master disco_8989_master@dtn-cn -sname disco_8989_slave@hulk -s slave slave_start disco_8989_master@dtn-cn slave_waiter_6 -connect_all false -pa /usr/lib/disco/master/ebin/ -pa /usr/lib/disco/master/deps/mochiweb/ebin -pa /usr/lib/disco/master/deps/lager/ebin -pa /usr/lib/disco/master/deps/plists/ebin -f:

There is no log on the nodes, but on the master "disco -v" shows:

[disco@dtn-cn ~]$ disco -v | grep dtn-cn
DISCO_JOB_OWNER = ftorres@dtn-cn.mydomain.com
DISCO_MASTER = http://dtn-cn.mydomain.com8989
DISCO_MASTER_HOST = dtn-cn.mydomain.com
DISCO_TEST_HOST = dtn-cn.mydomain.com
Disco master at http://dtn-cn.mydomain.com:8989

On the slave, "disco -v" shows:

[disco@hulk disco]$ disco -v | grep hulk
DISCO_JOB_OWNER = disco@hulk.mydomain.com
DISCO_MASTER = http://hulk.mydomain.com:8989
DISCO_MASTER_HOST = hulk.mydomain.com
DISCO_TEST_HOST = hulk.mydomain.com
Disco master at http://hulk.mydomain.com:8989

When I try to run a job on a node, the final output is: disco.error.CommError: Unable to access resource (http://hulk.mydomain.com:8989/disco/job/new): couldn't connect to host (is disco master running at http://hulk.mydomain.com:8989?)

Is there a way to change the master on the nodes? What can I do?

gilessbrown commented 8 years ago

Outside of actually running a job on the worker, I do not think the worker is tied to a particular master machine.

When running a job, the worker does know which master it is serving; otherwise it would not be able to save the results back to DDFS (see the save_results worker parameter, implemented here: https://github.com/discoproject/disco/blob/a89e37843bad58f744a29d642f84686ae467a6c0/master/src/job_coordinator.erl#L393).

If you want to specify the Disco master on the nodes so that you can run disco/ddfs commands from them, you can use the DISCO_MASTER_HOST setting (http://disco.readthedocs.io/en/latest/lib/settings.html), for example by setting DISCO_MASTER_HOST in the appropriate shell startup file that is read when you log in to the nodes.
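As a rough sketch of that suggestion, the setting can be exported as an environment variable in the shell on a node. The hostname below is taken from the report above. That the variable is read from the environment by the disco/ddfs command-line tools is an assumption based on the settings documentation linked above, and it has not been verified against this particular installation.

```shell
# Assumption: dtn-cn.mydomain.com is the real master, as in the report above.
# Point this shell's disco/ddfs commands at it instead of the local hostname.
export DISCO_MASTER_HOST=dtn-cn.mydomain.com

# Show the value that commands started from this shell will inherit.
echo "DISCO_MASTER_HOST=${DISCO_MASTER_HOST}"
```

To make this persistent, the export line could go into the disco user's shell startup file on each node (for example ~/.bashrc, assuming bash is the login shell). Running "disco -v | grep DISCO_MASTER" on the node afterwards should then report the master's hostname rather than the node's own.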