Open elfeto opened 9 years ago
Apart from actually running a job, I do not think a worker is tied to a particular master machine.
While running a job, the worker does know which master it is serving; otherwise it could not save results back to DDFS (see the save_results
worker parameter, implemented here: https://github.com/discoproject/disco/blob/a89e37843bad58f744a29d642f84686ae467a6c0/master/src/job_coordinator.erl#L393).
If you want to point the nodes at a particular Disco master so that you can run disco/ddfs commands from them, you can use the DISCO_MASTER_HOST
setting (http://disco.readthedocs.io/en/latest/lib/settings.html), for example by setting DISCO_MASTER_HOST in the shell startup file that is read when you log in to the nodes.
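As a rough illustration of the suggestion above, here is a minimal Python sketch of how overriding DISCO_MASTER_HOST changes the master URL a node would use. It does not import Disco. The helper names `master_url` and `env_for_master` are hypothetical. The fallback to the local hostname and the use of DISCO_PORT with a default of 8989 are assumptions inferred from the `disco -v` output in this thread, not taken from Disco's source.

```python
import os
import socket


def master_url(env=None, default_port="8989"):
    """Compose a master URL as http://<host>:<port>.

    Assumption: when DISCO_MASTER_HOST is unset, the local hostname is used,
    which would explain why each node reports itself as the master.
    """
    env = os.environ if env is None else env
    host = env.get("DISCO_MASTER_HOST") or socket.gethostname()
    port = env.get("DISCO_PORT", default_port)
    return "http://%s:%s" % (host, port)


def env_for_master(host, base_env=None):
    """Return a copy of the environment with DISCO_MASTER_HOST overridden."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISCO_MASTER_HOST"] = host
    return env


if __name__ == "__main__":
    # Hostname taken from the thread; replace with your own master.
    env = env_for_master("dtn-cn.mydomain.com")
    print(master_url(env))
```

The environment returned by `env_for_master` could then be passed to a child process (for example via `subprocess.run(..., env=env)`) instead of editing a shell startup file, if a per-command override is preferred.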
Hi,
I have Disco running on 7 nodes, master included. With no Disco process running on the nodes, I start the master, and the master starts the nodes as in the tutorial. The process on a node looks like this:
"disco 14960 0.0 0.0 290616 18636 ? Sl 10:24 0:00 /usr/lib64/erlang/erts-5.8.5/bin/beam.smp -K true -- -root /usr/lib64/erlang -progname erl -- -home /home/users/disco -- -noshell -noinput -noshell -noinput -master disco_8989_master@dtn-cn -sname disco_8989_slave@hulk -s slave slave_start disco_8989_master@dtn-cn slave_waiter_6 -connect_all false -pa /usr/lib/disco/master/ebin/ -pa /usr/lib/disco/master/deps/mochiweb/ebin -pa /usr/lib/disco/master/deps/lager/ebin -pa /usr/lib/disco/master/deps/plists/ebin -f:
There is no log on the nodes, but on the master "disco -v" shows:
[disco@dtn-cn ~]$ disco -v | grep dtn-cn
DISCO_JOB_OWNER = ftorres@dtn-cn.mydomain.com
DISCO_MASTER = http://dtn-cn.mydomain.com:8989
DISCO_MASTER_HOST = dtn-cn.mydomain.com
DISCO_TEST_HOST = dtn-cn.mydomain.com
Disco master at http://dtn-cn.mydomain.com:8989
On the slave, "disco -v" shows:
[disco@hulk disco]$ disco -v | grep hulk
DISCO_JOB_OWNER = disco@hulk.mydomain.com
DISCO_MASTER = http://hulk.mydomain.com:8989
DISCO_MASTER_HOST = hulk.mydomain.com
DISCO_TEST_HOST = hulk.mydomain.com
Disco master at http://hulk.mydomain.com:8989
When I try to run a job from a node, the final output is:
disco.error.CommError: Unable to access resource (http://hulk.mydomain.com:8989/disco/job/new): couldn't connect to host (is disco master running at http://hulk.mydomain.com:8989?)
Is there a way to change the master on the nodes? What can I do?
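To narrow down a CommError like the one above, it can help to check whether anything is listening at the master URL a node reports. The following is a minimal sketch using only the Python standard library. `master_reachable` is a hypothetical helper, not part of Disco, and it only tests that a TCP connection can be opened, not that the listener is actually a Disco master.

```python
import socket
from urllib.parse import urlparse


def master_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname
    port = parsed.port or 80  # fall back to the HTTP default if no port is given
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Example calls with the URLs from this thread (run from a worker node):
#   master_reachable("http://hulk.mydomain.com:8989")
#   master_reachable("http://dtn-cn.mydomain.com:8989")
```

If the check fails for the node's own hostname but succeeds for the real master's hostname, that would be consistent with the node defaulting to itself as master.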