Closed: amznero closed this issue 2 years ago
The GraphLearn server and the TensorFlow PS can't listen on the same host:port, so assign a separate endpoint (here, another port on the same host) to the GraphLearn server. For example:
python dist_train.py --ps_hosts=ip1:2222 --worker_hosts=ip2:2222 --gl_hosts=ip1:2223 --job_name=ps --task_index=0
Then use gl_hosts when constructing the GraphLearn cluster:
graph_cluster = {"client": FLAGS.worker_hosts, "server": FLAGS.gl_hosts}
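The two commands above can be tied together in a small sketch. This is an assumption-laden illustration, not graph-learn's actual launcher: the flag values mirror the example command, and the `"client"`/`"server"` keys follow the `graph_cluster` dict shown in this thread (workers act as GraphLearn clients, GL servers get their own endpoints).

```python
def build_clusters(ps_hosts, worker_hosts, gl_hosts):
    """Split comma-separated host lists into the TF and GraphLearn clusters.

    Hypothetical helper; real code would read these from FLAGS as in the
    dist_train.py command above.
    """
    ps = ps_hosts.split(",")
    workers = worker_hosts.split(",")
    gl = gl_hosts.split(",")
    # TensorFlow cluster: PS and workers.
    tf_cluster = {"ps": ps, "worker": workers}
    # GraphLearn cluster: workers are clients, GL servers are separate endpoints.
    graph_cluster = {"client": workers, "server": gl}
    return tf_cluster, graph_cluster

tf_cluster, graph_cluster = build_clusters("ip1:2222", "ip2:2222", "ip1:2223")
print(graph_cluster)  # {'client': ['ip2:2222'], 'server': ['ip1:2223']}
```

Note that in the example, ip1 hosts both the PS (port 2222) and the GL server (port 2223): only the host:port pair must differ, not necessarily the host.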
Thank you for the response.
It works for me when I choose two different hosts for the TF-PS and GL-server processes.
But if I use a Kubernetes TFJob to start the program, each pod only gets one IP/host.
Are there any solutions to this problem?
In Kubeflow, you can add Evaluator replicas to host the GL servers.
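Since the earlier example already runs the PS and the GL server on the same host (ip1) with different ports, another hedged workaround for single-IP pods is to derive the GL endpoints from the TF endpoints by shifting the port. This is a sketch under the assumption that an extra port can be opened in each pod; the helper name and the offset are made up for illustration:

```python
def derive_gl_hosts(tf_hosts, port_offset=1):
    """Hypothetical helper: reuse each pod's single host and shift the port
    so the GraphLearn server gets a distinct host:port endpoint."""
    out = []
    for h in tf_hosts.split(","):
        host, port = h.rsplit(":", 1)  # rsplit keeps IPv4 hosts intact
        out.append("%s:%d" % (host, int(port) + port_offset))
    return ",".join(out)

print(derive_gl_hosts("ip1:2222,ip2:2222"))  # ip1:2223,ip2:2223
```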
Hi there, something went wrong when I used RPC mode to sync system states, which is mentioned in pr-65.
Dataset: Cora
Code: graph-learn/examples/tf/graphsage
Config: 1 ps + 1 worker
CODE SNIPPET
ERROR LOG
PS
WORKER