alibaba / euler

A distributed graph deep learning framework.
Apache License 2.0

Error during distributed training with ZooKeeper #217

Open xuetf opened 4 years ago

xuetf commented 4 years ago

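A question: does ZooKeeper only need to be installed on the master? I installed ZooKeeper only on the hadoop-master node. The connection tests fine: the client bundled with ZooKeeper (zkCli.sh -server localhost:2181) connects successfully and can run the create command. But Euler training fails and I don't know why. For euler_zk_path I tried both /tmp/zookeeper, which is what is configured in ZooKeeper, and an arbitrary path, /path/euler. Neither works, and I keep getting: ZK error when creating meta: no node. I would appreciate some guidance when you have time. Thanks.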

The command I ran:

    ./dist_tf_euler.sh \
      --data_dir hdfs://hadoop-master:9000/user/root/ppi \
      --euler_zk_addr hadoop-master:2181 --euler_zk_path /path/euler \
      --max_id 56944 --feature_idx 1 --feature_dim 50 --label_idx 0 --label_dim 121 \
      --model graphsage_supervised --mode train

Error output from worker 0 (worker 1 shows the same):

    I0104 13:37:16.686715 9816 graph_builder.cc:81] Load Done: hdfs://hadoop-master:9000/user/root/ppi/ppi_data_0.dat
    I0104 13:37:16.686715 9818 graph_builder.cc:81] Load Done: hdfs://hadoop-master:9000/user/root/ppi/ppi_data_4.dat
    I0104 13:37:16.700754 9817 graph_builder.cc:81] Load Done: hdfs://hadoop-master:9000/user/root/ppi/ppi_data_2.dat
    I0104 13:37:16.701020 9656 graph_builder.cc:127] Each Thread Load Finish! Node Count:34166 Edge Count:963932
    I0104 13:37:16.701040 9656 graph_builder.cc:135] Graph Loading Finish!
    I0104 13:37:16.982077 9656 graph_builder.cc:147] Graph Load Finish! Node Count:34166 Edge Count:963932
    I0104 13:37:16.988603 9656 graph_builder.cc:152] Done: build node sampler
    I0104 13:37:16.988626 9656 graph_builder.cc:162] Graph build finish
    I0104 13:37:16.988660 9656 graph_service.cc:179] service init finish
    I0104 13:37:16.989260 9656 graph_service.cc:131] bound port: 172.20.0.2:46415
    E0104 13:37:16.993388 9656 zk_server_register.cc:77] ZK error when creating root node: no node.
    W0104 13:37:16.993427 9656 graph.h:198] global sampler is not ok
    E0104 13:37:16.994678 9656 zk_server_register.cc:99] ZK error when creating meta: no node.
    I0104 13:37:16.994695 9656 graph_service.cc:146] service start

HDFS is working correctly, and the data shards load correctly as well.

Sorry, I found the cause: I had not manually created this node in ZooKeeper. After creating it, training runs normally.
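
For anyone hitting the same "no node" error, here is a minimal sketch of how the missing node could be created by hand. ZooKeeper's create does not create missing parent nodes, so a path such as /path/euler fails with "no node" while /path does not exist yet. The helper names (znode_prefixes, zk_create_commands) are made up for this illustration and are not part of Euler or ZooKeeper. Creating every level including the last one is an assumption based on the fix described above, not something taken from the Euler documentation. The script only prints zkCli.sh commands and does not contact ZooKeeper itself.

```shell
#!/bin/sh
# Hypothetical helper: print every level of a znode path, shortest first.
# Example: /path/euler -> /path, then /path/euler
znode_prefixes() {
    path=$1
    prefix=""
    old_ifs=$IFS
    IFS=/          # split the path on "/"
    set -f         # disable globbing while the unquoted expansion is split
    for part in $path; do
        # The leading "/" yields an empty first field; skip empty components.
        [ -n "$part" ] || continue
        prefix="$prefix/$part"
        printf '%s\n' "$prefix"
    done
    set +f
    IFS=$old_ifs
}

# Hypothetical helper: emit one zkCli.sh "create" command per level.
# The empty string is placeholder node data (assumed to be irrelevant here).
zk_create_commands() {
    znode_prefixes "$1" | while IFS= read -r node; do
        printf 'create %s ""\n' "$node"
    done
}

# Default to the path used in this issue when no argument is given.
zk_create_commands "${1:-/path/euler}"
```

For /path/euler this prints `create /path ""` followed by `create /path/euler ""`. These lines can be entered in the same zkCli.sh session used for the connection test (zkCli.sh -server hadoop-master:2181). After that, pass the same path as --euler_zk_path.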