Qihoo360 / dgl-operator

The DGL Operator makes it easy to run Deep Graph Library (DGL) graph neural network training on Kubernetes
Apache License 2.0

deliver partitions error #22

Open allendred opened 2 years ago

allendred commented 2 years ago

Phase 2/5: deliver partitions

['12.100.10.0', '30050', 'dgl-graphsage-launcher']
Traceback (most recent call last):
  File "tools/launch.py", line 280, in <module>
    main()
  File "tools/launch.py", line 252, in main
    run_cp_container(args)
  File "tools/launch.py", line 103, in run_cp_container
    kubexec_container(f'mkdir -p {args.target_dir}', pod_name, args.container)
  File "tools/launch.py", line 31, in kubexec_container
    subprocess.check_call(cmd, shell = True)
  File "/usr/local/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'sh /etc/dgl/kubexec.sh 'dgl-graphsage-launcher -c watcher-loop-partitioner' 'mkdir -p /dgl_workspace'' returned non-zero exit status 1.

Phase 2/5 error raised
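The `CalledProcessError` above reports only the exit status, not why `mkdir -p` failed inside the pod. As a debugging aid, here is a minimal sketch of rerunning a command while capturing its stderr. `run_and_report` is a hypothetical helper and not part of `tools/launch.py`; it assumes the failing command can be rerun by hand from the launcher container.

```python
import subprocess


def run_and_report(cmd):
    """Run a shell command and return (exit_code, stdout, stderr).

    Unlike subprocess.check_call, this does not raise on a non-zero
    exit status, so the error text from the command stays visible.
    """
    proc = subprocess.run(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode; also available on Python 3.6
    )
    return proc.returncode, proc.stdout, proc.stderr
```

For example, passing the command string from the traceback (`"sh /etc/dgl/kubexec.sh 'dgl-graphsage-launcher -c watcher-loop-partitioner' 'mkdir -p /dgl_workspace'"`) and printing the returned stderr should show the underlying message behind exit status 1.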

Sometimes a different error occurs instead.

Phase 2/5: deliver partitions

Launch arguments: Namespace(cmd_type='copy_batch_container', container='watcher-loop-partitioner', ip_config='/etc/dgl/leadfile', num_parts=None, num_samplers=0, num_server_threads=1, num_servers=None, num_trainers=None, part_config=None, source_file_paths='/dgl_workspace/dataset', target_dir='/dgl_workspace', worker_chief_index=0, workspace='/dgl_workspace'), []
30050 dgl-graphsage-launcher

['30050', 'dgl-graphsage-launcher']
Traceback (most recent call last):
  File "tools/launch.py", line 280, in <module>
    main()
  File "tools/launch.py", line 252, in main
    run_cp_container(args)
  File "tools/launch.py", line 100, in run_cp_container
    for pod_info in get_ip_host_pairs(args.ip_config):
  File "tools/launch.py", line 64, in get_ip_host_pairs
    raise RuntimeError("Format error of ip_config.")
RuntimeError: Format error of ip_config.

It looks like /etc/dgl/leadfile may be missing the IP.
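The list printed before the first traceback has three fields (IP, port, pod name), while the one before this traceback has only two, with the IP absent. Here is a minimal sketch of a check that flags such lines, assuming each leadfile line holds exactly three whitespace-separated fields. That format is inferred from the printed lists, not confirmed from `tools/launch.py`, and `check_ip_config` is a hypothetical helper.

```python
def check_ip_config(lines):
    """Return (line_number, fields) for every non-blank line that does
    not have exactly three whitespace-separated fields.

    Assumed format per line: "<ip> <port> <pod_name>".
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            bad.append((lineno, fields))
    return bad
```

Running it over the contents of `/etc/dgl/leadfile` (for example `check_ip_config(open('/etc/dgl/leadfile').readlines())`) before Phase 2 starts would show whether the file was written without the IP or the parsing is at fault.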

On another cluster:

Phase 2/5: deliver partitions

allendred commented 2 years ago

@ryantd