alibaba / GraphScope

🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba | 一站式图计算系统
https://graphscope.io
Apache License 2.0

[BUG] When a GraphScope session is dynamically created in local mode, the GraphScope client can't interact with the Gremlin server (GIE) #612

Closed 346057177 closed 3 years ago

346057177 commented 3 years ago

Describe the bug

When a GraphScope session is dynamically created in local mode, the GraphScope client can't interact with the Gremlin server (GIE).

To Reproduce

sess = graphscope.session()
graph = sess.g()
g = graph.add_vertices(df_person, label='person').add_edges(df_knows, label='knows', src_label='person', dst_label='person')
interactive = sess.gremlin(g)
print(interactive)

Expected behavior

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/graphscope/client/rpc.py", line 47, in with_grpc_catch
    return fn(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/graphscope/client/rpc.py", line 169, in create_interactive_engine
    response = self._stub.CreateInteractiveInstance(request)
  File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "Exception calling application: <urlopen error [Errno 111] Connection refused>"

Environment (please complete the following information):

yecol commented 3 years ago

Thanks for your feedback. We will try to reproduce the bug and get back to you!

acezen commented 3 years ago

Hi, can you complete the environment information? Which GraphScope version are you using, and which OS are you running GraphScope on?

346057177 commented 3 years ago

Environment:
GraphScope version: 0.5.0
OS: Ubuntu 20.04.2
Kubernetes version: v1.20.5
Python: 3.6.8
Build mode: local build

Running on a k8s cluster. The graph session was created successfully, and the GIE and GAE pods were launched as well, but it failed at the interactive = sess.gremlin(g) statement. Below is the detailed log.

2021-07-28 03:32:17,540 [INFO][session:583]: Initializing graphscope session with parameters: {'addr': None, 'mode': 'eager', 'cluster_type': 'k8s', 'num_workers': 2, 'preemptive': True, 'k8s_namespace': None, 'k8s_service_type': 'NodePort', 'k8s_gs_image': 'registry.cn-hongkong.aliyuncs.com/graphscope/graphscope:0.5.0', 'k8s_etcd_image': 'quay.io/coreos/etcd:v3.4.13', 'k8s_image_pull_policy': 'IfNotPresent', 'k8s_image_pull_secrets': [], 'k8s_gie_graph_manager_image': 'registry.cn-hongkong.aliyuncs.com/graphscope/maxgraph_standalone_manager:0.5.0', 'k8s_zookeeper_image': 'zookeeper:3.4.14', 'k8s_coordinator_cpu': 0.5, 'k8s_coordinator_mem': '512Mi', 'k8s_etcd_num_pods': 1, 'k8s_etcd_cpu': 2, 'k8s_etcd_mem': '4Gi', 'k8s_zookeeper_cpu': 2, 'k8s_zookeeper_mem': '4Gi', 'k8s_gie_graph_manager_cpu': 2, 'k8s_gie_graph_manager_mem': '4Gi', 'k8s_vineyard_daemonset': 'none', 'k8s_vineyard_cpu': 4, 'k8s_vineyard_mem': '4Gi', 'vineyard_shared_mem': '4Gi', 'k8s_engine_cpu': 4, 'k8s_engine_mem': '4Gi', 'k8s_mars_worker_cpu': 0.2, 'k8s_mars_worker_mem': '512Mi', 'k8s_mars_scheduler_cpu': 0.2, 'k8s_mars_scheduler_mem': '512Mi', 'with_mars': False, 'k8s_volumes': {}, 'k8s_waiting_for_delete': False, 'timeout_seconds': 600, 'dangling_timeout_seconds': 600, 'k8s_client_config': {}}
2021-07-28 03:32:18,455 [INFO][cluster:308]: Launching coordinator...
2021-07-28 03:32:21,891 [INFO][utils:167]: coordinator-pwkpkx-5cd75cc784-nmk2d: Successfully assigned gs-kukjzb/coordinator-pwkpkx-5cd75cc784-nmk2d to rancher-node3
2021-07-28 03:32:21,891 [INFO][utils:167]: coordinator-pwkpkx-5cd75cc784-nmk2d: Container image "registry.cn-hongkong.aliyuncs.com/graphscope/graphscope:0.5.0" already present on machine
2021-07-28 03:32:23,071 [INFO][utils:167]: coordinator-pwkpkx-5cd75cc784-nmk2d: Created container coordinator
2021-07-28 03:32:23,072 [INFO][utils:167]: coordinator-pwkpkx-5cd75cc784-nmk2d: Started container coordinator
2021-07-28 11:32:29,404 [INFO][cluster:684]: Launching GIE graph manager ...
2021-07-28 11:32:30,569 [INFO][cluster:785]: [gs-graphmanager-pwkpkx-5c8b456646-p96g2]: Successfully assigned gs-kukjzb/gs-graphmanager-pwkpkx-5c8b456646-p96g2 to rancher-node2
2021-07-28 11:32:33,564 [INFO][cluster:785]: [gs-graphmanager-pwkpkx-5c8b456646-p96g2]: Container image "registry.cn-hongkong.aliyuncs.com/graphscope/maxgraph_standalone_manager:0.5.0" already present on machine
2021-07-28 11:32:33,566 [INFO][cluster:785]: [gs-graphmanager-pwkpkx-5c8b456646-p96g2]: Created container manager
2021-07-28 11:32:33,568 [INFO][cluster:785]: [gs-graphmanager-pwkpkx-5c8b456646-p96g2]: Started container manager
2021-07-28 11:32:33,571 [INFO][cluster:785]: [gs-graphmanager-pwkpkx-5c8b456646-p96g2]: Container image "zookeeper:3.4.14" already present on machine
2021-07-28 11:32:33,573 [INFO][cluster:785]: [gs-graphmanager-pwkpkx-5c8b456646-p96g2]: Created container zookeeper
2021-07-28 11:32:33,575 [INFO][cluster:785]: [gs-graphmanager-pwkpkx-5c8b456646-p96g2]: Started container zookeeper
2021-07-28 03:32:36,152 [INFO][utils:167]: coordinator-pwkpkx-5cd75cc784-nmk2d: Readiness probe failed: dial tcp 10.42.27.70:59184: connect: connection refused
2021-07-28 11:32:36,582 [INFO][cluster:797]: GIE graph manager service is ready.
2021-07-28 11:32:36,583 [INFO][cluster:541]: Launching etcd ...
2021-07-28 11:32:37,683 [INFO][cluster:807]: Etcd is ready, endpoint is xx.xx.xx.xx:58375
2021-07-28 11:32:37,683 [INFO][cluster:431]: Launching GraphScope engines pod ...
2021-07-28 11:32:38,294 [INFO][cluster:864]: [gs-engine-pwkpkx-7r6wl]: Successfully assigned gs-kukjzb/gs-engine-pwkpkx-7r6wl to rancher-node3
2021-07-28 11:32:39,301 [INFO][cluster:864]: [gs-engine-pwkpkx-hf5fq]: Successfully assigned gs-kukjzb/gs-engine-pwkpkx-hf5fq to rancher-node3
2021-07-28 11:32:42,984 [INFO][cluster:864]: [gs-engine-pwkpkx-7r6wl]: Container image "registry.cn-hongkong.aliyuncs.com/graphscope/graphscope:0.5.0" already present on machine
2021-07-28 11:32:42,986 [INFO][cluster:864]: [gs-engine-pwkpkx-7r6wl]: Created container engine
2021-07-28 11:32:42,988 [INFO][cluster:864]: [gs-engine-pwkpkx-7r6wl]: Started container engine
2021-07-28 11:32:42,993 [INFO][cluster:864]: [gs-engine-pwkpkx-7r6wl]: Created container vineyard
2021-07-28 11:32:42,996 [INFO][cluster:864]: [gs-engine-pwkpkx-7r6wl]: Started container vineyard
2021-07-28 11:32:43,992 [INFO][cluster:864]: [gs-engine-pwkpkx-hf5fq]: Container image "registry.cn-hongkong.aliyuncs.com/graphscope/graphscope:0.5.0" already present on machine
2021-07-28 11:32:43,994 [INFO][cluster:864]: [gs-engine-pwkpkx-hf5fq]: Created container engine
2021-07-28 11:32:43,997 [INFO][cluster:864]: [gs-engine-pwkpkx-hf5fq]: Started container engine
2021-07-28 11:32:44,001 [INFO][cluster:864]: [gs-engine-pwkpkx-hf5fq]: Created container vineyard
2021-07-28 11:32:44,004 [INFO][cluster:864]: [gs-engine-pwkpkx-hf5fq]: Started container vineyard
2021-07-28 11:32:56,674 [DEBUG][cluster:896]: vineyard rpc runs on xx.xx.xx.xx:30441
2021-07-28 11:32:56,675 [INFO][cluster:900]: GraphScope engines pod is ready.
2021-07-28 11:32:56,684 [INFO][cluster:1043]: Engines pod name list: ['gs-engine-pwkpkx-7r6wl', 'gs-engine-pwkpkx-hf5fq']
2021-07-28 11:32:56,684 [INFO][cluster:1044]: Engines pod ip list: ['xx.xx.xx.xx', 'xx.xx.xx.xx']
2021-07-28 11:32:56,684 [INFO][cluster:1045]: Engines pod host ip list: ['xx.xx.xx.xx', 'xx.xx.xx.xx']
2021-07-28 11:32:56,684 [INFO][cluster:1047]: Vineyard service endpoint: xx.xx.xx.xx:30441
2021-07-28 11:32:56,684 [INFO][cluster:936]: Starting GAE rpc service on xx.xx.xx.xx:56507 ...
2021-07-28 11:32:59,202 [DEBUG][utils:991]: Resolve mpi cmd prefix: ['mpirun', '--allow-run-as-root', '-n', '2', '-host', 'gs-engine-pwkpkx-7r6wl:1,gs-engine-pwkpkx-hf5fq:1']
2021-07-28 11:32:59,202 [DEBUG][utils:992]: Resolve mpi env: {'OMPI_MCA_btl_vader_single_copy_mechanism': 'none', 'OMPI_MCA_orte_allowed_exit_without_sync': '1', 'OMPI_MCA_odls_base_sigkill_timeout': '0', 'OMPI_MCA_plm_rsh_agent': '/usr/local/bin/kube_ssh'}
2021-07-28 11:32:59,375 [DEBUG][cluster:973]: Analytical engine launching command: mpirun --allow-run-as-root -n 2 -host gs-engine-pwkpkx-7r6wl:1,gs-engine-pwkpkx-hf5fq:1 grape_engine --host 0.0.0.0 --port 56507 -v 10 --vineyard_socket /tmp/vineyard_workspace/vineyard.sock
2021-07-28 11:32:59,566 [INFO][coordinator:1131]: Coordinator server listen at 0.0.0.0:59184
2021-07-28 03:33:07,926 [INFO][cluster:567]: Coordinator pod start successful with address xx.xx.xx.xx:32138, connecting to service ...
2021-07-28 03:33:13,068 [INFO][rpc:111]: GraphScope coordinator service connected.
I0728 11:33:16.370306 96 grape_instance.cc:864] Registering Graph, graph type: 4, Type sig: e33529e80839a2064a804ce453c761a9483aa7ab775bcfddc1a1f9da63dcb521, lib path: /tmp/gs/builtin/e33529e80839a2064a804ce453c761a9483aa7ab775bcfddc1a1f9da63dcb521/libe33529e80839a2064a804ce453c761a9483aa7ab775bcfddc1a1f9da63dcb521.so
I0728 11:33:16.379833 88 grape_instance.cc:864] Registering Graph, graph type: 4, Type sig: e33529e80839a2064a804ce453c761a9483aa7ab775bcfddc1a1f9da63dcb521, lib path: /tmp/gs/builtin/e33529e80839a2064a804ce453c761a9483aa7ab775bcfddc1a1f9da63dcb521/libe33529e80839a2064a804ce453c761a9483aa7ab775bcfddc1a1f9da63dcb521.so
I0728 11:33:16.416115 96 grape_instance.cc:110] Loading graph, graph name: graph_a37JncCH, graph type: ArrowFragment, type sig: e33529e80839a2064a804ce453c761a9483aa7ab775bcfddc1a1f9da63dcb521
I0728 11:33:16.427814 88 grape_instance.cc:110] Loading graph, graph name: graph_a37JncCH, graph type: ArrowFragment, type sig: e33529e80839a2064a804ce453c761a9483aa7ab775bcfddc1a1f9da63dcb521
I0728 11:33:17.047847 88 property_graph_frame.cc:107] [worker-1] loaded graph to vineyard ...
I0728 11:33:17.055609 96 property_graph_frame.cc:107] [worker-0] loaded graph to vineyard ...

Connecting to the PostgreSQL database...
I0728 11:33:18.075819 88 property_graph_frame.cc:274] [worker-1] Add labels to graph and loaded to vineyard ...
I0728 11:33:18.076035 96 property_graph_frame.cc:274] [worker-0] Add labels to graph and loaded to vineyard ...
I0728 11:33:19.383785 88 property_graph_frame.cc:274] [worker-1] Add labels to graph and loaded to vineyard ...
I0728 11:33:19.395239 96 property_graph_frame.cc:274] [worker-0] Add labels to graph and loaded to vineyard ...
I0728 11:33:19.528188 88 gs_object.h:65] Object graph_yDsbzayy[LABELED_FRAGMENT_WRAPPER] is destructed.
I0728 11:33:19.556005 96 gs_object.h:65] Object graph_yDsbzayy[LABELED_FRAGMENT_WRAPPER] is destructed.
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/graphscope/client/session.py", line 1135, in gremlin
2021-07-28 11:43:28,676 [ERROR][coordinator:606]: create interactive instance for object id 11144263247559523 failed with error code -1 message Check instance ready timeout
    engine_params=engine_params,
  File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/graphscope/client/rpc.py", line 47, in with_grpc_catch
2021-07-28 11:43:28,676 [ERROR][coordinator:606]: create interactive instance for object id 11144263247559523 failed with error code -1 message Check instance ready timeout
    return fn(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/graphscope/client/rpc.py", line 170, in create_interactive_engine
    return check_grpc_response(response)
  File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/graphscope/framework/errors.py", line 181, in check_grpc_response
    raise error_type(status.error_msg, detail)
graphscope.framework.errors.InteractiveEngineInternalError: 'create interactive instance for object id 11144263247559523 failed with error code -1 message Check instance ready timeout'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "app.py", line 34, in <module>
    interactive = sess.gremlin(g)
  File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/graphscope/client/utils.py", line 156, in wrapper
    return_value = func(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/graphscope/client/session.py", line 1140, in gremlin
    raise InteractiveEngineInternalError(str(e)) from e
graphscope.framework.errors.InteractiveEngineInternalError: "'create interactive instance for object id 11144263247559523 failed with error code -1 message Check instance ready timeout'"
2021-07-28 11:43:29,418 [INFO][coordinator:635]: Coordinator close interactive instance with url[http://10.43.38.250:8080/instance/close?graphName=11144263247559523&podNameList=gs-engine-pwkpkx-7r6wl,gs-engine-pwkpkx-hf5fq&containerName=engine&waitingForDelete=False]
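The timestamps in the log above allow a rough estimate of how long the sess.gremlin(g) call waited before failing. The following is only an illustrative sketch: the helper name seconds_between is hypothetical, and the start timestamp is the last engine line before the call (glog entry I0728 11:33:19.556005), rewritten by hand into the Python logging format. The result is close to the timeout_seconds value of 600 in the session parameters, which suggests, but does not prove, that the readiness check ran until a timeout of about that size expired.

```python
from datetime import datetime

# Format used by the Python-side log lines, e.g. "2021-07-28 11:43:28,676".
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two log timestamps (hypothetical helper)."""
    t0 = datetime.strptime(start, LOG_FORMAT)
    t1 = datetime.strptime(end, LOG_FORMAT)
    return (t1 - t0).total_seconds()


# Start: last engine log line before the gremlin call (rewritten from the glog entry).
# End: the coordinator's "Check instance ready timeout" error line.
waited = seconds_between("2021-07-28 11:33:19,556", "2021-07-28 11:43:28,676")
print(f"waited about {waited:.0f} s; session timeout_seconds was 600")
```

Both timestamps come from the coordinator/engine side of the log (the 11:xx entries), so the client-side entries stamped 03:xx are not mixed into the difference.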

acezen commented 3 years ago

Hi @346057177, the log this time seems to show a different error from the one in the original report: it connects to the manager service successfully, but the check query fails.
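To tell the two failures in this thread apart programmatically, a small helper could match on the error text. This is a minimal sketch, not part of GraphScope: the function name and the returned labels are made up for illustration, and only the two message fragments come from the tracebacks above.

```python
def classify_failure(message: str) -> str:
    """Map an error message to a coarse failure label (labels are hypothetical)."""
    if "Connection refused" in message:
        # Original report: the request to the manager service could not connect.
        return "manager-unreachable"
    if "Check instance ready timeout" in message:
        # Later log: the manager was reached, but the instance never became ready.
        return "instance-not-ready"
    return "unknown"


first = 'Exception calling application: <urlopen error [Errno 111] Connection refused>'
second = ('create interactive instance for object id 11144263247559523 '
          'failed with error code -1 message Check instance ready timeout')
print(classify_failure(first))
print(classify_failure(second))
```

Matching on substrings is fragile if the wording changes between versions, so this is only suitable for quick triage of logs like the ones in this issue.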

346057177 commented 3 years ago

Is there a discussion group, e.g. a WeChat (Weixin) group? Can we join to help identify the problem?

acezen commented 3 years ago

You can join the DingTalk (Ding Ding) group 31533139.

acezen commented 3 years ago

Duplicate of #619.