alibaba / GraphScope

GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba
https://graphscope.io
Apache License 2.0
3.24k stars 439 forks

[BUG] error starting session on aws k8 #3037

Open rahulvramesh opened 1 year ago

rahulvramesh commented 1 year ago

Describe the bug

import graphscope
from graphscope.framework.loader import Loader
from graphscope.dataset.ogbn_mag import load_ogbn_mag

# Set up the GraphScope session
graphscope_session = graphscope.session(cluster_type='k8s', show_log=True)

# Load the dataset
g = load_ogbn_mag(graphscope_session)

# Process the graph
# Here we create a simple PageRank application as an example
pr = graphscope.pagerank(g, delta=0.85, max_round=10)
pr = pr.unfold()

I am trying to launch a session using the code above, but I get:

ConnectionError: Connect coordinator timeout, code: UNAVAILABLE, details: failed
to connect to all addresses; last error: UNKNOWN: ipv4:192.168.3.241:30243: 
Failed to connect to remote host: Operation timed out

Expected behavior: the session launches and the client connects to it.

Environment (please complete the following information):

welcome[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template, and a maintainer will get back to you shortly. Please feel free to contact us on DingTalk, WeChat (account: graphscope), or Slack. We are happy to answer your questions promptly.

siyuan0322 commented 1 year ago

Are there any more logs on the console?

rahulvramesh commented 1 year ago
/Users/user/miniconda3/envs/graphscope-k8/bin/python /Users/user/workspace/try/graphscope-k8/main.py 
['python3', '-m', 'gscoordinator', '--cluster_type', 'k8s', '--port', '59513', '--num_workers', '2', '--preemptive', 'True', '--instance_id', 'sydkbc', '--log_level', 'INFO', '--k8s_namespace', 'gs-gkcpcj', '--k8s_service_type', 'NodePort', '--k8s_image_repository', 'graphscope', '--k8s_image_pull_policy', 'IfNotPresent', '--k8s_coordinator_name', 'coordinator-sydkbc', '--k8s_coordinator_service_name', 'coordinator-sydkbc', '--k8s_vineyard_image', 'vineyardcloudnative/vineyardd:latest', '--k8s_vineyard_cpu', '0.5', '--k8s_vineyard_mem', '512Mi', '--vineyard_shared_mem', '4Gi', '--k8s_engine_cpu', '0.2', '--k8s_engine_mem', '1Gi', '--k8s_mars_worker_cpu', '0.2', '--k8s_mars_worker_mem', '4Mi', '--k8s_mars_scheduler_cpu', '0.2', '--k8s_mars_scheduler_mem', '2Mi', '--k8s_with_mars', 'False', '--k8s_enabled_engines', 'analytical,interactive,learning', '--k8s_with_dataset', 'False', '--timeout_seconds', '600', '--dangling_timeout_seconds', '600', '--waiting_for_delete', 'False', '--k8s_delete_namespace', 'True', '--k8s_image_registry', 'registry.cn-hongkong.aliyuncs.com', '--k8s_image_tag', '0.23.0', '--k8s_deploy_mode', 'eager']
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /Users/user/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/graphscope/client/rpc.py:68 in waiting_service_ready                       │
│                                                                              │
│    65 │   │   │   │   │   │   f"Start coordinator failed with exit code {cod │
│    66 │   │   │   │   │   )                                                  │
│    67 │   │   │   try:                                                       │
│ ❱  68 │   │   │   │   self._stub.HeartBeat(request)                          │
│    69 │   │   │   │   logger.info("GraphScope coordinator service connected. │
│    70 │   │   │   │   break                                                  │
│    71 │   │   │   except grpc.RpcError as e:                                 │
│                                                                              │
│ /Users/user/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/grpc/_channel.py:1030 in __call__                                          │
│                                                                              │
│   1027 │   │   │   │    compression: Optional[grpc.Compression] = None) -> A │
│   1028 │   │   state, call, = self._blocking(request, timeout, metadata, cre │
│   1029 │   │   │   │   │   │   │   │   │     wait_for_ready, compression)    │
│ ❱ 1030 │   │   return _end_unary_response_blocking(state, call, False, None) │
│   1031 │                                                                     │
│   1032 │   def with_call(                                                    │
│   1033 │   │   self,                                                         │
│                                                                              │
│ /Users/user/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/grpc/_channel.py:910 in _end_unary_response_blocking                       │
│                                                                              │
│    907 │   │   else:                                                         │
│    908 │   │   │   return state.response                                     │
│    909 │   else:                                                             │
│ ❱  910 │   │   raise _InactiveRpcError(state)  # pytype: disable=not-instant │
│    911                                                                       │
│    912                                                                       │
│    913 def _stream_unary_invocation_operations(                              │
╰──────────────────────────────────────────────────────────────────────────────╯
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses; last error: UNKNOWN: 
ipv4:192.168.3.241:30243: Failed to connect to remote host: Operation timed out"
        debug_error_string = "UNKNOWN:failed to connect to all addresses; last 
error: UNKNOWN: ipv4:192.168.3.241:30243: Failed to connect to remote host: 
Operation timed out {created_time:"2023-07-20T11:01:29.43457+07:00", 
grpc_status:14}"
>

During handling of the above exception, another exception occurred:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /Users/user/workspace/try/graphscope-k8/main.py:7 in <module>        │
│                                                                              │
│     4 from graphscope.dataset.ogbn_mag import load_ogbn_mag                  │
│     5                                                                        │
│     6 # Set up the GraphScope session                                        │
│ ❱   7 graphscope_session = graphscope.session(cluster_type='k8s', show_log=T │
│     8                                                                        │
│     9 # Load the dataset                                                     │
│    10 g = load_ogbn_mag(graphscope_session)                                  │
│                                                                              │
│ /Users/user/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/graphscope/client/utils.py:400 in wrapper                                  │
│                                                                              │
│   397 │   │   │   assert len(original_defaults) == len(new_defaults), "set d │
│   398 │   │   │   func.__defaults__ = tuple(new_defaults)                    │
│   399 │   │   │                                                              │
│ ❱ 400 │   │   │   return_value = func(*args, **kwargs)                       │
│   401 │   │   │                                                              │
│   402 │   │   │   # Restore original defaults.                               │
│   403 │   │   │   func.__defaults__ = original_defaults                      │
│                                                                              │
│ /Users/user/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/graphscope/client/session.py:676 in __init__                               │
│                                                                              │
│    673 │   │   atexit.register(self.close)                                   │
│    674 │   │   # create and connect session                                  │
│    675 │   │   with CaptureKeyboardInterrupt(self.close):                    │
│ ❱  676 │   │   │   self._connect()                                           │
│    677 │   │                                                                 │
│    678 │   │   self._disconnected: bool = False                              │
│    679                                                                       │
│                                                                              │
│ /Users/user/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/graphscope/client/session.py:1055 in _connect                              │
│                                                                              │
│   1052 │   │   self._grpc_client = GRPCClient(                               │
│   1053 │   │   │   self._launcher, self._coordinator_endpoint, self._config_ │
│   1054 │   │   )                                                             │
│ ❱ 1055 │   │   self._grpc_client.waiting_service_ready(                      │
│   1056 │   │   │   timeout_seconds=self._config_params["timeout_seconds"],   │
│   1057 │   │   )                                                             │
│   1058                                                                       │
│                                                                              │
│ /Users/user/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/graphscope/client/rpc.py:78 in waiting_service_ready                       │
│                                                                              │
│    75 │   │   │   │   if e.code() == grpc.StatusCode.DEADLINE_EXCEEDED:      │
│    76 │   │   │   │   │   logger.warning("Heart beat analytical engine faile │
│    77 │   │   │   │   if time.time() - begin_time >= timeout_seconds:        │
│ ❱  78 │   │   │   │   │   raise ConnectionError(f"Connect coordinator timeou │
│    79 │   │   │   │   time.sleep(1)                                          │
│    80 │                                                                      │
│    81 │   def connect(self, cleanup_instance=True, dangling_timeout_seconds= │
╰──────────────────────────────────────────────────────────────────────────────╯
ConnectionError: Connect coordinator timeout, code: UNAVAILABLE, details: failed
to connect to all addresses; last error: UNKNOWN: ipv4:192.168.3.241:30243: 
Failed to connect to remote host: Operation timed out

lidongze0629 commented 1 year ago

@rahulvramesh By default, the NodePort service type is used for k8s deployments, which is equivalent to:

>>> graphscope_session = graphscope.session(cluster_type='k8s', show_log=True, k8s_service_type='NodePort')

So please make sure your client machine can reach the k8s node's address:

$ kubectl describe node <k8s-node>
...
Addresses:
  InternalIP:  xxx.xxx.xxx.xxx
...
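
If it helps, the node addresses can also be collected programmatically. The following is only a sketch: it assumes you have saved the output of `kubectl get nodes -o json` and relies on the standard `status.addresses` field of Kubernetes Node objects. The helper name `node_addresses` and the sample node (name and IPs) are made up for illustration.

```python
import json


def node_addresses(nodes_json: str) -> dict:
    """Map each node name to {address type: address}.

    Expects the JSON text produced by `kubectl get nodes -o json`.
    """
    result = {}
    for item in json.loads(nodes_json).get("items", []):
        name = item["metadata"]["name"]
        addresses = item.get("status", {}).get("addresses", [])
        result[name] = {a["type"]: a["address"] for a in addresses}
    return result


# Hypothetical sample, trimmed to the fields the helper reads.
SAMPLE_NODES_JSON = """
{
  "items": [
    {
      "metadata": {"name": "node-a"},
      "status": {
        "addresses": [
          {"type": "InternalIP", "address": "10.0.0.12"},
          {"type": "ExternalIP", "address": "203.0.113.7"}
        ]
      }
    }
  ]
}
"""

print(node_addresses(SAMPLE_NODES_JSON))
```

A node that only lists an InternalIP is usually reachable only from inside the cluster network (or over a VPN/peering link), which would be consistent with the timeout reported above.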

Alternatively, you can try the LoadBalancer service type:

>>> graphscope_session = graphscope.session(cluster_type='k8s', show_log=True, k8s_service_type='LoadBalancer')
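
Before retrying the session, it may be worth checking plain TCP reachability from the client machine to the coordinator endpoint. This is a minimal sketch using only the Python standard library; the helper name `can_connect` is hypothetical and not part of GraphScope.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False
```

For example, `can_connect("192.168.3.241", 30243)` uses the endpoint from the error message above. If it returns False from your machine, the problem is likely network reachability (security groups, private subnets, VPN) rather than GraphScope itself.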

rahulvramesh commented 1 year ago

Thanks @lidongze0629. I tried LoadBalancer as well; the service gets created, but I hit the same issue. Let me try again. Also, to reach the internal IP, do I need to run the script on the same network? Will running it locally not work?

rahulvramesh commented 1 year ago

With LoadBalancer:


2023-07-20 07:31:52,706 [WARNING][op_executor:349]: Connecting to analytical engine... tried 1 time, will retry in 2 seconds
2023-07-20 07:31:52,707 [WARNING][op_executor:354]: Error code: StatusCode.UNAVAILABLE, details failed to connect to all addresses; last error: UNKNOWN: ipv4:192.168.22.227:56458: Failed to connect to remote host: Connection refused
2023-07-20 07:31:54,779 [WARNING][op_executor:349]: Connecting to analytical engine... tried 2 time, will retry in 4 seconds
2023-07-20 07:31:54,780 [WARNING][op_executor:354]: Error code: StatusCode.UNAVAILABLE, details failed to connect to all addresses; last error: UNKNOWN: ipv4:192.168.22.227:56458: Failed to connect to remote host: Connection refused
E0720 07:32:01.000000   531 /home/graphscope/GraphScope/analytical_engine/core/server/dispatcher.cc:153] Worker 0: VineyardError occurred on worker 0: VineyardError occurred on worker 0: /opt/graphscope/include/graphscope/core/loader/arrow_fragment_loader.h:422: operator() -> Arrow error: IOError: Failed to open local file '/Users'. Detail: [errno 2] No such file or directory
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}::operator()() const + 0x537
vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}&)::{lambda()#2}::operator()() const + 0x3B
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int) + 0x265
vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&)::{lambda()#2}::operator()() const + 0x50
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables() + 0x37A
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexEdgeTables() + 0x2D1
2023-07-20 14:32:01,329 [ERROR][rpc:188]: Runstep failed with code: ANALYTICAL_ENGINE_INTERNAL_ERROR, message: Error occurred during RunStep, The traceback is: Traceback (most recent call last):
  File "/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/op_executor.py", line 102, in run_step
    for response in responses:
  File "/home/graphscope/.local/lib/python3.10/site-packages/grpc/_channel.py", line 475, in __next__
    return self._next()
  File "/home/graphscope/.local/lib/python3.10/site-packages/grpc/_channel.py", line 864, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.INTERNAL
    details = "VineyardError occurred on worker 0: VineyardError occurred on worker 0: /opt/graphscope/include/graphscope/core/loader/arrow_fragment_loader.h:422: operator() -> Arrow error: IOError: Failed to open local file '/Users'. Detail: [errno 2] No such file or directory
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}::operator()() const + 0x537
vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}&)::{lambda()#2}::operator()() const + 0x3B
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int) + 0x265
vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&)::{lambda()#2}::operator()() const + 0x50
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables() + 0x37A
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexEdgeTables() + 0x2D1
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragment(unsigned long) + 0x52
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragmentAsFragmentGroup(unsigned long) + 0x3B
AddLabelsToGraph + 0x21E
gs::GrapeInstance::addLabelsToGraph(gs::rpc::GSParams const&) + 0x927
gs::GrapeInstance::OnReceive(std::shared_ptr<gs::CommandDetail>) + 0x1145
gs::Dispatcher::processCmd(std::shared_ptr<gs::CommandDetail>) + 0xEA
gs::Dispatcher::publisherLoop() + 0x246
std::error_code::default_error_condition() const + 0x33
pthread_condattr_setpshared + 0x513
__xmknodat + 0x230

VineyardError occurred on worker 1: VineyardError occurred on worker 1: /opt/graphscope/include/graphscope/core/loader/arrow_fragment_loader.h:422: operator() -> Arrow error: IOError: Failed to open local file '/Users'. Detail: [errno 2] No such file or directory
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}::operator()() const + 0x537
vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}&)::{lambda()#2}::operator()() const + 0x3B
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int) + 0x265
vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&)::{lambda()#2}::operator()() const + 0x50
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables() + 0x37A
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexEdgeTables() + 0x2D1
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragment(unsigned long) + 0x52
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragmentAsFragmentGroup(unsigned long) + 0x3B
AddLabelsToGraph + 0x21E
gs::GrapeInstance::addLabelsToGraph(gs::rpc::GSParams const&) + 0x927
gs::GrapeInstance::OnReceive(std::shared_ptr<gs::CommandDetail>) + 0x1145
gs::Dispatcher::processCmd(std::shared_ptr<gs::CommandDetail>) + 0xEA
gs::Dispatcher::subscriberLoop() + 0x7E
std::error_code::default_error_condition() const + 0x33
pthread_condattr_setpshared + 0x513
__xmknodat + 0x230

"
    debug_error_string = "UNKNOWN:Error received from peer ipv4:192.168.22.227:56458 {created_time:"2023-07-20T07:32:01.200169044+00:00", grpc_status:13, grpc_message:"VineyardError occurred on worker 0: VineyardError occurred on worker 0: /opt/graphscope/include/graphscope/core/loader/arrow_fragment_loader.h:422: operator() -> Arrow error: IOError: Failed to open local file \'/Users\'. Detail: [errno 2] No such file or directory\ngs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}::operator()() const + 0x537\nvineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}&)::{lambda()#2}::operator()() const + 0x3B\ngs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int) + 0x265\nvineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&)::{lambda()#2}::operator()() const + 0x50\ngs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables() + 0x37A\ngs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexEdgeTables() + 0x2D1\ngs::ArrowFragmentLoader<long, unsigned 
long, vineyard::ArrowVertexMap>::AddLabelsToFragment(unsigned long) + 0x52\ngs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragmentAsFragmentGroup(unsigned long) + 0x3B\nAddLabelsToGraph + 0x21E\ngs::GrapeInstance::addLabelsToGraph(gs::rpc::GSParams const&) + 0x927\ngs::GrapeInstance::OnReceive(std::shared_ptr<gs::CommandDetail>) + 0x1145\ngs::Dispatcher::processCmd(std::shared_ptr<gs::CommandDetail>) + 0xEA\ngs::Dispatcher::publisherLoop() + 0x246\nstd::error_code::default_error_condition() const + 0x33\npthread_condattr_setpshared + 0x513\n__xmknodat + 0x230\n\nVineyardError occurred on worker 1: VineyardError occurred on worker 1: /opt/graphscope/include/graphscope/core/loader/arrow_fragment_loader.h:422: operator() -> Arrow error: IOError: Failed to open local file \'/Users\'. Detail: [errno 2] No such file or directory\ngs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}::operator()() const + 0x537\nvineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int)::{lambda()#3}&)::{lambda()#2}::operator()() const + 0x3B\ngs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr<gs::detail::Vertex>, std::allocator<std::shared_ptr<gs::detail::Vertex> > > const&, int, int) + 0x265\nvineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, 
vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&)::{lambda()#2}::operator()() const + 0x50\ngs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables() + 0x37A\ngs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexEdgeTables() + 0x2D1\ngs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragment(unsigned long) + 0x52\ngs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragmentAsFragmentGroup(unsigned long) + 0x3B\nAddLabelsToGraph + 0x21E\ngs::GrapeInstance::addLabelsToGraph(gs::rpc::GSParams const&) + 0x927\ngs::GrapeInstance::OnReceive(std::shared_ptr<gs::CommandDetail>) + 0x1145\ngs::Dispatcher::processCmd(std::shared_ptr<gs::CommandDetail>) + 0xEA\ngs::Dispatcher::subscriberLoop() + 0x7E\nstd::error_code::default_error_condition() const + 0x33\npthread_condattr_setpshared + 0x513\n__xmknodat + 0x230\n\n"}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/coordinator.py", line 311, in _RunStep
    head, bodies = self._operation_executor.run_on_analytical_engine(
  File "/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/monitor.py", line 191, in runOnAnalyticalEngineWarp
    res = func(instance, dag_def, dag_bodies, loader_op_bodies)
  File "/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/op_executor.py", line 169, in run_on_analytical_engine
    response_head, response_bodies = self.run_step(dag_def, dag_bodies)
  File "/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/op_executor.py", line 116, in run_step
    raise AnalyticalEngineInternalError(msg)
graphscope.framework.errors.AnalyticalEngineInternalError: VineyardError occurred on worker 0: VineyardError occurred on worker 0: /opt/graphscope/include/graphscope/core/loader/arrow_fragment_loader.h:422: operator() -> Arrow error: IOError: Failed to open local file '/Users'. Detail: [errno 2] No such file or di ... [truncated]

gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragment(unsigned long) + 0x52
gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragmentAsFragmentGroup(unsigned long) + 0x3B
AddLabelsToGraph + 0x21E
gs::GrapeInstance::addLabelsToGraph(gs::rpc::GSParams const&) + 0x927
gs::GrapeInstance::OnReceive(std::shared_ptr<gs::CommandDetail>) + 0x1145
gs::Dispatcher::processCmd(std::shared_ptr<gs::CommandDetail>) + 0xEA
gs::Dispatcher::publisherLoop() + 0x246
std::error_code::default_error_condition() const + 0x33
pthread_condattr_setpshared + 0x513
__xmknodat + 0x230
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /Users/rahulvramesh/workspace/try/graphscope-k8/main.py:10 in <module>       │
│                                                                              │
│     7 graphscope_session = graphscope.session(cluster_type='k8s', show_log=T │
│     8                                                                        │
│     9 # Load the dataset                                                     │
│ ❱  10 g = load_ogbn_mag(graphscope_session)                                  │
│    11                                                                        │
│    12 # Process the graph                                                    │
│    13 # Here we create a simple PageRank application as an example           │
│                                                                              │
│ /Users/rahulvramesh/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/graphscope/dataset/ogbn_mag.py:79 in load_ogbn_mag                         │
│                                                                              │
│    76 │                                                                      │
│    77 │   graph = sess.g()                                                   │
│    78 │   graph = (                                                          │
│ ❱  79 │   │   graph.add_vertices(os.path.join(prefix, "paper.csv"), "paper") │
│    80 │   │   .add_vertices(os.path.join(prefix, "author.csv"), "author")    │
│    81 │   │   .add_vertices(os.path.join(prefix, "institution.csv"), "instit │
│    82 │   │   .add_vertices(os.path.join(prefix, "field_of_study.csv"), "fie │
│                                                                              │
│ /Users/rahulvramesh/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/graphscope/framework/graph.py:1118 in add_vertices                         │
│                                                                              │
│   1115 │   def add_vertices(self, vertices, label="_", properties=None, vid_ │
│   1116 │   │   if not self.loaded():                                         │
│   1117 │   │   │   raise RuntimeError("The graph is not loaded")             │
│ ❱ 1118 │   │   return self._session._wrapper(                                │
│   1119 │   │   │   self._graph_node.add_vertices(vertices, label, properties │
│   1120 │   │   )                                                             │
│   1121                                                                       │
│                                                                              │
│ /Users/rahulvramesh/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/graphscope/client/session.py:950 in _wrapper                               │
│                                                                              │
│    947 │                                                                     │
│    948 │   def _wrapper(self, dag_node):                                     │
│    949 │   │   if self.eager():                                              │
│ ❱  950 │   │   │   return self.run(dag_node)                                 │
│    951 │   │   return dag_node                                               │
│    952 │                                                                     │
│    953 │   def run(self, fetches):                                           │
│                                                                              │
│ /Users/rahulvramesh/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/graphscope/client/session.py:986 in run                                    │
│                                                                              │
│    983 │   │   gc.collect()                                                  │
│    984 │   │                                                                 │
│    985 │   │   with self._lock:                                              │
│ ❱  986 │   │   │   return self.run_fetches(fetches)                          │
│    987 │                                                                     │
│    988 │   def run_fetches(self, fetches):                                   │
│    989 │   │   """Run operations of `fetches` without the session lock."""   │
│                                                                              │
│ /Users/rahulvramesh/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/graphscope/client/session.py:996 in run_fetches                            │
│                                                                              │
│    993 │   │   │   raise RuntimeError("Session disconnected.")               │
│    994 │   │   fetch_handler = _FetchHandler(self.dag, fetches)              │
│    995 │   │   try:                                                          │
│ ❱  996 │   │   │   response = self._grpc_client.run(fetch_handler.targets)   │
│    997 │   │   except FatalError:                                            │
│    998 │   │   │   self.close()                                              │
│    999 │   │   │   raise                                                     │
│                                                                              │
│ /Users/rahulvramesh/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/graphscope/client/rpc.py:98 in run                                         │
│                                                                              │
│    95 │   │   return str(self)                                               │
│    96 │                                                                      │
│    97 │   def run(self, dag_def):                                            │
│ ❱  98 │   │   return self._run_step_impl(dag_def)                            │
│    99 │                                                                      │
│   100 │   def fetch_logs(self):                                              │
│   101 │   │   if self._logs_fetching_thread is None:                         │
│                                                                              │
│ /Users/rahulvramesh/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/graphscope/client/utils.py:156 in with_grpc_catch                          │
│                                                                              │
│   153 │   │   retries = 0                                                    │
│   154 │   │   while True:                                                    │
│   155 │   │   │   try:                                                       │
│ ❱ 156 │   │   │   │   return fn(*args, **kwargs)                             │
│   157 │   │   │   except grpc.RpcError as exc:                               │
│   158 │   │   │   │   code = exc.code()                                      │
│   159 │   │   │   │   max_retries = GRPC_MAX_RETRIES_BY_CODE.get(code)       │
│                                                                              │
│ /Users/rahulvramesh/miniconda3/envs/graphscope-k8/lib/python3.9/site-package │
│ s/graphscope/client/rpc.py:198 in _run_step_impl                             │
│                                                                              │
│   195 │   │   │   │   if isinstance(exc, tuple):                             │
│   196 │   │   │   │   │   raise exc[0](*exc[1:])                             │
│   197 │   │   │   │   else:                                                  │
│ ❱ 198 │   │   │   │   │   raise exc                                          │
│   199 │   │   return response                                                │
│   200 │                                                                      │
│   201 │   def create_analytical_instance(self):                              │
╰──────────────────────────────────────────────────────────────────────────────╯
AnalyticalEngineInternalError: VineyardError occurred on worker 0: VineyardError
occurred on worker 0: 
/opt/graphscope/include/graphscope/core/loader/arrow_fragment_loader.h:422: 
operator() -> Arrow error: IOError: Failed to open local file '/Users'. Detail: 
[errno 2] No such file or di ... [truncated]
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(FileNotFoundError(2, 'No such file or directory'))': /api/v1/namespaces/gs-jubdcn/services/coordinator-nujxgm
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(FileNotFoundError(2, 'No such file or directory'))': /api/v1/namespaces/gs-jubdcn/services/coordinator-nujxgm
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(FileNotFoundError(2, 'No such file or directory'))': /api/v1/namespaces/gs-jubdcn/services/coordinator-nujxgm
Exception ignored in: <function KubernetesClusterLauncher.__del__ at 0x136e50dc0>
Traceback (most recent call last):
  File "/Users/rahulvramesh/miniconda3/envs/graphscope-k8/lib/python3.9/site-packages/graphscope/deploy/kubernetes/cluster.py", line 149, in __del__
  File "/Users/rahulvramesh/miniconda3/envs/graphscope-k8/lib/python3.9/site-packages/graphscope/deploy/kubernetes/cluster.py", line 566, in stop
  File "/Users/rahulvramesh/miniconda3/envs/graphscope-k8/lib/python3.9/site-packages/graphscope/deploy/kubernetes/utils.py", line 394, in delete_kubernetes_object
  File "/Users/rahulvramesh/miniconda3/envs/graphscope-k8/lib/python3.9/re.py", line 210, in sub
ImportError: sys.meta_path is None, Python is likely shutting down
sighingnow commented 1 year ago

/opt/graphscope/include/graphscope/core/loader/arrow_fragment_loader.h:422: operator() -> Arrow error: IOError: Failed to open local file '/Users'. Detail:

Looks like you are using local files to load graphs. On Kubernetes, those data files must be available inside the pods.
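As a rough pre-flight check, the idea can be sketched in plain Python: compare the path you are about to load against the directories the pods actually mount. This is only a lexical check on path strings and does not talk to the cluster; the helper name `visible_in_pod` and the mount point `/mnt/data` are hypothetical and not part of GraphScope.

```python
from pathlib import PurePosixPath


def visible_in_pod(path, mount_points):
    """Return True if `path` lies under one of the pod's mount points."""
    candidate = PurePosixPath(path)
    for mount in mount_points:
        mount_path = PurePosixPath(mount)
        # A path is visible when it equals the mount point or sits beneath it.
        if candidate == mount_path or mount_path in candidate.parents:
            return True
    return False


# Hypothetical mount point; use whatever your pods actually mount.
pod_mounts = ["/mnt/data"]

print(visible_in_pod("/Users/me/datasets/ogbn_mag_small", pod_mounts))  # False
print(visible_in_pod("/mnt/data/ogbn_mag_small", pod_mounts))           # True
```

A path under `/Users` exists only on the client machine, so the engine running in the pod cannot open it.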

lidongze0629 commented 1 year ago

@rahulvramesh

1) With 'NodePort': yes. The node's internal IP is only reachable from inside the cluster's network, so the client script must run on that network; running it from a local machine outside it won't work.

2) With 'LoadBalancer', make sure you have the AWS Load Balancer Controller deployed on your cluster.

3) From the error above, you need to mount a volume for data loading inside the pod.

/opt/graphscope/include/graphscope/core/loader/arrow_fragment_loader.h:422: operator() -> Arrow error: IOError: Failed to open local file '/Users'.
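For points 1 and 2, one quick way to tell whether the client machine can reach the coordinator endpoint at all is a plain TCP connection test. The sketch below uses only the Python standard library; the helper name `can_reach` is hypothetical, and a successful connection only shows that the port is reachable, not that the coordinator is healthy.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unroutable addresses.
        return False
```

For example, `can_reach("192.168.3.241", 30243)` tests the address and port from the ConnectionError reported in this issue; substitute the values from your own error message. If it returns False from your laptop but True from a machine inside the cluster's network, the problem is network reachability rather than GraphScope itself.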

Here are two ways to make the data available inside the pod:

1) Use the with_dataset parameter to mount our built-in datasets.

with_dataset: creates a container and mounts the Aliyun demo dataset bucket at the path /dataset.

>>> import graphscope
>>> from graphscope.dataset.ogbn_mag import load_ogbn_mag
>>> sess = graphscope.session(cluster_type="k8s", with_dataset=True, k8s_service_type='LoadBalancer')
>>> g = load_ogbn_mag(sess, '/dataset/ogbn_mag_small')
>>> # project property graph 'g' to a simple graph to run pagerank
>>> simple_g = g.project(vertices={"paper": []}, edges={"cites": []})
>>> pr = graphscope.pagerank(simple_g, delta=0.85, max_round=10)
>>> # print the result
>>> pr.to_dataframe(selector={'node': 'r'})

2) Or use the k8s_volumes parameter to mount a volume yourself and load your own datasets.