alibaba / GraphScope

🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba
https://graphscope.io
Apache License 2.0

[BUG] Can not load graph from S3 on k8s #4314

Open atberium opened 2 weeks ago

atberium commented 2 weeks ago

**Describe the bug**
We have a k8s cluster. When we try to load a graph from data stored in S3, we get an error.

**To Reproduce**
Steps to reproduce the behavior:

  1. Set up and run a k8s cluster.
  2. Make sure that the following Python script runs properly and raises no error, and that it starts all required GS pods on the k8s cluster:
    
    import graphscope
    from graphscope.framework.loader import Loader

    session = graphscope.session()  # depends on your setup; you may need to pass some parameters

    session.close()

3. Make sure that all S3 settings are correct and that you can access the files in the bucket directly from every GS pod (using curl, s3cmd, etc.).
4. Then add the following code before `session.close()` (replace the `{{ ... }}` placeholders with your own values):

    graph = session.g()
    graph = graph.add_vertices(
        Loader(
            's3://bucket/vertices.csv',
            key='{{ s3_access_key }}',
            secret='{{ s3_secret_key }}',
            endpoint_url='{{ s3_endpoint_url }}',
            delimiter='|',
        ),
        label='vertex',
    )
    graph = graph.add_edges(
        Loader(
            's3://bucket/edges.csv',
            key='{{ s3_access_key }}',
            secret='{{ s3_secret_key }}',
            endpoint_url='{{ s3_endpoint_url }}',
            delimiter='|',
        ),
        src_label='vertex',
        dst_label='vertex',
        label='knows',
    )

5. See the error:

    E1106 17:01:31.000000 481 /tmp/gs-local-deps/v6d-0.24.2/modules/graph/loader/arrow_fragment_loader.cc:432] Failed to read from stream o04c02d48a740008a: Object not exists: failed to get metadata for 'o04c02d48a740008a': failed to read get_data reply: {"content":null,"type":"get_data_reply"}
    E1106 17:01:31.000000 435 /tmp/gs-local-deps/v6d-0.24.2/modules/graph/loader/arrow_fragment_loader.cc:432] Failed to read from stream o04c02d48a740008a: Object not exists: failed to get metadata for 'o04c02d48a740008a': failed to read get_data reply: {"content":null,"type":"get_data_reply"}
    E1106 17:01:31.000000 114 /home/graphscope/GraphScope/analytical_engine/core/server/dispatcher.cc:153] Worker 0: VineyardError occurred on worker 0: VineyardError occurred on worker 0: /tmp/gs-local-deps/v6d-0.24.2/modules/graph/loader/fragment_loader_utils.cc:218: SyncSchema -> Assertion failed: field_num > 0: Empty table list cannot be used for normalizing schema
    vineyard::SyncSchema(std::shared_ptr const&, grape::CommSpec const&) + 0x7BC
    vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr, std::allocator<std::shared_ptr > > const&, int, int)::{lambda(std::shared_ptr const&)#2}&, std::shared_ptr const&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr, std::allocator<std::shared_ptr > > const&, int, int)::{lambda(std::shared_ptr const&)#2}&, std::shared_ptr const&)::{lambda()#2}::operator()() const + 0x49
    gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr, std::allocator<std::shared_ptr > > const&, int, int) + 0x1845
    vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables()::{lambda()#2}&)::{lambda()#2}::operator()() const + 0x52
    gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexTables() + 0x35D
    gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::LoadVertexEdgeTables() + 0x2D1
    gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragment(unsigned long) + 0x47
    gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::AddLabelsToFragmentAsFragmentGroup(unsigned long) + 0x3B
    AddLabelsToGraph + 0x485
    gs::GrapeInstance::addLabelsToGraph(gs::rpc::GSParams const&) + 0x83B
    gs::GrapeInstance::OnReceive(std::shared_ptr) + 0x1357
    gs::Dispatcher::processCmd(std::shared_ptr) + 0xEA
    gs::Dispatcher::publisherLoop() + 0x246
    std::error_code::default_error_condition() const + 0x33
    pthread_condattr_setpshared + 0x513
    2024-11-06 09:01:31,956 [ERROR][rpc:189]: Runstep failed with code: ANALYTICAL_ENGINE_INTERNAL_ERROR, message: Error occurred during RunStep, The traceback is:
    Traceback (most recent call last):
      File "/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/op_executor.py", line 106, in run_step
        for response in responses:
      File "/home/graphscope/.local/lib/python3.10/site-packages/grpc/_channel.py", line 543, in next
        return self._next()
      File "/home/graphscope/.local/lib/python3.10/site-packages/grpc/_channel.py", line 969, in _next
        raise self
    grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
      status = StatusCode.INTERNAL
      details = "VineyardError occurred on worker 0: VineyardError occurred on worker 0: /tmp/gs-local-deps/v6d-0.24.2/modules/graph/loader/fragment_loader_utils.cc:218: SyncSchema -> Assertion failed: field_num > 0: Empty table list cannot be used for normalizing schema
    vineyard::SyncSchema(std::shared_ptr const&, grape::CommSpec const&) + 0x7BC
    vineyard::sync_gs_error<gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr, std::allocator<std::shared_ptr > > const&, int, int)::{lambda(std::shared_ptr const&)#2}&, std::shared_ptr const&>(grape::CommSpec const&, gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr, std::allocator<std::shared_ptr > > const&, int, int)::{lambda(std::shared_ptr const&)#2}&, std::shared_ptr const&)::{lambda()#2}::operator()() const + 0x49
    gs::ArrowFragmentLoader<long, unsigned long, vineyard::ArrowVertexMap>::loadVertexTables(std::vector<std::shared_ptr, std::allocator<std::shared_ptr > > const&, int, int) + 0x1845 ...
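Most of this output is stack frames; the entries that matter for triage are the `Failed to read from stream` lines. As a rough sketch (not part of GraphScope or vineyard), the snippet below counts those entries per stream object id. The function name `failed_streams` and the abbreviated `sample` text are illustrative assumptions; only the message wording is taken from the log.

```python
import re
from collections import Counter

# Pattern for the vineyard entries that name the unreadable stream object.
STREAM_RE = re.compile(r"Failed to read from stream (\w+)")


def failed_streams(log_text: str) -> Counter:
    """Count 'Failed to read from stream <id>' entries per stream object id."""
    return Counter(STREAM_RE.findall(log_text))


# Abbreviated sample based on the two worker entries in the log (paths shortened).
sample = (
    "E1106 17:01:31.000000 481 arrow_fragment_loader.cc:432] "
    "Failed to read from stream o04c02d48a740008a: Object not exists\n"
    "E1106 17:01:31.000000 435 arrow_fragment_loader.cc:432] "
    "Failed to read from stream o04c02d48a740008a: Object not exists\n"
)

for stream_id, count in failed_streams(sample).items():
    print(f"{stream_id}: {count} failed read(s)")
```

If every worker reports the same stream id, as here, the stream was most likely never populated, which fits the later `Empty table list` assertion.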

In short, the error is:
`Failed to read from stream o04c02d48a740008a: Object not exists: failed to get metadata for 'o04c02d48a740008a': failed to read get_data reply: {"content":null,"type":"get_data_reply"}`

**Expected behavior**
We expect the graph to load with its vertices and edges and without errors, which we could then verify with, for example, the interactive engine.

**Environment:**
 - GraphScope version: v0.29.0
 - OS: Ubuntu
 - Version 24.04
 - Kubernetes Version 1.28.14
 - Python version: 3.11.10 (with following dependencies: graphscope==0.29.0, graphscope-client==0.29.0, pandas==2.0.3, aiohttp, async_timeout)

**Additional context**
We also tried loading the same data (vertices and edges) from local files:

    import os

    session = graphscope.session(
        k8s_volumes={
            "data": {
                "type": "hostPath",
                "field": {
                    "path": os.path.expanduser("~/examples/"),
                    "type": "Directory",
                },
                "mounts": {"mountPath": "/examples/"},
            }
        }
    )

    graph = session.g()
    graph = graph.add_vertices(Loader('/examples/vertices.csv', delimiter='|'), label='vertex')
    graph = graph.add_edges(
        Loader('/examples/edges.csv', delimiter='|'),
        src_label='vertex',
        dst_label='vertex',
        label='knows',
    )



This works as expected, with no errors.

welcome[bot] commented 2 weeks ago

Thanks for opening your first issue here! Be sure to follow the issue template, and a maintainer will get back to you shortly. Please feel free to contact us on DingTalk, WeChat (account: graphscope), or Slack. We are happy to answer your questions promptly.

siyuan0322 commented 2 weeks ago

Could you share the first few lines of the files on S3 so that I can try to reproduce it myself? Thank you.

siyuan0322 commented 2 weeks ago

Could you try using `client_kwargs` instead of `endpoint_url`, as in this example?

    d34 = Loader("s3://datafiles/group.e", key='access-id', secret='secret-access-key', client_kwargs={'region_name': 'us-east-1'})

ref: https://graphscope.io/docs/loading_graphs#loader-variants

atberium commented 2 weeks ago

Hi! Yes, please:

atberium commented 2 weeks ago

> Could you try using `client_kwargs` instead of `endpoint_url`, as in this example?
>
>     d34 = Loader("s3://datafiles/group.e", key='access-id', secret='secret-access-key', client_kwargs={'region_name': 'us-east-1'})
>
> ref: https://graphscope.io/docs/loading_graphs#loader-variants

We use a custom host, so a region is not suitable for us. Passing the endpoint this way, `client_kwargs={'endpoint_url': '{{ s3_endpoint_url }}'}`, produces the same error.
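For readers comparing the two styles discussed here, the following is a minimal sketch of a helper that assembles the `Loader` keyword arguments with the endpoint or region nested under `client_kwargs`. The helper name `s3_loader_options` and the endpoint `https://s3.example.internal` are hypothetical; `key`, `secret`, `client_kwargs`, and `delimiter` are the parameter names used in this thread. As reported above, this form hits the same error, so it illustrates only the call shape and is not a workaround.

```python
def s3_loader_options(key, secret, endpoint_url=None, region_name=None, **extra):
    """Build keyword arguments for Loader('s3://...', **options).

    Hypothetical helper: endpoint and region are nested under `client_kwargs`,
    as in the documentation example quoted in this thread.
    """
    client_kwargs = {}
    if endpoint_url:
        client_kwargs["endpoint_url"] = endpoint_url
    if region_name:
        client_kwargs["region_name"] = region_name

    options = {"key": key, "secret": secret, **extra}
    if client_kwargs:
        options["client_kwargs"] = client_kwargs
    return options


options = s3_loader_options(
    "access-id",
    "secret-access-key",
    endpoint_url="https://s3.example.internal",  # placeholder endpoint
    delimiter="|",
)
print(options)

# With GraphScope installed, the options would be passed along like this:
# from graphscope.framework.loader import Loader
# vertices = Loader("s3://bucket/vertices.csv", **options)
```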

siyuan0322 commented 2 weeks ago

Thanks for the input, I can reproduce it now.

atberium commented 2 weeks ago

@siyuan0322, hi! Any update?

siyuan0322 commented 1 week ago

It appears to be related to an upstream change. I am still working on it with the people involved.

dashanji commented 1 week ago

Hi @atberium. Thanks for reporting this. It is a bug introduced upstream, and we have fixed it in https://github.com/v6d-io/v6d/pull/2014.

Unfortunately, you will have to wait until the upstream dependency is upgraded before you can use the fix.
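While waiting, it can help to record which package versions are installed on the client side. The sketch below uses only the standard library. The distribution names `graphscope` and `graphscope-client` come from the environment section of this issue; `vineyard` as the pip name of v6d is an assumption. The thread does not say which release will contain the fix, so this reports versions only. The engine pods also carry their own v6d build (the log shows `v6d-0.24.2`), which this client-side check does not cover.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# "vineyard" is assumed to be the pip distribution name for v6d.
for name, ver in installed_versions(["graphscope", "graphscope-client", "vineyard"]).items():
    print(f"{name}: {ver or 'not installed'}")
```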