apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0

[EPIC] Add Ray engine support for Gravitino #1355

Open xunliu opened 8 months ago

xunliu commented 8 months ago

Describe the proposal

This epic task tracks the work of adding Ray engine support for Gravitino.

Task list

Related issues

austin362667 commented 8 months ago

Hi Xun, I'm interested in this issue. May I take it on and get some pointers on the initial design? Many thanks!

  1. Is there a reference implementation in this repo that I should look at first?
  2. Ray is a distributed computing framework. What role will it play in Gravitino?
  3. By "Ray engine", do you mean something like the Ray Data module, or have I misunderstood?
  4. At the catalog level, would this be relational, file, or stream?

xunliu commented 8 months ago

Hi @austin362667, thank you for your interest in this proposal.

At the moment we have a preliminary idea:

  1. Ray can be a good fit for the needs of AI training.
  2. Gravitino can manage the metadata of the various training data for data scientists,
  3. and then expose a unified interface for reading and writing that data.
  4. Of course, we plan to provide Python libraries to support these capabilities.

Combining these three capabilities would give data scientists a single, unified workflow covering data management, data reading and writing, and training.

I think this is a very cool idea. What do you think?

Also, we're planning to have a discussion next week with Ray community users and developers, you're welcome to join us.

austin362667 commented 8 months ago

> Also, we're planning to have a discussion next week with Ray community users and developers, you're welcome to join us.

Sure! I'm excited to join. Thanks.