datastrato / gravitino

The world's most powerful data catalog service, providing a high-performance, geo-distributed, and federated metadata lake.
https://datastrato.ai/docs/
Apache License 2.0
401 stars, 166 forks

[FEATURE] Add S3 support for Fileset Hadoop catalog #3379

Open jerryshao opened 2 weeks ago

jerryshao commented 2 weeks ago

Describe the feature

Fileset is a new concept introduced in 0.5.0 to manage non-tabular data. The current implementation uses HCFS (the Hadoop Compatible File System) to manage the physical data. With HCFS, the Hadoop catalog should be able to support different underlying storage systems, but so far we have only verified the local file system and HDFS.

In this issue, we should also add support for S3, so that the fileset Hadoop catalog works with the S3 object store.
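
As a rough illustration of what this could involve, here is a minimal sketch that only uses the JDK. It assumes fileset locations are given as URIs with the `s3a://` scheme and that the catalog forwards Hadoop configuration to the underlying file system. The class `S3FilesetSketch`, its methods, and the bucket name are hypothetical and not part of Gravitino. The `fs.s3a.*` keys are the standard Hadoop S3A connector properties.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Gravitino: shows which storage scheme a
// fileset location points at and which Hadoop S3A properties it would need.
public class S3FilesetSketch {

    // Returns the lower-cased URI scheme of a storage location.
    // A location without a scheme is treated as a local path ("file").
    static String schemeOf(String location) {
        String scheme = URI.create(location).getScheme();
        return scheme == null ? "file" : scheme.toLowerCase(Locale.ROOT);
    }

    // Builds the Hadoop configuration entries the S3A connector reads.
    // How these entries reach the catalog is an assumption of this sketch.
    static Map<String, String> s3aProperties(String endpoint, String accessKey, String secretKey) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.s3a.endpoint", endpoint);
        props.put("fs.s3a.access.key", accessKey);
        props.put("fs.s3a.secret.key", secretKey);
        return props;
    }

    public static void main(String[] args) {
        // "my-bucket" is a placeholder bucket name.
        String location = "s3a://my-bucket/filesets/example";
        System.out.println("scheme = " + schemeOf(location));
        s3aProperties("s3.amazonaws.com", "<access-key>", "<secret-key>")
            .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In a real deployment, credentials would normally come from a credential provider instead of plain properties, and the `hadoop-aws` module would have to be on the classpath for an `s3a://` location to resolve.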

Motivation

S3 is widely used on the public cloud, so we should add this support.

Describe the solution

No response

Additional context

No response

zhoukangcn commented 2 weeks ago

I think we could broaden this feature to "Support object stores provided by cloud services", so that we can add subtasks to support Azure Blob and Aliyun OSS.