[FEATURE] Add S3 support for Fileset Hadoop catalog

datastrato / gravitino

The world's most powerful data catalog service, providing a high-performance, geo-distributed and federated metadata lake.

Apache License 2.0

401 stars, 166 forks

Describe the feature

Fileset is a new concept introduced in 0.5.0 to manage non-tabular data. The current implementation uses HCFS (the Hadoop Compatible File System) to manage the physical data. With HCFS, the Hadoop catalog should be able to support different underlying storage systems, but so far we have only verified the local file system and HDFS.

In this issue, we should add and verify S3 support, so that the fileset Hadoop catalog works with the S3 object store.
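As a rough illustration of the scope described above, the sketch below classifies a fileset storage location by its URI scheme and collects the Hadoop properties an S3 location would typically need. It is not Gravitino's implementation: the class name `FilesetStorageSketch` and both helper methods are hypothetical, and it assumes S3 would be reached through Hadoop's S3A connector (the `s3a://` scheme and its `fs.s3a.*` properties), which this issue does not confirm.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Gravitino.
class FilesetStorageSketch {
    // Schemes the issue reports as already verified: local file system and HDFS.
    static final Set<String> VERIFIED = Set.of("file", "hdfs");
    // Assumption: S3 would be accessed through Hadoop's S3A connector.
    static final Set<String> PROPOSED = Set.of("s3a");

    // Returns "verified", "proposed", or "unknown" for a storage location.
    static String classify(String location) {
        String scheme = URI.create(location).getScheme();
        if (scheme == null) {
            return "unknown"; // no scheme, e.g. a bare path
        }
        String normalized = scheme.toLowerCase(Locale.ROOT);
        if (VERIFIED.contains(normalized)) {
            return "verified";
        }
        return PROPOSED.contains(normalized) ? "proposed" : "unknown";
    }

    // Collects the standard S3A connector properties that a Hadoop
    // Configuration would need before opening an s3a:// location.
    static Map<String, String> s3aProperties(String accessKey, String secretKey, String endpoint) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.s3a.access.key", accessKey);
        props.put("fs.s3a.secret.key", secretKey);
        props.put("fs.s3a.endpoint", endpoint);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder locations; the bucket and host names are made up.
        System.out.println(classify("file:///tmp/fileset"));
        System.out.println(classify("hdfs://namenode:8020/fileset"));
        System.out.println(classify("s3a://example-bucket/fileset"));
    }
}
```

In a real catalog, the property map would be copied into a Hadoop `Configuration` before resolving the file system for the location. How those properties are exposed to users (for example as catalog properties) is left open here, since the issue does not describe a solution.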

Motivation

S3 is widely used in the public cloud, so the fileset Hadoop catalog should support it.

Describe the solution

No response

Additional context

No response
