Status: Open. theoryxu opened this issue 1 month ago.
As far as I know, the JuiceFS Community Edition provides a Hadoop SDK; see https://juicefs.com/docs/zh/community/hadoop_java_sdk. So I think Fileset can support JuiceFS through the Hadoop SDK, the same way it supports S3.
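For reference, the JuiceFS Hadoop SDK is wired into Hadoop through a few `core-site.xml` properties. The sketch below only assembles those properties with the Java standard library. The property names (`fs.jfs.impl`, `fs.AbstractFileSystem.jfs.impl`, `juicefs.meta`) follow the JuiceFS Hadoop SDK docs linked above; the helper class, the Redis metadata URL, and the volume name `myjfs` are hypothetical. Presumably a Fileset catalog would copy such properties into a Hadoop `Configuration` (with the JuiceFS JAR on the classpath) before opening a `jfs://` location.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: builds the Hadoop properties that the JuiceFS Hadoop
 * SDK reads. Property names follow the JuiceFS docs; the values passed in
 * main() are placeholders, not a real deployment.
 */
public class JuiceFsHadoopConf {

    /** Returns core-site.xml style properties for accessing jfs:// URIs. */
    public static Map<String, String> properties(String metaUrl) {
        Map<String, String> props = new LinkedHashMap<>();
        // Hadoop FileSystem implementation used for the jfs:// scheme.
        props.put("fs.jfs.impl", "io.juicefs.JuiceFileSystem");
        // AbstractFileSystem implementation (used by FileContext-based clients).
        props.put("fs.AbstractFileSystem.jfs.impl", "io.juicefs.JuiceFS");
        // Address of the JuiceFS metadata engine (e.g. Redis, MySQL, TiKV).
        props.put("juicefs.meta", metaUrl);
        return props;
    }

    /** Builds a jfs://<volume>/<path> location string. */
    public static String location(String volume, String path) {
        String p = path.startsWith("/") ? path : "/" + path;
        return "jfs://" + volume + p;
    }

    public static void main(String[] args) {
        // Placeholder metadata URL and volume name, for illustration only.
        Map<String, String> props = properties("redis://localhost:6379/1");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
        System.out.println(location("myjfs", "datasets/images"));
    }
}
```

The design mirrors how S3 is reached through `s3a://`: only configuration and a classpath JAR change, not the Fileset code path.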
As @xloya mentions, JuiceFS is compatible with the HDFS API via its Java SDK and also supports the S3 API (ref). But I highly recommend that Gravitino support POSIX for all generic file systems, including JuiceFS, Lustre, CephFS, and more.
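To make the POSIX alternative concrete: if a JuiceFS volume is already mounted through FUSE (for example with `juicefs mount`), a `jfs://` location could simply be translated to a path under the mount point and read with plain `java.nio.file` APIs, with no Hadoop SDK involved. This is a minimal sketch under that assumption; the `PosixMountMapper` class, the volume name `myjfs`, and the mount point `/jfs` are hypothetical, and nothing here reflects an existing Gravitino API.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical sketch: maps a jfs://<volume>/<path> location onto a local
 * FUSE mount point so it can be accessed through ordinary POSIX file APIs.
 */
public class PosixMountMapper {

    public static Path toLocalPath(String location, String volume, Path mountPoint) {
        URI uri = URI.create(location);
        // Only accept locations on the volume that is mounted at mountPoint.
        if (!"jfs".equals(uri.getScheme()) || !volume.equals(uri.getHost())) {
            throw new IllegalArgumentException("not a location on volume " + volume + ": " + location);
        }
        // Strip leading slashes so the path resolves relative to the mount point.
        String rel = uri.getPath() == null ? "" : uri.getPath().replaceFirst("^/+", "");
        Path local = mountPoint.resolve(rel).normalize();
        // Reject paths such as jfs://myjfs/../etc that would escape the mount.
        if (!local.startsWith(mountPoint.normalize())) {
            throw new IllegalArgumentException("path escapes the mount point: " + location);
        }
        return local;
    }

    public static void main(String[] args) {
        Path mount = Paths.get("/jfs"); // hypothetical mount point
        System.out.println(toLocalPath("jfs://myjfs/datasets/images", "myjfs", mount));
    }
}
```

The trade-off is that every compute node needs the volume mounted, whereas the Hadoop SDK only needs a JAR and configuration.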
@Suave Is the Hadoop SDK a better choice for big data scenarios?
@theoryxu, thanks for creating this issue. I'm curious: what pain points or challenges do you run into when using JuiceFS that Gravitino Fileset could help overcome? Sharing some of them would help others understand this feature. Thank you!
> @Suave Is the Hadoop SDK a better choice for big data scenarios?

Yes, the JuiceFS Java SDK works better in the Hadoop ecosystem; it is compatible with both Hadoop 2.x and 3.x.
Describe the feature
Fileset is a new concept introduced in Gravitino 0.5.0 for managing non-tabular data; the current implementation uses HCFS (Hadoop Compatible File System) to manage the physical data. Currently, this HCFS-based implementation does not support JuiceFS.
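Why an HCFS-based implementation does not handle JuiceFS out of the box comes down to scheme resolution: Hadoop picks a `FileSystem` implementation from the URI scheme, using the `fs.<scheme>.impl` configuration key among other mechanisms. The simplified sketch below models only that key lookup, with a plain map standing in for a Hadoop `Configuration`; it ignores Hadoop's built-in defaults and service loading, and the `SchemeResolver` class and example locations are hypothetical.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;

/**
 * Simplified, hypothetical model of HCFS scheme resolution: look up the
 * "fs.<scheme>.impl" key for the scheme of a storage location.
 */
public class SchemeResolver {

    /** Returns the implementation class configured for the location's scheme, if any. */
    public static Optional<String> resolve(Map<String, String> conf, String location) {
        String scheme = URI.create(location).getScheme();
        if (scheme == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(conf.get("fs." + scheme + ".impl"));
    }

    public static void main(String[] args) {
        // Hypothetical configuration: S3 (s3a) and HDFS registered, JuiceFS not.
        Map<String, String> conf = Map.of(
                "fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem",
                "fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");

        for (String loc : new String[] {"s3a://bucket/data", "jfs://myjfs/data"}) {
            String impl = resolve(conf, loc).orElse("no FileSystem registered for this scheme");
            System.out.println(loc + " -> " + impl);
        }
    }
}
```

Under this model, supporting JuiceFS mainly means registering `fs.jfs.impl` and shipping the JuiceFS SDK JAR, rather than changing how Fileset stores locations.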
In this issue, we should discuss how Fileset should support JuiceFS and how to implement that support.
Motivation
JuiceFS is a high-performance, cloud-native distributed file system that is developing rapidly. Supporting it could let Gravitino be used in more scenarios in the future.
Describe the solution
No response
Additional context
No response