datastrato / gravitino

World's most powerful data catalog service with providing a high-performance, geo-distributed and federated metadata lake.
https://datastrato.ai/docs/
Apache License 2.0
401 stars 166 forks source link

[#1838] feat(client): Add Java client support for fileset #2039

Closed coolderli closed 3 months ago

coolderli commented 3 months ago

What changes were proposed in this pull request?

Add Java client support for fileset

Why are the changes needed?

Fix: #1838

Does this PR introduce any user-facing change?

How was this patch tested?

add ut. com.datastrato.gravitino.client.TestFilesetCatalog

coolderli commented 3 months ago

@jerryshao @yuqi1129 @mchades The pipeline had passed. Please take a review when you have time. Thanks.

YxAc commented 3 months ago

1)Whether all catalogs need schema

2)About Naming

@jerryshao @coolderli @qqqttt123

qqqttt123 commented 3 months ago

1)Whether all catalogs need schema

  • I prefer YES, because schema can make it easy to manage (schema equivalent to database), especially files

2)About Naming

  • If all catalogs need schema, the 'SupportsSchemas' can be a base class, doesn't have to be an interface? such as naming 'SchemaCatalog'
  • 'GravitinoFilesetCatalog', i prefer naming 'UnstructuredCatalog'

@jerryshao @coolderli @qqqttt123

Is UnstructuredCatalog a good name if we have model catalogs?

YxAc commented 3 months ago

1)Whether all catalogs need schema

  • I prefer YES, because schema can make it easy to manage (schema equivalent to database), especially files

2)About Naming

  • If all catalogs need schema, the 'SupportsSchemas' can be a base class, doesn't have to be an interface? such as naming 'SchemaCatalog'
  • 'GravitinoFilesetCatalog', i prefer naming 'UnstructuredCatalog'

@jerryshao @coolderli @qqqttt123

Is UnstructuredCatalog a good name if we have model catalogs?

i think model catalog can extends unstructured catalog, we can discuss this question on DS-Xiaomi regular meeting tomorrow?

coolderli commented 3 months ago

@mchades I found the pipeline is crashed for Javadoc. But I can't figure out the reason. Can you help me about it?

https://github.com/datastrato/gravitino/actions/runs/7783974587/job/21223570429?pr=2039#step:5:543

yuqi1129 commented 3 months ago

@mchades I found the pipeline is crashed for Javadoc. But I can't figure out the reason. Can you help me about it?

datastrato/gravitino/actions/runs/7783974587/job/21223570429?pr=2039#step:5:543

You need to address all warning informations when building java doc for module api, common and client-java as they are directly exposed to users.

qqqttt123 commented 3 months ago

Error logs in the pipeline

/home/runner/work/gravitino/gravitino/clients/client-java/src/main/java/com/datastrato/gravitino/client/GravitinoFilesetCatalog.java:223: warning: no @return

  public static Builder builder() {
                        ^
> Task :clients:client-java:javadoc
/home/runner/work/gravitino/gravitino/clients/client-java/src/main/java/com/datastrato/gravitino/client/RelationalCatalog.java:263: warning: no @return
  public static Builder builder() {
                        ^
coolderli commented 3 months ago

@jerryshao @yuqi1129 @qqqttt123 @mchades I have fixed the pipeline. Please review it again when you have time. Thanks.

coolderli commented 3 months ago

@jerryshao @qqqttt123 @mchades @yuqi1129 Could you review it again when you have time? Thanks.

qqqttt123 commented 3 months ago

LGTM.

qqqttt123 commented 3 months ago

@jerryshao @yuqi1129 @mchades Could you take an another look?

jerryshao commented 3 months ago

Also can please update the branch to rebase to the latest main.

jerryshao commented 3 months ago

LGTM. Thanks @coolderli for your contribution.