apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0

[Subtask] support iceberg rest catalog in spark-connector #2716

Open caican00 opened 6 months ago

caican00 commented 6 months ago

Describe the subtask

Support the Iceberg REST catalog in the spark-connector.

  1. Currently, the lakehouse-iceberg catalog is created without registering the REST service URI.
  2. The spark-connector cannot get the REST service URI from the loaded catalog properties.
  3. I think we should support registering the REST service URI when creating a lakehouse-iceberg catalog, and I have a draft plan for it.

We may not need to support a rest catalog-backend in the early stage:

  1. The Iceberg REST service needs to support several existing catalog backends, so if the REST catalog is used, the REST service URI is required regardless of which catalog backend is chosen.
  2. A rest catalog backend may be the Gravitino service itself. Currently, the Gravitino service does not implement a lock similar to HMS's, so I think the rest catalog-backend implementation can have a lower priority.
  3. We can start by supporting registration of the REST service URI in the catalog properties:

```java
// Draft config entries for registering the Iceberg REST service in catalog properties.
// ENABLE_REST_SERVICE and REST_SERVICE_URI are property-key constants assumed to be defined elsewhere.
public static final ConfigEntry<Boolean> ENABLE_ICEBERG_REST_SERVICE =
    new ConfigBuilder(ENABLE_REST_SERVICE)
        .doc("Whether to enable the Iceberg REST service")
        .version(ConfigConstants.VERSION_0_5_0)
        .booleanConf()
        .create();

public static final ConfigEntry<String> ICEBERG_REST_SERVICE_URI =
    new ConfigBuilder(REST_SERVICE_URI)
        .doc("The URI of the Iceberg REST service")
        .version(ConfigConstants.VERSION_0_5_0)
        .stringConf()
        .create();
```
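On the connector side, the draft plan means the spark-connector reads the registered URI back from the loaded catalog properties and points an Iceberg Spark catalog at it. The following is a minimal sketch, assuming the two entries above are stored under the hypothetical keys `enable-rest-service` and `rest-service-uri` (the draft names only the constants, not their string values); the class `IcebergRestConfResolver` is also hypothetical. The `spark.sql.catalog.<name>`, `.type=rest` and `.uri` options are Iceberg's standard Spark catalog settings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for the spark-connector side of the draft plan.
class IcebergRestConfResolver {
  // Hypothetical property keys standing in for ENABLE_REST_SERVICE / REST_SERVICE_URI.
  static final String ENABLE_REST_SERVICE = "enable-rest-service";
  static final String REST_SERVICE_URI = "rest-service-uri";

  /** Turns loaded lakehouse-iceberg catalog properties into Spark options for a REST catalog. */
  static Map<String, String> toSparkConf(String catalogName, Map<String, String> catalogProps) {
    boolean enabled =
        Boolean.parseBoolean(catalogProps.getOrDefault(ENABLE_REST_SERVICE, "false"));
    String uri = catalogProps.get(REST_SERVICE_URI);
    if (!enabled || uri == null || uri.isEmpty()) {
      // Without a registered URI the connector has nothing to connect to.
      throw new IllegalArgumentException(
          "Catalog " + catalogName + " has no registered Iceberg REST service URI");
    }
    String prefix = "spark.sql.catalog." + catalogName;
    Map<String, String> conf = new HashMap<>();
    conf.put(prefix, "org.apache.iceberg.spark.SparkCatalog"); // Iceberg's Spark catalog class
    conf.put(prefix + ".type", "rest"); // use Iceberg's REST catalog client
    conf.put(prefix + ".uri", uri); // the registered REST service address
    return conf;
  }
}
```

A catalog that never registered a URI fails fast here instead of silently falling back to another backend, which keeps the missing-URI problem from this issue visible.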

Parent issue

https://github.com/datastrato/gravitino/issues/1571

caican00 commented 6 months ago

Hi @FANNG1, I made a draft plan for supporting the Iceberg REST service in the spark-connector. Could you please help review it when you are free? Thank you very much.

caican00 commented 6 months ago

I made a draft plan for supporting the Iceberg REST service in the spark-connector; please kindly review it when you are free. Thank you very much. cc @FANNG1

cc @coolderli

jerryshao commented 6 months ago

@FANNG1 Can you please take a look?

FANNG1 commented 6 months ago

@caican00 thanks for proposing this. Overall, I think the Iceberg REST catalog is just one of Iceberg's catalogs, similar to HiveCatalog or the JDBC catalog, so I prefer to add a new catalog backend called rest, with the REST catalog server address as its uri. For Hive as the REST catalog's backend, we could provide something like backend-catalog.uri to distinguish it from the current uri. WDYT?
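With rest as one more catalog backend, the connector keeps reading the existing uri property and only switches on catalog-backend. The following is a minimal sketch, assuming the property names `catalog-backend` and `uri` and a hypothetical `IcebergBackendMapper` class. Dropping `backend-catalog.uri` reflects one reading of the suggestion (that it describes the REST server's own backend), not a settled design.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mapping from Gravitino lakehouse-iceberg properties to Spark options
// when "rest" is treated as just another catalog backend.
class IcebergBackendMapper {
  static Map<String, String> toSparkConf(String catalogName, Map<String, String> props) {
    String backend = props.get("catalog-backend");
    String uri = props.get("uri");
    if (backend == null || uri == null) {
      throw new IllegalArgumentException("catalog-backend and uri are required");
    }
    String prefix = "spark.sql.catalog." + catalogName;
    Map<String, String> conf = new HashMap<>();
    conf.put(prefix, "org.apache.iceberg.spark.SparkCatalog");
    switch (backend) {
      case "hive":
        conf.put(prefix + ".type", "hive"); // uri is the Hive metastore address
        break;
      case "rest":
        conf.put(prefix + ".type", "rest"); // uri is the REST catalog server address
        break;
      default:
        throw new IllegalArgumentException("Unsupported catalog-backend: " + backend);
    }
    conf.put(prefix + ".uri", uri);
    // backend-catalog.* keys are intentionally not forwarded: under this reading they
    // describe the REST server's own backend, which Spark never talks to directly.
    return conf;
  }
}
```

Because rest is handled like any other backend here, adding it leaves the existing hive mapping unchanged.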

caican00 commented 5 months ago

@FANNG1 I'm sorry for taking so long to reply. I have some doubts about this plan:

  1. If we add a new catalog backend called rest, should we instantiate a RestCatalog instance on the server side?
  2. If we instantiate a RestCatalog instance on the server side, how do we set up the real backend, such as HMS? The REST server needs the real backend catalog to interact with the backend storage.

FANNG1 commented 5 months ago
> 1. If we add a new catalog backend called rest, should we instantiate a RestCatalog instance on the server side?

Yes, RestCatalog is actually an implementation of the REST client.

> 2. If we instantiate a RestCatalog instance on the server side, how do we set up the real backend, such as HMS? The REST server needs the real backend catalog to interact with the backend storage.

I think this is the responsibility of the Iceberg REST catalog server, not the Gravitino Iceberg catalog.
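The responsibility split can be illustrated from the client side. A REST catalog client only needs the server URI and builds requests against the Iceberg REST catalog specification, while the choice of HMS or another backend stays inside the server. The following is a minimal sketch with a hypothetical `RestEndpoints` helper. It omits the optional `{prefix}` path segment that a server may supply through its `/v1/config` response, handles only single-level namespaces, and uses `URLEncoder` (form encoding), so it assumes simple identifiers.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a REST catalog client needs only the server URI, not the backend.
class RestEndpoints {
  /** Config endpoint of the Iceberg REST spec, resolved against the server URI. */
  static URI config(String serverUri) {
    return URI.create(trimSlash(serverUri) + "/v1/config");
  }

  /** Load-table endpoint for a single-level namespace (no {prefix} segment). */
  static URI loadTable(String serverUri, String namespace, String table) {
    return URI.create(
        trimSlash(serverUri) + "/v1/namespaces/" + encode(namespace) + "/tables/" + encode(table));
  }

  // Form encoding; adequate only for simple identifiers without spaces.
  private static String encode(String s) {
    return URLEncoder.encode(s, StandardCharsets.UTF_8);
  }

  // Drops one trailing slash so paths are not doubled.
  private static String trimSlash(String s) {
    return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
  }
}
```

Nothing in these requests names HMS or any other backend, which is why configuring the real backend belongs to the REST server rather than the Gravitino Iceberg catalog.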

caican00 commented 5 months ago

I think it's okay.