Open osscm opened 6 months ago
It's included in the REST Catalog Spec proposal, via a handshake endpoint to provide catalog id.
Thanks @jbonofre Can you please point me in the Spec where it is specified and does REST catalog implementation like Tabular supports it?
That's the current proposal/discussion: https://docs.google.com/document/d/1JUtFpdEoa6IAKt1EzJi_re0PUbh56XnfUtRe5WAfl0s/edit?usp=sharing
REST Catalog is not yet a full "spec".
thanks @jbonofre !
cc @flyrain @RussellSpitzer @vgankidi
The multipart namespace in the REST spec can support a use case like catalog.db.table
.
@flyrain my understanding of the question is not on the namespace, but more at catalog level. Maybe I'm wrong :)
I also think catalog-name can be a separate entity.
IIUC, the catalog-name
is a concept introduced in Hive 3.0 to support one additional layer on top of database
. I will consider the multipart namespace in REST spec is a superset of HMS table identifier format. For example, the rest spec supports a namespace like a.b.c.d.table
, which is compatible with the HMS format catalog.db.table
. A new concept like catalog-name
isn't necessary in that sense.
thanks @flyrain
using namespace
to include catalog-name
can also work.
though then what will be the use of catalog-id
Though IMO, as Spark and Trino using hive's catalog-name
and understand it.
It would be great, if RESTCatalog can also support it, and otherwise anyways namespace
is there to support.
via a handshake endpoint to provide catalog id.
and we are also discussing the similar topic in Trino community as well.
and @danielcweeks has talked about iceberg.rest-catalog.warehouse
, so can we also use warehouse for the catalog-name
The name of the warehouse for the REST catalog will either be defined by the uri endpoint itself or can be provided by the iceberg.rest-catalog.warehouse property in the [REST catalog configuration](https://trino.io/docs/current/object-storage/metastores.html#rest-catalog). This is up to the backing REST catalog implementation.
https://github.com/trinodb/trino/issues/21813#issuecomment-2093743439
iceberg.rest-catalog.warehouse
normally points to a location like s3://my_bucket/warehouse_location
. Not sure Trino community is OK to use it as the catalog name. Even if that works, do we need any change for REST Spec? I think multipart namespace still work well. Trino can concatenate the catalog_name to the rest table identifier in the client side, e.g. catalog_name.db1.table1
.
@flyrain This is already spelled out in the REST Spec. The warehouse property is just a value that is passed to a catalog implementation and has no specific requirement that it is a path. This was typically a path with something like HiveCatalog implementation or HadoopCatalog implementation, since that was then forwarded on to the underlying implementation.
I think we are talking about the same thing here. I don't think there is a need to change the REST spec.
iceberg.rest-catalog.warehouse
normally points to a location likes3://my_bucket/warehouse_location
. Not sure Trino community is OK to use it as the catalog name. Even if that works, do we need any change for REST Spec? I think multipart namespace still work well. Trino can concatenate the catalog_name to the rest table identifier in the client side, e.g.catalog_name.db1.table1
.
Trino right now passes Trino's catalog name (not the HMS catalog name) like this:
which is added to the session context properties.
SessionCatalog.SessionContext(sessionId, session.getUser(), credentials, properties, session.getIdentity());
@flyrain do you know?
is it to pass the trinoCatalog
as header to the RESTCatalog service?
extracting header from the session context. https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java#L303
@osscm The information passed to the the REST server as part of the properties is contextual information and is only sent for user sessions. I wouldn't use this for resolving tables in any way.
Trino supports three part identifiers, the first of which is the catalog name from Trino's perspective. I feel like the best way to approach this is to use the warehouse property to identify what HMS catalog name is being addressed and just name the trino catalog and the hms catalog consistently.
The way this was handled prior to HMS supporting catalog names was to just use different URIs with different HMS servers. Warehouse allows you to differentiate the catalog being addressed and decoupling it from Trino configuration.
I don't feel like there's anything that should change either in Trino or the REST spec since there is a path to support this.
I agreed with @danielcweeks. The Trino client can either pass the catalog name as a part of url(e.g. , $ENDPOINT/v1/prefix/namespaces/hms_catalog%1Fdb_name/tables/tablename
), or pass the catalog name as an extra header, which we can sync-up internally.
I feel like the best way to approach this is to use the warehouse property to identify what HMS catalog name is being addressed
It is so confusing (feels like a hacky solution) to use warehouse
keyword to identify the catalog name. Warehouse has a different meaning for catalogs, using it as an identifier makes things confusing. Why can't we have a new property catalog-name
?
For the context, the handshake
endpoint of the "new" REST proposal can provide the catalog name, that's the kind of "configuration exchange". I agree that warehouse
is confusing and not obvious to be used for catalog name 😄
@jbonofre: I don't see much progress in terms of "new proposal" (no response from AWS guys), what are the next plans on that?
@ajantha-bhat i was waiting new review from AWS guys. I will move forward by creating a branch with jaxrs/open api annotated interfaces to illustrate the new proposal.
Warehouse has a different meaning for catalogs . . .
There is no agreed upon standard for the naming of parts of an identifier, so while some people will find confusing, others will not. The term warehouse
was taken from hive prior to their addition of a catalog concept. We can explore supporting alternatives, but I don't feel like making a change is necessary to address the issue proposed.
I agree with @danielcweeks here with the current REST API. I think it makes sense to include in the new REST Spec, but no need to be addressed with the current one.
We can explore supporting alternatives, but I don't feel like making a change is necessary to address the issue proposed.
Agree.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Query engine
Trino
Question
We are thinking it would be good feature to support custom catalog name in REST Catalog. Before creating an issue, thought will discuss here, as we might be missing something.
Hive metastore and Glue supports catalog name, which can be used as namespace hierarchy like.
<catalog>.<database/schema>.table
Glue catalog's property name is
hive.metastore.glue.catalogid
and for Hive metastore we have started working on adding that support. issue: https://github.com/trinodb/trino/issues/10287 changes: https://github.com/trinodb/trino/pull/21502We are internally using custom-catalog with Metastore from time, and working with OSS community to share with the upstream.
Iceberg catalog supports custom namespace, but there is no concept of the
catalog-name
. I think if we support that as well, it will help to engines to use the existing model and less changes when adopting REST Catalog.I might be missing something, so please help me to understand it.