Open coolderli opened 3 days ago
@jerryshao @shaofengshi @caican00 @xloya @lw-yang What do you think? Any other thoughts about this?
I think it is a good idea to manage Schema as a resource at the same level as Table/Fileset/Messaging. In this way, we can distinguish between Managed Schema (data type is based on Gravitino) and External Schema (data type is based on the existing external Schema Registry or other systems). Then, in resources that require a specific Schema (such as some Filesets), we can bind a Schema to it. When obtaining Fileset metadata, we will also obtain the corresponding Schema and use it in some clients.
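To make the idea concrete, here is a minimal in-memory sketch of binding a schema to a Fileset and getting it back together with the Fileset metadata. Everything in it is an assumption for illustration only: `SchemaKind`, `TableSchema`, `Fileset`, `InMemoryMetastore`, the `registry_subject` field, and the example names and location are hypothetical and are not part of any existing Gravitino API.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, Optional, Tuple


class SchemaKind(Enum):
    MANAGED = "managed"    # definition stored by Gravitino itself (hypothetical)
    EXTERNAL = "external"  # definition lives in an external schema registry


@dataclass(frozen=True)
class TableSchema:
    name: str                                   # e.g. "catalog.schema.table-schema"
    kind: SchemaKind
    columns: Tuple[Tuple[str, str], ...] = ()   # (column name, type) pairs for managed schemas
    registry_subject: Optional[str] = None      # hypothetical pointer into an external registry


@dataclass(frozen=True)
class Fileset:
    name: str
    location: str
    schema: Optional[TableSchema] = None        # filled in when the fileset is loaded


class InMemoryMetastore:
    """Toy stand-in for a metastore; it only keeps dictionaries in memory."""

    def __init__(self) -> None:
        self._schemas: Dict[str, TableSchema] = {}
        self._filesets: Dict[str, Fileset] = {}
        self._bindings: Dict[str, str] = {}     # fileset name -> schema name

    def register_schema(self, schema: TableSchema) -> None:
        self._schemas[schema.name] = schema

    def create_fileset(self, name: str, location: str) -> None:
        self._filesets[name] = Fileset(name=name, location=location)

    def bind_schema(self, fileset_name: str, schema_name: str) -> None:
        # Reject bindings that point at unknown resources.
        if fileset_name not in self._filesets:
            raise KeyError(f"unknown fileset: {fileset_name}")
        if schema_name not in self._schemas:
            raise KeyError(f"unknown schema: {schema_name}")
        self._bindings[fileset_name] = schema_name

    def load_fileset(self, name: str) -> Fileset:
        # Return the fileset metadata with the bound schema (if any) attached.
        fileset = self._filesets[name]
        schema_name = self._bindings.get(name)
        schema = self._schemas[schema_name] if schema_name else None
        return replace(fileset, schema=schema)


store = InMemoryMetastore()
store.register_schema(
    TableSchema(
        name="catalog.schema.table-schema",
        kind=SchemaKind.MANAGED,
        columns=(("id", "long"), ("payload", "string")),
    )
)
store.create_fileset("catalog.schema.events", "file:///tmp/events")  # made-up location
store.bind_schema("catalog.schema.events", "catalog.schema.table-schema")

loaded = store.load_fileset("catalog.schema.events")
print(loaded.schema.kind.value, [column for column, _ in loaded.schema.columns])
```

An External Schema would use the same binding path; only the place where the definition is resolved would differ (the external registry instead of the metastore), which this sketch does not attempt to model.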
It's a bit strange for "schema" to be an entity. In theory, an entity maps to a data object, whereas a "schema" is bound to an entity. We should think more about how to support this scenario.
Describe the feature
For Kafka topics and Filesets, we may need a table schema to deserialize data. We could manage the external schema registry through Gravitino.
Motivation
Describe the solution
We can introduce a TableSchemaCatalog to manage the TableSchema.
We can bind a table schema such as
catalog.schema.table-schema
to a topic or fileset when needed. We can then get the table schema from the external schema registry. We can also add a schema registry managed by Gravitino, so that we can save the table schema directly to the Gravitino metastore.
Additional context
No response